Cybenko, G. Approximation by superpositions of a sigmoid function. Mathematics of Control, Signals, and Systems, 2, 303-314, 1989.
 Kurt Hornik. Approximation capabilities of multilayer feedforward networks. 1991, Neural Networks.
 Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366, 1989.
 Hornik, K., Stinchcombe, M., and White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks, 3(5), 551-560, 1990.
 Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6, 861-867, 1993.
 Barron, A. E. Universal approximation bounds for superpositions of a sigmoid function. IEEE Transactions on Information Theory, 39, 930-945, 1993.
 Montufar, G. Universal approximation depth and errors of narrow belief networks with discrete units. Neural Computation, 26, 2014.
 Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee. Understanding Deep Neural Networks with Rectified Linear Units. 2016, Electronic Colloquium on Computational Complexity.
 Nair, V. and Hinton. Rectified linear units improve restricted Boltzmann machines. In L. Bottou and M. Littman, editors, Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML 2010).
 X. Glorot, Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, 2010.
 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio. Noisy Activation Functions. ICML 2016