Continuous Valued Neural Networks With Two Hidden Layers Are Sufficient

1 Introduction

One of the first results in the development of neural networks were the universal approximation theorems [1, 2]. These classical results show that a multilayer feedforward network with only one hidden layer and a non-polynomial activation function (like the sigmoid) can approximate every continuous function over a compact set of $\mathbb{R}^n$. It is well known that these results have two important drawbacks for their practical use: on the one hand, the width of the hidden layer grows exponentially and, on the other hand, the proof is not constructive and does not provide an algorithm for building such a network.

Bearing these results in mind, many researchers are paying attention to theoretical aspects of the current success of new neural network architectures, asking for bounds on the depth and width of such networks and for conditions under which they can act as universal approximators (see, e.g., [3, 4, 5] among many others).

It is beyond any doubt that the use of many hidden layers is a big contribution to the success of deep learning architectures [6], but instead of exploring the power of depth, several recent studies have made interesting contributions on the power of width [7, 8, 9]. To sum up, the authors show that there exist continuous functions on compact sets which cannot be approximated by a neural network if the width of the layers is not larger than a given bound, regardless of the depth of the network.

In this paper, we provide an effective method for finding the weights of a multilayer feedforward network with two hidden layers which approximates any continuous mapping between two triangulable subsets of metric spaces. Let us remark that the method is constructive and it only depends on the desired level of approximation to the original continuous mapping.

Our approach is based on a strong result from computational topology. Roughly speaking, our result is rooted in two observations: on the one hand, triangulable sets can be modelled by simplicial complexes, and a continuous mapping between two triangulable spaces can be approximated by a mapping between simplicial complexes; on the other hand, a mapping between simplicial complexes can be modelled as a two-hidden-layer feedforward network.

Let us remark that the classical result is valid for all compact sets of $\mathbb{R}^n$, whereas our theorem is valid for triangulable sets of $\mathbb{R}^n$, which is a smaller collection of sets. Nevertheless, as will be detailed below, compact and triangulable sets differ in a quite technical topological property and, therefore, our result has practical application in many current problems.

The paper is organized as follows: in Section 2, the preliminary notions about multilayer feedforward networks and simplicial complexes are provided. Then, in Section 3, a concrete architecture that acts equivalently to a given simplicial map is given. In Sections 4 and 5, we extend the well-known Simplicial Approximation Theorem and Universal Approximation Theorem, respectively. Finally, conclusions are supplied in Section 6.

2 Background

In this section, some notions about artificial neural networks and simplicial complexes are recalled.

2.1 Multilayer feedforward networks

Artificial neural networks are inspired by biological networks of living neurons in a brain. The number of different architectures, algorithms, and areas of application has recently grown in many directions. In general, a neural network can be formalized as a mapping which depends on a set of weights and on a set of parameters involving the description of the activation functions, layers, synapses between neurons, and any other consideration of its architecture. A good introduction to artificial neural networks can be found in [10].

In this paper, we focus on one of the simplest classes of artificial neural networks: multilayer feedforward networks, which consist of three or more fully connected layers of nodes (neurons): an input layer, an output layer, and one or more hidden layers. Each node in one layer has an activation function and is connected to every node in the following layer. The next definition formalizes this idea.

Definition 1.

A multilayer feedforward network defined on a real-valued $n$-dimensional space is a mapping $N \colon \mathbb{R}^n \to \mathbb{R}^m$ such that, for each $x \in \mathbb{R}^n$, $N(x)$ is computed as the composition of functions
$$N(x) = \Phi_{k+1} \circ \Phi_k \circ \cdots \circ \Phi_1(x),$$
where $k$ is the number of hidden layers, $\Phi_i \colon \mathbb{R}^{d_{i-1}} \to \mathbb{R}^{d_i}$, $i = 1, \dots, k+1$, is defined as $\Phi_i(y) = \phi_i(W_i\, y + b_i)$, being $W_i$ a real-valued $d_i \times d_{i-1}$ matrix, $b_i \in \mathbb{R}^{d_i}$ the bias term, and $\phi_i$ a bounded, continuous, and non-constant function (called activation function). Notice that $d_0 = n$ and $d_{k+1} = m$; $d_i$, $i = 1, \dots, k$, is called the width of the $i$-th hidden layer, and $\max\{d_1, \dots, d_k\}$ is the width of the multilayer feedforward network.
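For illustration purposes only, the following Python sketch (the code, names, and the use of NumPy are ours and not part of the formal development) evaluates such a network as the composition of affine maps followed by activation functions, as in Definition 1.

```python
import numpy as np

def sigmoid(z):
    # A bounded, continuous, non-constant activation function.
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases, activations):
    """Evaluate a multilayer feedforward network: the composition of layers,
    each one an affine map followed by its activation function."""
    y = np.asarray(x, dtype=float)
    for W, b, phi in zip(weights, biases, activations):
        y = phi(W @ y + b)
    return y

# Toy example: input dimension 2, one hidden layer of width 3, output dimension 1.
W1, b1 = np.random.randn(3, 2), np.zeros(3)
W2, b2 = np.random.randn(1, 3), np.zeros(1)
out = feedforward([0.5, -1.0], [W1, W2], [b1, b2], [sigmoid, lambda z: z])
```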

Next, we restate one of the most important theoretical results on multilayer feedforward networks, adapted to our notation.

Theorem 1 (Universal Approximation Theorem, [2]).

Let $K$ be any compact subset of $\mathbb{R}^n$. The space of real-valued continuous functions on $K$ is denoted by $C(K)$. Then, given any $\varepsilon > 0$ and any function $g \in C(K)$, there exists a one-hidden-layer feedforward network $N$ (as in Definition 1 with $k = 1$ and $m = 1$) such that $N$ is an $\varepsilon$-approximation of the function $g$, that is,
$$|N(x) - g(x)| < \varepsilon$$
for all $x \in K$.

The known proofs of this theorem are non-constructive (see [11, 2, 1]); i.e., given $\varepsilon$ and $g$, the theorem claims that a neural network fulfilling the conditions exists, but there is no algorithm to build it in the general case. In this paper, we provide a constructive approach to this theorem, but using a multilayer feedforward network with two hidden layers instead of one. The most important restriction is that our result is valid for triangulable sets instead of compact ones but, as pointed out above, it covers most real-world problems.

2.2 Simplicial Complexes

The main result in this paper is based on a strong result in computational topology known as the Simplicial Approximation Theorem. Next, we recall that theorem and some basics of the theory of simplicial complexes. We follow the notation of [12, 13].

Simplicial complexes are a data structure widely used to represent topological spaces. They are versatile mathematical objects that represent a decomposition of a topological space into simple pieces. For a deeper comprehension of the field, [14, 15, 12, 16, 13] can be consulted. Next, we recall the formal definition of the small pieces, called simplices, that are used to construct our main data structure: simplicial complexes.

Definition 2.

Let $v_0, v_1, \dots, v_k$ be affinely independent points in $\mathbb{R}^n$ with $k \le n$. A $k$-simplex, $\sigma = \langle v_0, \dots, v_k \rangle$, is the convex set
$$\sigma = \Big\{\, x \in \mathbb{R}^n : x = \sum_{i=0}^{k} \lambda_i v_i, \ \sum_{i=0}^{k} \lambda_i = 1, \ \lambda_i \ge 0 \,\Big\}.$$
The points $v_0, \dots, v_k$ are the vertices of $\sigma$. Let us denote by $\dim \sigma$ the dimension of $\sigma$, which in this case is $k$. We say that $\mu$ is a face of $\sigma$ if $\mu$ is an $\ell$-simplex, with $\ell \le k$, whose vertices are also vertices of $\sigma$.

When several simplices are joined together, a more complex structure, called a simplicial complex, is built. The following definition exhibits the way simplices can be glued together to obtain such a simplicial complex.

Definition 3.

A simplicial complex $K$ is a finite collection of simplices such that:

  1. $\sigma \in K$ and $\mu$ being a face of $\sigma$ implies $\mu \in K$;

  2. $\sigma, \sigma' \in K$ implies that $\sigma \cap \sigma'$ is either empty or a face of both.

The underlying space of $K$, denoted by $|K|$, is the union of its simplices together with the topology inherited from the ambient Euclidean space where the simplices are placed. A subcomplex $L$ of $K$ is a simplicial complex such that $L \subseteq K$. The subcomplex of $K$ consisting of all simplices of $K$ of dimension $d$ or less is called the $d$-skeleton of $K$ and it is denoted by $K^{(d)}$. Let us remark that the 0-skeleton of $K$ is its vertex set.
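As a small illustration of Definition 3 (the representation by sets of vertex labels is our own choice, not prescribed here), a simplicial complex can be generated from a list of simplices by adding all their faces, and its skeletons can be extracted directly:

```python
from itertools import combinations

def closure(simplices):
    """Simplicial complex generated by the given simplices: every non-empty
    subset of each simplex (i.e., every face) is added to the collection."""
    complex_ = set()
    for sigma in simplices:
        for k in range(1, len(sigma) + 1):
            complex_.update(frozenset(c) for c in combinations(sigma, k))
    return complex_

def skeleton(complex_, d):
    """The d-skeleton: all simplices of dimension d or less (dim = #vertices - 1)."""
    return {sigma for sigma in complex_ if len(sigma) - 1 <= d}

# Example: the complex generated by one triangle (2-simplex) and one extra edge.
K = closure([(0, 1, 2), (2, 3)])
vertices = skeleton(K, 0)   # the 0-skeleton, i.e., the vertex set
```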

Definition 4.

Let $K$ be a simplicial complex. A simplex $\sigma \in K$ is maximal if it is not a face of any other simplex in $K$. A simplex $\sigma$ is adjacent to $\sigma'$ if $\sigma$ and $\sigma'$ share a face in $K$.

Next, we recall the idea of the star of a simplex $\sigma$ in a simplicial complex $K$. Intuitively, it is the set of simplices in $K$ which have $\sigma$ as a face.

Definition 5.

Let $K$ be a simplicial complex and $\sigma$ a face of $K$. The star of $\sigma$ in $K$, denoted by $\operatorname{st}(\sigma, K)$, is defined as:
$$\operatorname{st}(\sigma, K) = \{\, \mu \in K : \sigma \text{ is a face of } \mu \,\}.$$
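Following Definition 5, and reusing the set-of-frozensets representation of the previous sketch (again, an illustrative convention of ours), the star of a simplex collects every simplex of the complex having it as a face:

```python
def star(sigma, complex_):
    """All simplices of the complex having sigma as a face."""
    sigma = frozenset(sigma)
    return {tau for tau in complex_ if sigma <= tau}

# Example with the complex K built in the previous sketch:
# star({2}, K) contains, among others, the edge {2, 3} and the triangle {0, 1, 2}.
```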

Simplicial complexes are combinatorial data structures used to "model" topological spaces. A way to obtain a refined model from an existing one is to subdivide it into small pieces such that the result is topologically equivalent to the former.

Definition 6.

Let $K$ and $K'$ be simplicial complexes. It is said that $K'$ is a subdivision of $K$ if:

  1. $|K'| = |K|$;

  2. $\sigma' \in K'$ implies that there exists $\sigma \in K$ such that $\sigma' \subseteq \sigma$.

The barycentric subdivision is a concrete example of a subdivision of simplicial complexes. Let us recall that the barycenter of a $k$-simplex $\sigma = \langle v_0, \dots, v_k \rangle$ is
$$b(\sigma) = \sum_{i=0}^{k} \frac{1}{k+1}\, v_i.$$

By using this definition, the idea of barycentric subdivision of a simplicial complex arises in a natural way.

Definition 7.

Let $K$ be a simplicial complex. The barycentric subdivision of the 0-skeleton of $K$ is defined as the set of vertices of $K$, that is, $\operatorname{Sd} K^{(0)} = K^{(0)}$. Assuming we have $\operatorname{Sd} K^{(d-1)}$, which denotes the barycentric subdivision of the $(d-1)$-skeleton of $K$, $\operatorname{Sd} K^{(d)}$ is built by adding the barycenter of every $d$-simplex as a new vertex and connecting it to the simplices that subdivide the boundary of such a $d$-simplex. The iterated application of barycentric subdivisions is denoted by $\operatorname{Sd}^t K$, where $t$ is the number of iterations.

Figure 1: Example of a simplicial complex, showing simplices of several dimensions, a face relation between two of them, and the maximal simplices of the complex.
Figure 2: On the left a 2-simplex is shown, and on the right its first barycentric subdivision is plotted.
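The following sketch illustrates Definition 7 for a single geometric simplex; it uses the standard permutation-based description of the barycentric subdivision (the code and names are ours): every ordering of the vertices yields one sub-simplex whose vertices are the barycenters of the corresponding chain of faces.

```python
import numpy as np
from itertools import permutations

def barycenter(points):
    """Barycenter of a simplex given by its vertex coordinates (one row per vertex)."""
    return np.mean(points, axis=0)

def barycentric_subdivision(simplex):
    """First barycentric subdivision of a k-simplex.

    `simplex` is an array of shape (k+1, n) whose rows are the vertices.
    Each permutation (v_{i0}, ..., v_{ik}) gives one sub-simplex with vertices
    b(<v_{i0}>), b(<v_{i0}, v_{i1}>), ..., b(<v_{i0}, ..., v_{ik}>)."""
    simplex = np.asarray(simplex, dtype=float)
    k = simplex.shape[0] - 1
    subsimplices = []
    for perm in permutations(range(k + 1)):
        chain = [barycenter(simplex[list(perm[:j + 1])]) for j in range(k + 1)]
        subsimplices.append(np.array(chain))
    return subsimplices

# Example: a 2-simplex, as in Figure 2, is split into 3! = 6 triangles.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pieces = barycentric_subdivision(triangle)
```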

Let us see now how the "size" of the simplices of a simplicial complex can be measured.

Definition 8.

Let $K$ be a simplicial complex. The diameter of a simplex $\sigma$ in $K$ is defined as
$$\operatorname{diam}(\sigma) = \max\{\, \|x - y\| : x, y \in \sigma \,\}$$
and the mesh of $K$ is defined as
$$\operatorname{mesh}(K) = \max\{\, \operatorname{diam}(\sigma) : \sigma \in K \,\}.$$

Theorem 2 ([13, p. 86]).

Given a simplicial complex $K$ and $\varepsilon > 0$, there exists an integer $t$ such that $\operatorname{mesh}(\operatorname{Sd}^t K) < \varepsilon$.
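Definition 8 and Theorem 2 can be illustrated numerically with the following sketch (ours, reusing `barycentric_subdivision` and `triangle` from the previous example): iterating the barycentric subdivision drives the mesh below any prescribed $\varepsilon$.

```python
import numpy as np

# Reuses `barycentric_subdivision` and `triangle` from the previous sketch.

def diameter(simplex):
    """Largest distance between two vertices of a geometric simplex
    (for the convex hull of finitely many points, this is its diameter)."""
    simplex = np.asarray(simplex, dtype=float)
    return max(np.linalg.norm(u - v) for u in simplex for v in simplex)

def mesh(simplices):
    """Mesh of a collection of simplices: the largest diameter among them."""
    return max(diameter(s) for s in simplices)

def subdivide_until(simplices, eps, max_iters=20):
    """Barycentrically subdivide every simplex until the mesh drops below eps."""
    t = 0
    while mesh(simplices) >= eps and t < max_iters:
        simplices = [piece for s in simplices for piece in barycentric_subdivision(s)]
        t += 1
    return simplices, t

refined, t = subdivide_until([np.array(triangle, dtype=float)], eps=0.25)
# For this triangle, the mesh falls below 0.25 after a few subdivisions.
```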

Next, we recall one of the main tools used in this paper. A simplicial complex can be seen as a mathematical model of a region of an $n$-dimensional space. By using subdivisions, more and more refined simplicial complexes can be obtained. Let us now think about maps between simplicial complexes. These maps can be considered as extensions of simpler maps defined from the vertices of one simplicial complex to the vertices of another. Interestingly, such maps can be considered as approximations of continuous functions defined on the underlying topological spaces that the simplicial complexes are modelling. Let us formalize these notions.

Definition 9.

Let $K$ and $L$ be two simplicial complexes. A vertex map is a function $\varphi \colon K^{(0)} \to L^{(0)}$ with the property that the vertices of every simplex in $K$ map to vertices of a simplex in $L$.

A vertex map can be extended to a continuous map in the following way.

Definition 10.

Let $K$ and $L$ be two simplicial complexes and let $\varphi \colon K^{(0)} \to L^{(0)}$ be a vertex map. The simplicial map $\varphi_c \colon |K| \to |L|$ induced by $\varphi$ is defined as follows. Let $x \in |K|$. Then there exist a $k$-simplex $\langle v_0, \dots, v_k \rangle$ in $K$ and numbers $\lambda_0, \dots, \lambda_k \ge 0$ with $\sum_{i=0}^{k} \lambda_i = 1$ such that
$$x = \sum_{i=0}^{k} \lambda_i v_i.$$
Then
$$\varphi_c(x) = \sum_{i=0}^{k} \lambda_i\, \varphi(v_i).$$
Any vertex map $\varphi$ induces a simplicial map $\varphi_c$ but, if we want such a map to be an approximation between the underlying spaces of both simplicial complexes, a restriction involving the star of each vertex needs to be added.

Definition 11.

Let $K$ and $L$ be simplicial complexes and let $g \colon |K| \to |L|$ be a continuous map. A simplicial map $\varphi_c$ induced by a vertex map $\varphi$ is a simplicial approximation of $g$ if
$$g\big(|\operatorname{st}(v, K)|\big) \subseteq |\operatorname{st}(\varphi(v), L)|$$
for each vertex $v$ of $K$.

Furthermore, there exist strong results on simplicial complexes. Concretely, we focus our attention on the Simplicial Approximation Theorem, which ensures the existence of simplicial maps that approximate continuous maps arbitrarily closely.

Theorem 3 (Simplicial Approximation Theorem, [12, p. 56]).

If $g \colon |K| \to |L|$ is continuous, then there is a sufficiently large integer $t$ such that $g$ has a simplicial approximation $\varphi_c \colon |\operatorname{Sd}^t K| \to |L|$.

Theorems 1 and 3 are key results in their respective fields. Our aim is to take Theorem 3 as a pillar for obtaining a result close to Theorem 1. Roughly speaking, the idea is to consider a simplicial approximation between two simplicial complexes. On the one hand, such a simplicial approximation is characterized via a vertex map which can be expressed as a neural network. On the other hand, the simplicial approximation can be chosen in such a way that it approximates the continuous map between the underlying spaces of the simplicial complexes. Since the parameters of the neural network can be effectively obtained from the vertex map, this method provides a constructive way to find a neural network which approximates a continuous function between the underlying spaces of two simplicial complexes.

Following that aim, let us formalize the relationship between topological spaces and simplicial complexes.

Definition 12.

A triangulation of a topological space $X$ consists of a simplicial complex $K$ and a homeomorphism $h \colon |K| \to X$.

The spaces that can be triangulated by simplicial complexes (see [16, Theorem A.7, p. 525]) are the compact, locally contractible spaces that can be embedded in $\mathbb{R}^n$ for some $n$. Let us remark that the Universal Approximation Theorem (Theorem 1) is valid for any compact subset of $\mathbb{R}^n$, regardless of whether it is locally contractible or not. Not all compact sets in a metric space are locally contractible (see [17, Chapter 17.7, p. 426]). Nevertheless, non-locally contractible spaces are very odd in $\mathbb{R}^n$, and this technical topological property rarely matters in real-world problems. Therefore, we can say that the results proved on triangulable spaces hold for a large number of current neural network problems.

Next, we extend the definition of the mesh of a simplicial complex in the following way.

Definition 13.

Let $X$ be a triangulable topological space with metric $d_X$ and let $(K, h)$ be a triangulation of $X$. The mesh of $X$ induced by $(K, h)$ is defined as
$$\operatorname{mesh}_h(X) = \max\{\, \operatorname{diam}_X(h(\sigma)) : \sigma \in K \,\},$$
where $\operatorname{diam}_X(h(\sigma)) = \max\{\, d_X(x, y) : x, y \in h(\sigma) \,\}$ is the extended diameter of a simplex.

3 Multilayer Feedforward Networks and Simplicial Maps

In this section, we will show that simplicial maps can be modelled via multilayer feedforward networks in a straightforward way.

In the following theorem, we explicitly compute a two-hidden-layer feedforward network equivalent to a simplicial map $\varphi_c \colon |K| \to |L|$ where all the simplices of $K$ and $L$ are maximal simplices of maximal dimension. This is not an important constraint in our case, since our final aim is to design a multilayer feedforward network that approximates a continuous function between triangulable spaces.

Theorem 4.

Given a simplicial map $\varphi_c \colon |K| \to |L|$ where $K$ and $L$ are composed of maximal simplices of maximal dimension, a two-hidden-layer feedforward network $N$ such that $N(x) = \varphi_c(x)$ for all $x \in |K|$ can be explicitly defined.

Proof.

Let us assume that $|K| \subset \mathbb{R}^n$ and $|L| \subset \mathbb{R}^m$, that $K$ is composed of $\alpha$ maximal $n$-simplices, and that $L$ is composed of $\beta$ maximal $m$-simplices. Let us consider a multilayer feedforward network with the following architecture: (1) an input layer composed of $n$ neurons; (2) a first hidden layer composed of $(n+1)\,\alpha$ neurons; (3) a second hidden layer composed of $(m+1)\,\beta$ neurons; and (4) an output layer with $m$ neurons. The idea is to codify the simplicial complexes involved in the mapping in the hidden layers of the multilayer feedforward network.

Firstly, a point $x \in |K|$ is transformed into a vector. This vector can be seen as the juxtaposition of $\alpha$ vectors of dimension $n+1$, one for each of the maximal simplices in $K$; each of these vectors represents the barycentric coordinates of $x$ with respect to the corresponding simplex. The matrix of weights producing the output of the first hidden layer and the corresponding bias term are obtained from the equations that compute these barycentric coordinates.

The matrix of weights between the first and the second hidden layer encodes the vertex map between vertices. Since $K$ is encoded in the first hidden layer with $(n+1)\,\alpha$ neurons and $L$ is represented via $(m+1)\,\beta$ neurons, this matrix is composed of 0's and 1's: the value 1 indicates that the corresponding vertices are related via the vertex map, and 0 that they are not.

The output of the second hidden layer can be seen as the juxtaposition of $\beta$ vectors of dimension $m+1$, one vector for each maximal simplex in the simplicial complex $L$; each of these vectors represents the barycentric coordinates of $\varphi_c(x)$ with respect to the corresponding simplex. For each simplex in $L$, the corresponding barycentric coordinates form a vector of components whose sum is 1, and all coordinates are greater than or equal to zero exactly when $\varphi_c(x)$ belongs to that simplex. Only those coordinates are considered and, in the last step, the output layer transforms this vector into the Cartesian coordinates of $\varphi_c(x)$.

Finally, the activation functions are defined so that the first two layers output the barycentric coordinates and their images under the vertex map as described above, while the activation feeding the output layer keeps only a block of barycentric coordinates all of whose components are greater than or equal to zero (i.e., a block corresponding to a simplex of $L$ that contains $\varphi_c(x)$) and cancels the remaining blocks.

This result establishes that a two-hidden-layer feedforward network can act equivalently to a simplicial map. Besides, the architecture and the specific computation of the parameters are shown.
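As an illustration (not the exact weight and activation formulas of the proof, which we do not reproduce here), the following NumPy sketch realizes the construction in the simplest case of a single maximal simplex in $K$ and in $L$, where the block-selection step is not needed: the first layer computes barycentric coordinates, the second applies the 0/1 vertex-map matrix, and the output layer returns Cartesian coordinates. Function and variable names are ours.

```python
import numpy as np

def barycentric_layer(simplex_K):
    """Affine map x -> barycentric coordinates w.r.t. a maximal n-simplex of K.
    Returns (W1, b1) such that lambda = W1 @ x + b1."""
    V = np.asarray(simplex_K, dtype=float)        # (n+1) vertices in R^n
    M = np.vstack([V.T, np.ones(V.shape[0])])     # (n+1) x (n+1), invertible
    M_inv = np.linalg.inv(M)
    return M_inv[:, :-1], M_inv[:, -1]

def vertex_map_layer(n_vertices_K, n_vertices_L, vertex_map):
    """0/1 matrix encoding the vertex map: entry (j, i) = 1 iff vertex i of K
    is sent to vertex j of L."""
    Phi = np.zeros((n_vertices_L, n_vertices_K))
    for i, j in vertex_map.items():
        Phi[j, i] = 1.0
    return Phi

def simplicial_map_network(x, simplex_K, simplex_L, vertex_map):
    """Evaluate the two-hidden-layer network with N(x) = phi_c(x)
    in the single-maximal-simplex case (alpha = beta = 1)."""
    W1, b1 = barycentric_layer(simplex_K)
    lam_K = W1 @ np.asarray(x, dtype=float) + b1     # first hidden layer
    Phi = vertex_map_layer(len(simplex_K), len(simplex_L), vertex_map)
    lam_L = Phi @ lam_K                               # second hidden layer
    V_L = np.asarray(simplex_L, dtype=float)          # (m+1) vertices in R^m
    return V_L.T @ lam_L                              # output layer: Cartesian coordinates

# Example: collapse the standard triangle onto the segment with vertices 0 and 1 in R^1.
simplex_K = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
simplex_L = [(0.0,), (1.0,)]
vmap = {0: 0, 1: 1, 2: 1}          # v0 -> w0, v1 -> w1, v2 -> w1
y = simplicial_map_network((0.25, 0.25), simplex_K, simplex_L, vmap)   # ~0.5
```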

4 Simplicial Approximation Theorem Extension

In this section, we provide an extension of the Simplicial Approximation Theorem and an explicit algorithm to compute a simplicial approximation as close as we want to a given continuous function between the underlying spaces of two simplicial complexes $K$ and $L$. The first observation is that the Simplicial Approximation Theorem (Theorem 3) refers to any continuous mapping; continuity is a property of functions on topological spaces, not necessarily metric spaces. The next result introduces the concept of metric into simplicial approximations.

Proposition 1.

Given $\varepsilon > 0$ and a continuous function $g \colon |K| \to |L|$ between the underlying spaces of two simplicial complexes $K$ and $L$, there exist integers $s$ and $t$ such that $g$ has a simplicial approximation $\varphi_c \colon |\operatorname{Sd}^t K| \to |\operatorname{Sd}^s L|$ and $\|g(x) - \varphi_c(x)\| < \varepsilon$ for all $x \in |K|$.

Proof.

By Theorem 2, there exists $s$ such that $\operatorname{mesh}(\operatorname{Sd}^s L) < \varepsilon$. Then, by Theorem 3, there exists $t$ such that $g$ has a simplicial approximation $\varphi_c \colon |\operatorname{Sd}^t K| \to |\operatorname{Sd}^s L|$. Besides, $\|g(x) - \varphi_c(x)\| < \varepsilon$ for all $x \in |K|$ because, by the definition of simplicial approximation, $g(x)$ and $\varphi_c(x)$ lie in a common simplex of $\operatorname{Sd}^s L$, whose diameter is smaller than $\varepsilon$.

Algorithm 1: Computation of a vertex map inducing a simplicial approximation of a continuous function $g$.

Algorithm 1 is inspired by the proof of the Simplicial Approximation Theorem in [12, p. 56] and computes a vertex map from which the simplicial approximation of a given function $g$ between the underlying spaces of two simplicial complexes $K$ and $L$ can be defined.
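The pseudocode is not reproduced here; the following Python sketch is a rough, sampling-based rendering of the underlying idea (names such as `vertex_map_for_approximation` and the sampling strategy are ours, not the paper's): for each vertex $v$ of the subdivided complex, choose a vertex $w$ of $L$ whose star contains the image of a finite sample of points of the star of $v$. It reuses `barycentric_coordinates` from the sketch after Definition 10.

```python
import numpy as np

def contains(point, simplex, tol=1e-9):
    """Whether a point lies in a geometric simplex, assuming the point lies in
    its affine hull (all barycentric coordinates must be >= -tol)."""
    return np.all(barycentric_coordinates(point, simplex) >= -tol)

def vertex_map_for_approximation(g, star_samples_K, simplices_L, vertices_L):
    """For each vertex v of (a subdivision of) K, choose a vertex w of L whose
    star contains g(|st(v)|); the inclusion is only checked on the finite point
    sample star_samples_K[v] of |st(v, K)|.

    simplices_L: list of (vertex-label tuple, vertex-coordinate array) pairs.
    vertices_L:  list of all vertex labels of L.
    Returns a dict v -> w; raises if no admissible w exists, in which case a
    finer subdivision of K is needed (cf. Theorem 3)."""
    vertex_map = {}
    for v, samples in star_samples_K.items():
        candidates = set(vertices_L)
        for x in samples:
            y = g(x)
            # Vertices w of L with y in |st(w, L)|: vertices of the simplices containing y.
            touched = set()
            for labels, coords in simplices_L:
                if contains(y, coords):
                    touched.update(labels)
            candidates &= touched
        if not candidates:
            raise ValueError(f"no admissible image for vertex {v}: subdivide K further")
        vertex_map[v] = min(candidates)  # any admissible vertex works
    return vertex_map
```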

Theorem 5.

Given a continuous function $g \colon |K| \to |L|$ and $\varepsilon > 0$, a two-hidden-layer feedforward network $N$ such that $\|N(x) - g(x)\| < \varepsilon$ for all $x \in |K|$ can be explicitly defined.

Proof.

By Proposition 1, there exists a simplicial approximation $\varphi_c$ of $g$ such that $\|g(x) - \varphi_c(x)\| < \varepsilon$ for all $x \in |K|$. Then, by Theorem 4, there exists a two-hidden-layer feedforward network $N$ such that $N(x) = \varphi_c(x)$ for all $x \in |K|$.

5 Universal Approximation Theorem Extension

The results of the previous section can be extended to triangulable spaces. In this case, the approach is constructive if the homeomorphisms of the triangulations are known. In the previous sections, we have proved that a continuous function between the underlying spaces of two simplicial complexes can be approximated by using the Simplicial Approximation Theorem. In this section, using the new version of the Simplicial Approximation Theorem (Proposition 3), we give a constructive version of the Universal Approximation Theorem that approximates any continuous function (under some conditions) arbitrarily closely.

Proposition 2.

Let $(K, h)$ be a triangulation of a space $X$. For all $\varepsilon > 0$ there exists $\delta > 0$ such that if $x, y \in |K|$ and $\|x - y\| < \delta$, then $d_X(h(x), h(y)) < \varepsilon$.

Proof.

Suppose, on the contrary, that there exists $\varepsilon > 0$ such that, for every $k \in \mathbb{N}$, there are points $x_k, y_k \in |K|$ with $\|x_k - y_k\| < 1/k$ but $d_X(h(x_k), h(y_k)) \ge \varepsilon$. Since $|K|$ is compact, we can extract a subsequence of $(x_k)$ converging to some $x \in |K|$; the corresponding subsequence of $(y_k)$ converges to the same point $x$. By the continuity of $h$, both $h(x_k)$ and $h(y_k)$ converge to $h(x)$ along that subsequence, so $d_X(h(x_k), h(y_k))$ tends to 0, which contradicts $d_X(h(x_k), h(y_k)) \ge \varepsilon$. Therefore, given $\varepsilon > 0$, the required $\delta > 0$ exists.

Corollary 1.

Given $\varepsilon > 0$ and a triangulation $(K, h)$ of $X$, there exists $t \in \mathbb{N}$ such that the mesh of $X$ induced by $(\operatorname{Sd}^t K, h)$ is smaller than $\varepsilon$.

Proof.

By Proposition 2 applied to $\varepsilon/2$, there exists $\delta > 0$ such that points of $|K|$ at distance smaller than $\delta$ are mapped by $h$ to points of $X$ at distance smaller than $\varepsilon/2$. By Theorem 2, there exists $t$ such that $\operatorname{mesh}(\operatorname{Sd}^t K) < \delta$. Hence, every simplex of $\operatorname{Sd}^t K$ has an image under $h$ with extended diameter at most $\varepsilon/2 < \varepsilon$, and the mesh of $X$ induced by $(\operatorname{Sd}^t K, h)$ is smaller than $\varepsilon$.

Finally, we reach the main result of this section. Given a continuous map $g$ between two triangulable spaces $X$ and $Y$, we can obtain two simplicial complexes $K$ and $L$ associated with them, and a simplicial approximation between them which approximates $g$.

Proposition 3.

Let $X$ and $Y$ be two triangulable topological spaces, $g \colon X \to Y$ a continuous map, and $\varepsilon > 0$. Then, there exist two triangulations $(K, h_1)$ and $(L, h_2)$ of $X$ and $Y$, respectively, and a simplicial approximation $\varphi_c \colon |K| \to |L|$ of $h_2^{-1} \circ g \circ h_1$ such that $d_Y\big(g(x), h_2(\varphi_c(h_1^{-1}(x)))\big) < \varepsilon$ for all $x \in X$.

Proof.

Let $(K', h_1)$ and $(L', h_2)$ be triangulations of $X$ and $Y$. By Corollary 1, there exists $s$ such that the mesh of $Y$ induced by $(\operatorname{Sd}^s L', h_2)$ is smaller than $\varepsilon$. Next, by Theorem 3 applied to the continuous map $h_2^{-1} \circ g \circ h_1 \colon |K'| \to |L'|$, there exist $t$ and a vertex map $\varphi$ such that the induced simplicial map $\varphi_c \colon |\operatorname{Sd}^t K'| \to |\operatorname{Sd}^s L'|$ is a simplicial approximation of $h_2^{-1} \circ g \circ h_1$. Finally, we can define $K = \operatorname{Sd}^t K'$ and $L = \operatorname{Sd}^s L'$. Since $h_2^{-1}(g(x))$ and $\varphi_c(h_1^{-1}(x))$ lie in a common simplex of $L$, whose image under $h_2$ has extended diameter smaller than $\varepsilon$, then $d_Y\big(g(x), h_2(\varphi_c(h_1^{-1}(x)))\big) < \varepsilon$ for all $x \in X$.

Theorem 6.

Given a continuous function $g \colon X \to Y$ between two triangulable spaces and two triangulations $(K, h_1)$ and $(L, h_2)$ of $X$ and $Y$, respectively, a two-hidden-layer feedforward network that approximates $g$ arbitrarily closely can be explicitly defined.

Proof.

By Proposition 3, there exists a simplicial approximation $\varphi_c$ of $h_2^{-1} \circ g \circ h_1$ such that $h_2 \circ \varphi_c \circ h_1^{-1}$ approximates $g$ arbitrarily closely. Besides, by Theorem 4, there exists a two-hidden-layer feedforward network $N$ such that $N = \varphi_c$ on all the domain. Therefore, $h_2 \circ N \circ h_1^{-1}$ approximates $g$ arbitrarily closely.

6 Conclusions

In this paper, we provide an effective method for building a multilayer feedforward network which approximates a continuous function between triangulable spaces. The main contribution of the paper is that the weights can be exactly and effectively computed without any training process. As for disadvantages, two can be pointed out from a technical point of view: firstly, the homeomorphisms between the triangulable spaces and the simplicial complexes can be hard to find and, secondly, the classical approximation theorem for neural networks is valid for compact sets, whereas our result is only valid for triangulable sets. Nonetheless, as pointed out above, most real-world problems are covered by our result and, therefore, the approximation of the continuous function via a neural network can be effectively built.

References

  • [1] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • [2] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, March 1991.
  • [3] Shiyu Liang and R. Srikant. Why deep neural networks? CoRR, abs/1610.04161, 2016.
  • [4] Itay Safran and Ohad Shamir. Depth-width tradeoffs in approximating natural functions with neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2979–2987, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
  • [5] Matus Telgarsky. Benefits of depth in neural networks. CoRR, abs/1602.04485, 2016.
  • [6] Shizhao Sun, Wei Chen, Liwei Wang, Xiaoguang Liu, and Tie-Yan Liu. On the depth of deep neural networks: A theoretical view. In Dale Schuurmans and Michael P. Wellman, editors, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA., pages 2066–2072. AAAI Press, 2016.
  • [7] Boris Hanin and Mark Sellke. Approximating continuous functions by ReLU nets of minimal width, 2017.
  • [8] Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 6232–6240, 2017.
  • [9] Quynh Nguyen, Mahesh Chandra Mukkamala, and Matthias Hein. Neural networks should be wide enough to learn disconnected decision regions. CoRR, abs/1803.00094, 2018.
  • [10] Simon Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.
  • [11] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:356–366, 1989.
  • [12] Herbert Edelsbrunner and John Harer. Computational Topology - an Introduction. American Mathematical Society, 2010.
  • [13] James R. Munkres. Elements of algebraic topology. Addison-Wesley, 1984.
  • [14] R. Ayala, E. Domínguez, and A. Quintero. Elementos de la teoría de homología clásica. Ciencias (Universidad de Sevilla). Secretariado de Publicaciones, Universidad de Sevilla, 2002.
  • [15] J.D. Boissonnat, F. Chazal, and M. Yvinec. Geometric and Topological Inference. Cambridge Texts in Applied Mathematics. Cambridge University Press, 2018.
  • [16] Allen Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
  • [17] R. Geoghegan. Topological Methods in Group Theory. Graduate Texts in Mathematics. Springer New York, 2010.


Source: https://deepai.org/publication/two-hidden-layer-feedforward-neural-networks-are-universal-approximators-a-constructive-approach
