A mathematical theory for understanding when abstract representations emerge in neural networks

Unlocking the Mathematics Behind Abstract Thinking in Neural Networks

Imagine a neural network learning to recognize the parity and magnitude of handwritten digits: whether a number is odd or even, small or large. In both biological brains and artificial networks, something remarkable often happens: these different aspects of a task become encoded along separate, nearly orthogonal directions in the activity of neurons. This clean separation is known as an abstract, or disentangled, representation: a geometry that allows systems to generalize to new situations, even ones outside their previous experience. But why do these abstract representations appear so consistently, and how do they emerge from the learning process? The question has puzzled neuroscientists and machine learning researchers alike, especially since most previous theories focused on unsupervised learning methods.

New mathematical insights now offer a general answer. At the heart of this work is a rigorous theory showing that when a feedforward neural network is trained on tasks that directly depend on hidden, or latent, variables (such as those parity and magnitude labels), abstract representations are not just possible but inevitable. Specifically, in the final hidden layer of a nonlinear network, the representations of these latent factors naturally align with distinct, independent axes. This happens regardless of the choice of activation function or the depth of the network, as long as the task labels depend on those latent variables.

To reach this conclusion, the researchers developed an analytical framework that shifts attention away from the millions of parameters in a network and onto the patterns of neuron activity, the so-called "neural preactivations," across all inputs. By translating the complex optimization over weights into a more tractable mean-field problem over these activity patterns, they obtained a precise mathematical landscape in which the geometry of the data and the structure of the task labels determine the optimal way for neurons to encode information.

A central tool in this analysis is the "parallelism score," a measure of how independently each task-relevant variable is represented from the others. When the parallelism score approaches one, it signals a perfectly abstract representation: changing one variable, like parity, shifts neural activity in a consistent direction no matter the value of the others, like magnitude. When the score is near zero, the variables are hopelessly entangled. (A code sketch of this measure appears below.) Through this lens, the emergence of abstraction is no accident; it is a direct consequence of how learning shapes the network to mirror the structure of the task. The theory, moreover, applies to a wide range of nonlinearities and architectures, capturing both shallow and deep networks.

This mathematical theory not only explains the prevalence of abstraction in trained networks and real brains, but also provides a toolkit for predicting and analyzing how different tasks, data structures, and network designs give rise to specific kinds of representations. It bridges neuroscience and artificial intelligence, offering a unified account of how abstract thinking emerges from the raw activity of neurons, whether silicon or biological.
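To make the parallelism score concrete, here is a minimal NumPy sketch for the simplest case of two binary latent variables, such as parity and magnitude. The function name parallelism_score, the toy data, and the exact averaging scheme are illustrative assumptions rather than the authors' code; the general definition extends to more conditions by averaging cosine similarities over pairs of coding vectors.

import numpy as np

def parallelism_score(activity, var_a, var_b):
    """Parallelism score for binary variable var_a, measured
    separately within each value of binary variable var_b.

    activity: array of shape (n_inputs, n_neurons)
    var_a, var_b: 0/1 integer arrays of length n_inputs

    For each value of var_b, compute the "coding vector" for var_a
    (mean activity at var_a=1 minus mean activity at var_a=0).
    The score is the cosine similarity between the two coding
    vectors: near 1 means var_a shifts activity in the same
    direction regardless of var_b (an abstract representation);
    near 0 means the two variables are entangled.
    """
    coding_vectors = []
    for b in (0, 1):
        mask = var_b == b
        v1 = activity[mask & (var_a == 1)].mean(axis=0)
        v0 = activity[mask & (var_a == 0)].mean(axis=0)
        coding_vectors.append(v1 - v0)
    u, w = coding_vectors
    return float(u @ w / (np.linalg.norm(u) * np.linalg.norm(w)))

# Toy demo: a perfectly abstract code places parity along axis 0
# and magnitude along axis 1, plus a little noise.
rng = np.random.default_rng(0)
parity = rng.integers(0, 2, size=200)
magnitude = rng.integers(0, 2, size=200)
noise = 0.1 * rng.standard_normal((200, 2))
activity = np.stack([parity, magnitude], axis=1) + noise
print(parallelism_score(activity, parity, magnitude))  # close to 1.0

In the toy data the two variables occupy orthogonal axes, so the score lands near one. By contrast, an entangled code, for instance activity that responds only to the XOR of parity and magnitude, flips its parity coding direction between the two magnitude contexts and would score near minus one.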