Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map
(2024)
An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem
Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Curran Associates (2024)
Abstract:
Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute. We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size, or model size increases in the neural network.
Exploiting the equivalence between quantum neural networks and perceptrons
(2024)
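The multitask sparse parity setup described in the abstract above can be sketched in a few lines: tasks drawn with power-law frequencies, each task defined by a fixed subset of bit positions whose parity is the label. The parameter values below (`n_tasks`, `n_bits`, `k`, `alpha`) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not the paper's exact setup):
n_tasks = 8    # number of skills/tasks
n_bits = 16    # length of each random bit string
k = 3          # parity is computed over k fixed bits per task
alpha = 1.5    # power-law exponent for task frequencies

# Tasks are sampled with power-law probabilities, as in the abstract.
task_probs = np.arange(1, n_tasks + 1, dtype=float) ** (-alpha)
task_probs /= task_probs.sum()

# Each task is a fixed random subset of k bit positions.
subsets = [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]

def sample_batch(batch_size):
    """Sample (task one-hot ++ random bits, parity label) pairs."""
    tasks = rng.choice(n_tasks, size=batch_size, p=task_probs)
    bits = rng.integers(0, 2, size=(batch_size, n_bits))
    # Label = parity (XOR) of the k bits selected by each sampled task.
    labels = np.array([bits[i, subsets[t]].sum() % 2
                       for i, t in enumerate(tasks)])
    # Input = one-hot task code concatenated with the bit string.
    onehot = np.eye(n_tasks, dtype=int)[tasks]
    return np.hstack([onehot, bits]), labels

X, y = sample_batch(4)
print(X.shape, y.shape)  # (4, 24) (4,)
```

A two-layer network trained on batches like these would then be probed for the per-task loss, whose sigmoidal drop marks the emergence of each skill.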
Exploring Simplicity Bias in 1D Dynamical Systems
Entropy MDPI 26:5 (2024) 426
Abstract:
Arguments inspired by algorithmic information theory predict an inverse relation between the probability and complexity of output patterns in a wide range of input-output maps. This phenomenon is known as simplicity bias. By viewing the parameters of dynamical systems as inputs, and the resulting (digitised) trajectories as outputs, we study simplicity bias in the logistic map, Gauss map, sine map, Bernoulli map, and tent map. We find that the logistic map, Gauss map, and sine map all exhibit simplicity bias upon sampling of map initial values and parameter values, but the Bernoulli map and tent map do not. The simplicity bias upper bound on the output pattern probability is used to make a priori predictions regarding the probability of output patterns. In some cases, the predictions are surprisingly accurate, given that almost no details of the underlying dynamical systems are assumed. More generally, we argue that studying probability-complexity relationships may be a useful tool when studying patterns in dynamical systems.
Non-Poissonian Bursts in the Arrival of Phenotypic Variation Can Strongly Affect the Dynamics of Adaptation
Molecular Biology and Evolution, Oxford University Press 41:6 (2024) msae085
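The simplicity-bias probe described in the "Exploring Simplicity Bias in 1D Dynamical Systems" abstract can be sketched for the logistic map: sample parameters and initial values, digitise each trajectory to a binary string, and compare pattern probability against a complexity estimate. This is a minimal illustration, not the paper's method; compressed size via `zlib` stands in here for the Lempel-Ziv-style complexity measure used in the simplicity-bias literature, and the sampling ranges and trajectory length are assumptions.

```python
import random
import zlib
from collections import Counter

random.seed(0)

def digitised_trajectory(r, x0, length=25):
    """Iterate the logistic map x -> r*x*(1-x), emitting 1 if x > 0.5."""
    bits = []
    x = x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        bits.append('1' if x > 0.5 else '0')
    return ''.join(bits)

def complexity(pattern):
    # Compressed size as a crude stand-in for Lempel-Ziv complexity.
    return len(zlib.compress(pattern.encode()))

# Sample parameters and initial conditions uniformly, count output patterns.
counts = Counter()
n_samples = 20000
for _ in range(n_samples):
    r = random.uniform(0.0, 4.0)   # map parameter
    x0 = random.uniform(0.0, 1.0)  # initial condition
    counts[digitised_trajectory(r, x0)] += 1

# Simplicity bias predicts: the most probable patterns are the simplest.
for pattern, c in counts.most_common(3):
    print(pattern, c / n_samples, complexity(pattern))
```

In this toy run the constant all-zeros string dominates (any `r < 2` keeps the trajectory below 0.5), consistent with the inverse probability-complexity relation the abstract describes.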