Overparametrizing quantum neural networks

"Theory of overparametrization in quantum neural networks"

Published by M.Larocca, N. Ju, D. García-Martín, P. J. Coles, M. Cerezo (LANL et al), 25th September 2021

Machine learning
Overparametrizing quantum neural networks

Quantum Machine Learning (QML) models often consist of a data embedding followed by a parametrized quantum circuit often called Quantum Neural Network (QNN). So far, QNNs have played a significant role in quantum machine learning applications in function approximation, handling big data, modelling social networks, associative memory devices, and automated control systems. However, the training of such NNs requires optimizing the QNN’s parameters, which is a non-deterministic polynomial time problem.

The authors in this work employ the technique of over-parameterization, which is the regime where one trains a NN with a larger capacity than required to represent the distribution of the training data. Therefore, the QNN has more parameters to explore all relevant directions in the Hilbert space. Over-parametrizing a NN can improve its performance by improving the trainability and reducing generalization errors, leading to faster convergence. In this work, the authors provide a theoretical framework for the over-parametrization of QNNs and describe its correspondence to a computational phase transition, where the trainability of QNNs is vastly improved due to a landscape with less local minima. The work proposes three theorems to analyze the over-parameterization phenomenon followed by numerical verifications.

The first theorem derives an upper bound as a sufficient condition for over-parametrization. This indicates that for a general type of periodic-structured QNNs, an over-parameterized regime can be reached by increasing the number of parameters past some threshold critical value. This value is related to the dimension of the Dynamical Lie Algebra (DLA) associated with the generators of the QNN. The second theorem provides an operational meaning to the over-parameterization definition in terms of the model’s capacity i.e. increasing the number of parameters can never increase the model capacity beyond the upper bound derived in the first theorem. Lastly, the third theorem shows that the maximum number of relevant directions around the global minima of the optimization problem is always smaller than the upper bound, hence imposing a maximal rank on the Hessian when evaluated at the solution.

For numerical demonstration, three different optimization tasks were considered: i) finding the groundstate energy with a Variational Quantum Eigensolver (VQE), ii) decomposing a unitary to quantum gates through unitary compilation, and iii) compressing information with quantum autoencoding. For VQE, the problem sizes of n = 4, 6, 8, 10 qubits were considered for ansatzes with different depths L with both open and closed boundary conditions. After averaging over 50 random parameter initializations, it is observed that the loss function decreases exponentially with each optimization step when the number of layers is large enough. Similarly for unitary compilation, the numerical results suggest that as the depth of the circuit increases, the convergence towards the global optimum improves dramatically until reaching a saturation point. Lastly, the results for quantum autoencoding were obtained for a system of n = 4 qubits and for the same hardware efficient ansatz used for unitary compilation. Here, it was again observed that an over-parameterized QNN is able to accurately reach the global optima in a few iterations. The value of the obtained rank in contrast to the dimension of the DLA suggests that over-parameterization can be achieved with a number of parameters that is much smaller than the upper-bound.

It is noteworthy to mention a crucial difference between the results obtained for the Hamiltonian variational ansatz (theoretical) and for the hardware efficient ansatz (numerical). In the case of Hamiltonian variational ansatz, the dimension of the DLA scaled polynomially with n, whereas for the hardware efficient ansatz the dimension of the DLA grows exponentially with the system size. This implies that the system becomes over-parameterized at a polynomial number of parameters for the first case, while in the latter case one requires an exponentially large number of parameters. Since most ansatzes used for QNNs are ultimately hardware efficient ansatzes which exhibit barren plateaus, this may require an exponential number of parameters to be over-parameterized, which requires more effort in designing scalable and trainable ansatzes. Finally, a take-away message of this work is that QNN architectures with polynomially large DLAs will have more favorable loss landscapes due to the lack of barren plateaus.

One direction of future research would be to consider more complicated loss functions consisting of functions of different QNN contributions, such as those used in Quantum Circuit Learning for supervised learning or Differentiable Quantum Circuits for solving differential equations. It remains an open question whether such loss function would exhibit similar traits as those single-Hamiltonian minimizations considered here.