Experimental Quantum Reinforcement Learning

"Experimental quantum speed-up in reinforcement learning agents"

Published by V. Saggio, B. E. Asenbeck, A. Hamann, T. Strömberg, P. Schiansky, V. Dunjko, N. Friis, N. C. Harris, M. Hochberg, D. Englund, S. Wölk, H. J. Briegel & P. Walther (University of Vienna, University of Innsbruck, Leiden University, IQOQI, MIT, DLR, Institut für Quantentechnologien (Ulm), University of Konstanz), 28th March 2021

Nature 591 (2021)
Photonic platforms

With the growth of classical computational power, machine learning and artificial intelligence research has seen a resurgence in popularity, and massive progress has been made in recent years in developing useful algorithms for practical applications. Meanwhile, quantum computing research has advanced to a stage where quantum supremacy has been shown experimentally, and algorithmic advantages in, for instance, machine learning have been theoretically proven. One particularly interesting machine learning paradigm is Reinforcement Learning (RL), where agents interact directly with an environment and learn from the feedback they receive. In recent years, RL has been used with significant success to assist in several problems in quantum information processing, such as decoding of errors, quantum feedback and adaptive code design. Conversely, implementing ‘quantum’ RL using quantum computers has been shown to make the decision-making process for RL agents quadratically faster than on classical hardware.

In most RL protocols so far, the interaction between the agent and the environment has been designed to occur entirely via classical communication. However, theory suggests that an additional quantum speed-up can be gained if this interaction takes place over a quantum channel. In this work, the authors propose a hybrid RL protocol that enables both quantum and classical information transfer between the agent and the environment. The main objective is to evaluate the impact of this hybrid model on the agent’s learning time relative to RL schemes based solely on classical communication. The work uses a fully programmable nanophotonic processor interfaced with photons for the experimental implementation of the protocol. The setup implements an active feedback mechanism combining quantum amplitude amplification with a classical control mechanism that updates the agent’s learning policy.

The setup consists of a single-photon source pumped by laser light, which generates pairs of single photons. One photon of each pair is sent to a quantum processor to perform a particular computation, while the other is sent to a single-photon detector for heralding. Highly efficient detectors with short dead times provide fast feedback. Detection events at the processor output and at the heralding detector are recorded by a time tagging module (TTM) and registered as coincidence events. The agent and the environment are assigned different areas of the processor, which carry out the steps of the Grover-like amplitude amplification. The agent is further equipped with a classical control mechanism that updates its learning policy.
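The heralding scheme described above relies on pairing each detection at the processor output with a detection at the heralding detector. A minimal sketch of such coincidence counting is given below. The function name, the picosecond units and the window value are illustrative assumptions, not details taken from the experiment or from any particular TTM software.

```python
def count_coincidences(herald_ps, signal_ps, window_ps):
    """Count herald/signal pairs whose timestamps differ by at most window_ps.

    Both inputs are assumed to be sorted lists of timestamps in picoseconds
    (hypothetical units). Each event is used in at most one pair.
    """
    i = j = count = 0
    while i < len(herald_ps) and j < len(signal_ps):
        dt = signal_ps[j] - herald_ps[i]
        if abs(dt) <= window_ps:
            # Events fall inside the window: register one coincidence.
            count += 1
            i += 1
            j += 1
        elif dt < 0:
            # Signal event is too early to match this herald: skip it.
            j += 1
        else:
            # Herald event has no signal partner: skip it.
            i += 1
    return count


# Illustrative timestamps only (not measured data).
herald = [100, 2000, 5000, 9000]
signal = [130, 3500, 5020]
n_coinc = count_coincidences(herald, signal, window_ps=50)
```

With these made-up timestamps, two of the signal events fall within 50 ps of a herald event, so only those two would be kept as heralded detections.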

In any Grover-like algorithm, the success probability drops again if amplitude amplification is continued past its optimal point. Each agent reaches this optimal point at a different epoch, so one can identify the success probability up to which it is beneficial for all agents to use a quantum strategy rather than a classical one. The learning time is defined as the average number of interactions until the agent accomplishes a specific task. The setup allows the agents to choose the most favorable strategy by switching from quantum to classical as soon as the latter becomes more advantageous. This combined strategy is shown to outperform the purely classical scenario.
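The overshoot behind this switching rule can be illustrated with the textbook amplitude amplification formula: starting from a success probability eps = sin^2(theta), k Grover-like iterations give sin^2((2k+1) theta). The sketch below uses that standard formula. The function names and the decision rule, which simply compares the amplified probability with eps and ignores any extra interactions a quantum epoch may consume, are simplifying assumptions and not the authors' accounting.

```python
import math


def amplified_success(eps: float, k: int = 1) -> float:
    """Success probability after k Grover-like iterations, starting from eps."""
    theta = math.asin(math.sqrt(eps))
    return math.sin((2 * k + 1) * theta) ** 2


def choose_strategy(eps: float, k: int = 1) -> str:
    """Hypothetical rule: use the quantum step only while it raises the success probability."""
    return "quantum" if amplified_success(eps, k) > eps else "classical"


for eps in (0.05, 0.25, 0.5, 0.75):
    print(eps, round(amplified_success(eps), 3), choose_strategy(eps))
```

For eps = 0.25 a single iteration brings the success probability to 1, while for eps = 0.75 it drives it to 0, which is why a fixed number of quantum iterations stops paying off once the policy is already good. Under this naive comparison the crossover sits at eps = 0.5; a fuller accounting of the interactions used per epoch would change that threshold.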

Such a hybrid model represents a potentially interesting advantage over previously implemented protocols, which are purely quantum or purely classical. The authors put forward photonic architectures in particular as among the most suitable candidates for implementing these types of learning algorithms, since they offer compactness, full tunability and low-loss communication, which make it easy to implement active feedback mechanisms for RL algorithms even over long distances. However, the theoretical formulation of such protocols is general and is shown to be applicable to any quantum computational platform. The results also demonstrate the feasibility of integrating quantum mechanical RL speed-ups in future complex quantum networks.

Finally, as integrated optics advances towards the fabrication of increasingly large devices, such a demonstration could be extended to more complex quantum circuits that allow for the processing of high-dimensional states. This raises hopes of achieving superior performance in increasingly complex learning devices. AI and RL are expected to play an important role in future large-scale quantum communication networks, including a potential quantum internet.