Benchmarking Quantum Computers: what the Q?

"Scalable Benchmarks for Gate-Based Quantum Computers"

Published by Arjan Cornelissen, Johannes Bausch, András Gilyén (University of Amsterdam, University of Cambridge, California Institute of Technology), 3rd May 2021


The physical realization of quantum computers has advanced to a stage where present-day processors are noisy intermediate-scale quantum (NISQ) devices with tens of qubits. Since these devices have different strengths and weaknesses depending on their quality and architecture, it is highly advantageous to compare their performance against well-defined benchmarks. To date, various structured tasks have been proposed to measure the performance of quantum computers. Typical examples include counting the physical qubits (the building blocks of digital quantum circuits) implemented in a quantum system, measuring the resources (qubits, gates, time, etc.) needed to prepare absolutely maximally entangled states, volumetric benchmarks, and mirror randomized benchmarking.

One of the first popularized performance metrics (introduced by IBM) is "quantum volume", a single-number metric that quantifies how well a quantum system can execute a sizeable random circuit, whose depth equals the number of qubits it acts on, with reasonable fidelity. It enables the comparison of hardware with widely different performance characteristics and quantifies the complexity of the algorithms that can be run on such a system. A more recent metric, introduced by Atos, is the Q-score, which counts the number of variables in a Max-Cut problem that a device can optimize.
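To make the quantum volume idea more concrete, here is a minimal Python sketch of its heavy-output test under simplifying assumptions: the ideal output distribution of each random circuit is assumed to be known already (for example from a classical simulation), and the confidence-interval check of the full protocol is left out. The function names are our own and are not part of any SDK.

```python
from statistics import median


def heavy_outputs(ideal_probs: dict[str, float]) -> set[str]:
    """Bitstrings whose ideal probability lies strictly above the median."""
    m = median(ideal_probs.values())
    return {bits for bits, p in ideal_probs.items() if p > m}


def heavy_output_probability(counts: dict[str, int], heavy: set[str]) -> float:
    """Fraction of measured shots that landed on a heavy output."""
    shots = sum(counts.values())
    return sum(c for bits, c in counts.items() if bits in heavy) / shots


def quantum_volume(hop_by_width: dict[int, float], threshold: float = 2 / 3) -> int:
    """Simplified: 2**n for the largest width n whose heavy-output probability
    exceeds the threshold (no confidence interval). Returns 1 if none pass."""
    passing = [n for n, hop in hop_by_width.items() if hop > threshold]
    return 2 ** max(passing) if passing else 1


# Hypothetical two-qubit example.
ideal = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}
heavy = heavy_outputs(ideal)                       # {"00", "01"}
counts = {"00": 450, "01": 300, "10": 150, "11": 100}
hop = heavy_output_probability(counts, heavy)      # 0.75
qv = quantum_volume({2: hop, 3: 0.70, 4: 0.60})    # widths 2 and 3 pass -> 8
```

With these made-up numbers, widths 2 and 3 clear the 2/3 threshold while width 4 does not, giving a quantum volume of 2^3 = 8.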

Along the same lines, the authors of this work propose a quantum benchmark suite that serves as a tool for comparing currently available and upcoming quantum computing platforms from the perspective of an end user. The objective is to analyze the performance of the available devices by providing meaningful benchmark scores for a series of different tests. The benchmarks use numerical metrics, including uncertainties, that characterize different aspects of noise and allow a direct comparison of performance gains between devices. The authors present six visual benchmarks built from structured circuits: Bell Test, complex transformations of the Riemann sphere, Line Drawing, Quantum Matrix Inversion, Platonic and Fractals. These benchmarks probe different aspects of the quantum hardware, such as gate fidelity, readout noise, and the ability of compilers to take full advantage of the underlying device topology, but in a more holistic way than the metrics introduced so far. In this way, the authors hope to offer more information than a single one-dimensional meta-parameter, while still allowing a quick assessment at a glance from a visual representation.
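The paper's own Bell-test scoring is not reproduced here. As a generic, textbook illustration of how a noise level turns into a numerical score, the sketch below computes the CHSH value for a two-qubit singlet state mixed with white noise of visibility v (an assumed noise model, not necessarily the one the authors use), at the standard optimal measurement angles. A noiseless device reaches 2√2 ≈ 2.83, while any value at or below 2 is compatible with classical correlations.

```python
import math


def singlet_correlator(theta_a: float, theta_b: float, visibility: float = 1.0) -> float:
    """Correlator E(a, b) for measurements along angles theta_a and theta_b
    (in one plane of the Bloch sphere) on the Werner state
    v|psi-><psi-| + (1 - v) I/4, for which E = -v cos(theta_a - theta_b)."""
    return -visibility * math.cos(theta_a - theta_b)


def chsh_value(visibility: float = 1.0) -> float:
    """|S| = |E(a,b) - E(a,b') + E(a',b) + E(a',b')| at the optimal angles."""
    a, a_prime = 0.0, math.pi / 2
    b, b_prime = math.pi / 4, 3 * math.pi / 4

    def e(x: float, y: float) -> float:
        return singlet_correlator(x, y, visibility)

    return abs(e(a, b) - e(a, b_prime) + e(a_prime, b) + e(a_prime, b_prime))


for v in (1.0, 0.8, 0.6):
    print(f"visibility {v:.1f}: S = {chsh_value(v):.3f}")
```

In this model S = 2√2·v, so the Bell violation disappears once the visibility drops below 1/√2 ≈ 0.707.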

The benchmarks were tested on currently available quantum devices from IBM, Rigetti and IonQ, using the SDKs and APIs of each platform (Qiskit / IBMQ for IBM, Forest / QCS and Amazon Braket for Rigetti, Amazon Braket for IonQ, and cirq for Google).

Every device receives a numerical score for each of the implemented tests, which can be used to cross-evaluate performance. The performance of the various NISQ devices is then analyzed through a series of test scenarios using the proposed metrics. The overall analysis suggests that the proposed benchmarks can readily be implemented on 2-qubit devices with circuit depths below 10, as well as on currently available small-scale quantum devices. Because the benchmarks are also meant to be run on the larger and more complex devices that will become available in the future, the authors also investigate how the metrics scale.

The scores obtained in the experiments are then compared to the ideal score estimated from a finite number of measurements. However, one should keep in mind that these measurements also carry statistical errors due to measurement (shot) noise, which can never be eliminated completely. The error margins presented in this work are given as the “expected deviation” from the ideal score, and the experimentally observed error margins agree with the error estimates obtained in simulated experiments. The authors also find that their scores correlate well with IBM’s Quantum Volume, although individual cases still vary.
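As a rough illustration of why shot noise never disappears, the sketch below assumes a score that is estimated as a success fraction p̂ from N independent shots, and compares a Monte Carlo estimate of the expected deviation E|p̂ − p| with its Gaussian approximation σ·√(2/π), where σ = √(p(1−p)/N). The function names are hypothetical, and the paper's precise definition of “expected deviation” may differ from this simple version.

```python
import math
import random


def estimate_fraction(p: float, shots: int, rng: random.Random) -> float:
    """Simulate `shots` single-shot measurements that succeed with
    probability p and return the estimated success fraction."""
    hits = sum(1 for _ in range(shots) if rng.random() < p)
    return hits / shots


def expected_deviation(p: float, shots: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E|p_hat - p| caused purely by shot noise."""
    rng = random.Random(seed)
    total = sum(abs(estimate_fraction(p, shots, rng) - p) for _ in range(trials))
    return total / trials


def gaussian_expected_deviation(p: float, shots: int) -> float:
    """Normal approximation: E|X| = sigma * sqrt(2/pi), sigma = sqrt(p(1-p)/shots)."""
    sigma = math.sqrt(p * (1 - p) / shots)
    return sigma * math.sqrt(2 / math.pi)


p, shots = 0.5, 1000
print(expected_deviation(p, shots), gaussian_expected_deviation(p, shots))
```

Because the deviation scales as 1/√N, quadrupling the number of shots only halves it, so any finite measurement budget leaves some error margin on the score.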

Another crucial factor to analyze is the fluctuation observed while collecting experimental data over a period of time. Such fluctuations indicate that device performance changes during data collection, which in turn makes the estimated scores vary over time. However, quantifying this variance precisely requires more experimentation. Future experiments could explore these temporal inhomogeneities in addition to including statistical uncertainty in the error margins. Such benchmarks could then provide a holistic evaluation that includes the time factor when comparing different quantum devices.
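One simple way to separate temporal drift from pure shot noise, sketched here under our own assumptions rather than as the authors' method, is to repeat the same benchmark at different times and compare the observed run-to-run variance of a success-fraction score with the variance p(1−p)/N expected from N shots alone. The helper name `excess_variance_ratio` and the sample scores are hypothetical.

```python
from collections.abc import Sequence
from statistics import mean, variance


def excess_variance_ratio(scores: Sequence[float], shots: int) -> float:
    """Ratio of the observed run-to-run variance of a success-fraction score
    to the variance expected from shot noise alone. Values well above 1
    suggest that device performance drifted between runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs")
    p = mean(scores)
    if not 0 < p < 1:
        raise ValueError("mean score must lie strictly between 0 and 1")
    shot_noise_var = p * (1 - p) / shots
    return variance(scores) / shot_noise_var


stable = [0.80, 0.82, 0.78, 0.81, 0.79]   # hypothetical scores from 5 runs
drifting = [0.70, 0.90, 0.75, 0.85]       # hypothetical scores from 4 runs
print(excess_variance_ratio(stable, 1000))    # of order 1: shot noise can explain it
print(excess_variance_ratio(drifting, 1000))  # far above 1: points to drift
```

A formal analysis would turn this ratio into a chi-squared test; the point here is only that a ratio well above 1 means shot noise alone cannot account for the spread between runs.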

One potentially major aspect of a quantum performance metric is how widely it is used. A great metric that nobody uses has little value, and a metric that everyone uses but that carries little significance is equally unhelpful. We hope the community can converge on something comparable, fair, and standardized, but that may take some years in this rapidly evolving field.