The 1st International Workshop on Software Frameworks for Integrating Quantum and HPC Ecosystems

When

Jun 8, 2025

Where

Warnock Engineering Building, University of Utah

Workshop Program

Morning Session

Afternoon Session

Tutorial/Hands-on Session

======================================================================================================================================================

Title & Abstract

Preparing for Scalable Quantum Computing at NERSC - Katherine Klymko (NERSC/LBL)

    Co-authors: Anastasiia Butko, Doru Thom Popovici, Daan Camps, and Nicolas Sawaya

    In this talk, I will present NERSC’s efforts to develop scalable tools for quantum benchmarking and simulation. I will focus on HamLib, a dataset we created to support the characterization and evaluation of quantum hardware, software, and algorithms. HamLib provides qubit-mapped problem instances for a wide range of applications, including the Heisenberg model, Fermi-Hubbard model, Bose-Hubbard model, molecular electronic and vibrational structure, MaxCut, Max-k-SAT, Max-k-Cut, QMaxCut, and the traveling salesperson problem. This effort aims to: save researchers time by eliminating the need to generate and encode problem instances; enable more comprehensive testing of quantum algorithms and devices; and promote reproducibility and standardization across studies. I will also describe our work on developing high-performance state vector simulators, focusing on strategies for distributing the quantum state to minimize communication and memory overhead. Our implementation transforms quantum circuits into highly optimized code that runs efficiently on supercomputers such as Perlmutter, Frontier, and Fugaku, offering critical insights into the performance and scalability of quantum algorithms in simulated environments.
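
    As an illustration of how such a dataset might be consumed, here is a minimal sketch that opens a HamLib-style HDF5 file and reads one qubit-mapped Hamiltonian. The file name and key layout are hypothetical placeholders, since dataset organization varies by problem class.

        import h5py  # HamLib distributes problem instances as HDF5 files

        # Hypothetical local copy of one HamLib file; real files are
        # downloaded from the HamLib collection.
        FILENAME = "maxcut.hdf5"

        with h5py.File(FILENAME, "r") as f:
            # Each dataset key encodes instance parameters (size, graph
            # family, qubit encoding, ...).
            keys = list(f.keys())
            print(len(keys), "instances, e.g.", keys[:3])

            # Datasets hold the qubit-mapped Hamiltonian serialized as a
            # Pauli-string representation.
            raw = f[keys[0]][()]
            text = raw.decode() if isinstance(raw, bytes) else str(raw)
            print(text[:200])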

Q-IRIS: The Evolution of the IRIS Task-Based Runtime to Enable Classical-Quantum Workflows - Elaine Wong (ORNL)

    Co-authors: Narasinga Rao Miniskar, M.A.H. Monil, Vicente Leyton Ortega, Seth Johnson, Jeff Vetter, and Travis Humble

    Heterogeneous systems are ubiquitous and are growing to include even more diverse paradigms, such as quantum computing. This talk will describe an asynchronous task-based runtime solution that can encapsulate both classical and quantum computing environments in the heterogeneous execution paradigm by exploring a few integration possibilities between the task-based runtime IRIS and the Quantum Intermediate Representation Execution Engine (QIR-EE). The need for asynchronous task-based execution is motivated by examples that require the coexistence of classical and quantum computing hardware. As a proof of concept to motivate future study, we describe the principles of integrating quantum runtimes with a task-based runtime and demonstrate its capability to parallelize quantum circuit execution by decomposing a four-qubit circuit into a collection of smaller circuits, which lowers the quantum simulation load during execution. We hope this will further highlight the challenges that must be overcome to make such a solution effectively scalable while simultaneously capturing classical-quantum and quantum-quantum interactions.
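
    The parallelization described above follows a familiar task-pool pattern. The sketch below illustrates only that pattern; it does not use the IRIS or QIR-EE APIs, and run_subcircuit is a hypothetical stand-in for dispatching one decomposed fragment to a simulator or QPU backend.

        from concurrent.futures import ProcessPoolExecutor

        import numpy as np

        def run_subcircuit(num_qubits: int, seed: int) -> np.ndarray:
            """Stand-in for executing one decomposed sub-circuit on a
            simulator or QPU; here it just returns a random statevector
            of the right size."""
            rng = np.random.default_rng(seed)
            psi = rng.normal(size=2**num_qubits) + 1j * rng.normal(size=2**num_qubits)
            return psi / np.linalg.norm(psi)

        if __name__ == "__main__":
            # A four-qubit circuit decomposed into four two-qubit fragments,
            # submitted as asynchronous tasks and gathered on completion.
            fragments = [(2, seed) for seed in range(4)]
            with ProcessPoolExecutor() as pool:
                futures = [pool.submit(run_subcircuit, n, s) for n, s in fragments]
                results = [f.result() for f in futures]
            print([r.shape for r in results])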

QUANTUMX: A Few-Shot-Learning Approach for Quantum Performance Modeling in HPC - Arunavo Dey (Texas State Univ.)

    Co-authors: Jae-Seung Yeom and Tanzima Islam

      Hybrid quantum systems integrate quantum subsystems, such as QPUs, with classical high-performance computing (HPC) components, including CPUs and GPUs. Quantum processors execute quantum-specific subroutines, while classical devices manage control flow, preprocessing, and aggregation. Such task partitioning is critical because current quantum hardware remains constrained by limited qubit counts, fragile quantum states, and high gate and readout error rates. Classical processors such as CPUs and GPUs therefore handle the computational tasks that scale with problem size and memory requirements. Mixed hybrid systems, which couple quantum-classical systems with multi-qubit-type hybrid systems, are increasingly being explored.

      However, coupling these systems introduces a new kind of complexity: it is not enough to simply assign tasks based on hardware capabilities, because performance also depends on how delays and errors propagate across the stack. On the quantum side, phase noise such as ZZ crosstalk can degrade fidelity even when qubits are idle, due to unintended interactions with neighboring qubits. On the classical side, memory-bound GPU stalls or CPU scheduling delays can block or desynchronize quantum operations. These effects are not isolated; they interact. As a result, total system throughput depends not only on task placement but also on how well those interactions can be modeled. While middleware such as Pilot-Quantum [1] orchestrates execution across CPUs, GPUs, and QPUs, its effectiveness depends on accurate runtime prediction.

      To aid such job orchestrators in reducing queue times and improving runtime predictions, many recent works have investigated machine learning (ML) methods. For instance, BOSER [2] integrates ensemble models (e.g., XGBoost, LightGBM) into SLURM to forecast job durations, while GNN-RL [3] uses graph neural networks trained on historical job graphs to adapt to changing workloads. For quantum-specific scheduling, particularly under ZZ crosstalk, existing approaches combine hardware calibration, circuit timing heuristics, and ML. Wu et al. [5] introduce a physics-informed method for crosstalk-aware mapping on frequency-tunable qubits. Wang et al. [6] employ a graph transformer to predict circuit fidelity based on gate-level noise annotations.

      However, these methods face two limitations. First, they require extensive training data and frequent retraining as workloads drift, often using hundreds or thousands of historical entries [4]. Second, these models do not generalize well across architectures: a model trained on one platform typically underperforms on another. For instance, QuEst [6] lacks explicit crosstalk features and is trained separately for each device backend (e.g., IBM Geneva, IBM Hanoi), limiting its portability. One option to fill this gap is to retrain performance models for every platform and workload from scratch, but doing so is expensive, especially as new hardware emerges every six months. A more scalable alternative is few-shot transfer learning. We propose quantumX, a framework that adapts across CPUs, GPUs, and QPUs using minimal support data to jointly predict runtime and crosstalk risks.

      Specifically, we build quantumX on recent work by Dey et al. called ModelX [7]. quantumX aligns heterogeneous performance features and learns a base model on source data, then fits a lightweight residual model using only a few target-domain samples. For example, a runtime model trained on QPU data from Vendor A can inform predictions for a QPU from Vendor B by aligning features (e.g., qubit count, circuit depth) and then scaling appropriately. quantumX's few-shot adaptation means only ~10 measured runs on the new device are needed to calibrate its residual component. In practice, this yields more accurate runtime and resource forecasts than naïve heuristics, enabling better scheduling decisions. Unlike prior ML schedulers that require thousands of examples and retraining when hardware changes, quantumX handles heterogeneous feature sets without manual feature matching, transferring common performance patterns across devices.
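
      The base-plus-residual pattern can be sketched with off-the-shelf tools. In the toy example below, a gradient-boosted model learns a runtime trend on synthetic "Vendor A" data over aligned features (qubit count, circuit depth), and a ridge model fitted on ten "Vendor B" samples corrects the device-specific offset. All data, feature choices, and model classes here are illustrative assumptions, not the ModelX implementation.

          import numpy as np
          from sklearn.ensemble import GradientBoostingRegressor
          from sklearn.linear_model import Ridge

          rng = np.random.default_rng(0)

          # Aligned features shared by both devices: [qubit count, circuit depth].
          X_src = rng.uniform([2, 10], [20, 200], size=(500, 2))
          y_src = 0.05 * X_src[:, 0] * X_src[:, 1] + rng.normal(0, 1, 500)  # "Vendor A"

          # Base model captures the general runtime trend on the source device.
          base = GradientBoostingRegressor().fit(X_src, y_src)

          # ~10 measured runs on the new device, with a device-specific offset.
          X_tgt = rng.uniform([2, 10], [20, 200], size=(10, 2))
          y_tgt = 0.08 * X_tgt[:, 0] * X_tgt[:, 1] + 5.0 + rng.normal(0, 1, 10)

          # Lightweight residual model corrects the base prediction on the target.
          residual = Ridge().fit(X_tgt, y_tgt - base.predict(X_tgt))

          def predict_target(X):
              return base.predict(X) + residual.predict(X)

          print(predict_target(np.array([[10, 100]])))  # forecast for a new job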

      quantumX can also model crosstalk effects. By incorporating features such as “simultaneously active qubit pairs,” “spectator qubit proximities,” or “concurrent gate counts,” quantumX aligns these high-level inputs across devices. When adapting from Device A to Device B, the base model (trained on many calibration runs of A) captures the general dependence of performance on parallel gate activity, while the few-shot residual learns device-specific ZZ offsets. For instance, executing a handful of parallel-circuit benchmarks on Device B lets the residual model quantify the extra error from its unique ZZ coupling. This calibrated model then predicts the error or fidelity of larger circuits on B without exhaustive characterization.

      Such predictive models can benefit schedulers directly: for each queued quantum or hybrid job, quantumX predicts runtime and crosstalk risk from job features and the current device state. The scheduler (BOSER/SLURM) uses these aligned forecasts to make smarter placement and ordering decisions. In simulations on real job traces, quantumX-driven scheduling reduced average turnaround time by over 70% compared to heuristic baselines. This cross-domain prediction is integrated into the scheduling loop: the scheduler queries quantumX's alignment and residual modules for each candidate schedule, scores schedules by predicted execution time and crosstalk cost, and selects the minimal-cost option.
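
      A minimal version of that scheduling loop is sketched below, with dictionary-backed stand-ins for quantumX's runtime and crosstalk forecasts (all names and the weighting are hypothetical):

          LAMBDA = 0.5  # relative weight of crosstalk risk vs. runtime

          def score(schedule, predict_runtime, predict_crosstalk):
              # Cost of one candidate schedule: predicted time plus weighted risk.
              return sum(predict_runtime(job) + LAMBDA * predict_crosstalk(job)
                         for job in schedule)

          def best_schedule(candidates, predict_runtime, predict_crosstalk):
              return min(candidates,
                         key=lambda s: score(s, predict_runtime, predict_crosstalk))

          runtimes = {"jobA": 3.0, "jobB": 5.0}   # stand-in model outputs
          risks = {"jobA": 0.2, "jobB": 0.6}
          candidates = [["jobA"], ["jobB"], ["jobA", "jobB"]]
          print(best_schedule(candidates, runtimes.get, risks.get))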

      In summary, quantumX applies a transfer learning approach to predict runtime and crosstalk behavior across heterogeneous quantum-classical systems. Unlike existing models that require retraining for each hardware platform or rely on architecture-specific heuristics, quantumX generalizes across domains using few-shot adaptation. quantumX can be integrated into middleware frameworks such as Pilot-Quantum to enhance their ability to make noise- and resource-aware decisions more accurately. By linking performance modeling with device-level noise characteristics, quantumX provides a foundation for effective orchestration in the noisy intermediate-scale quantum regime.

      1. Pradeep Mantha, Florian J. Kiwit, Nishant Saurabh, Shantenu Jha, and Andre Luckow. 2024. Pilot-Quantum: A Quantum-HPC Middleware for Resource, Workload and Task Management. arXiv preprint arXiv:2412.18519.

      2. Mahdi Rezaei and Alexey Salnikov. 2025. A Machine Learning-Based Plugin for SLURM: Improving Job Scheduling through Ensemble Learning. In PCT 2025.

      3. Kyrian C. Adimora and Hongyang Sun. 2024. GNN-RL: An Intelligent HPC Resource Scheduler. Poster in SC24: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 17–22, 2024, Atlanta, GA, USA.

      4. Kevin Menear, Ambarish Nag, Jordan Perr-Sauer, Monte Lunacek, Kristi Potter, and Dmitry Duplyakin. 2023. Mastering HPC Runtime Prediction: From Observing Patterns to a Methodological Approach. In Practice and Experience in Advanced Research Computing (PEARC '23), July 23–27, 2023, Portland, OR, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3569951.3593598.

      5. Mingyu Wu, Lin Gan, and Haohuan Fu. 2023. CAMEL: Crosstalk-Aware Mapping and gatE scheduLing for Frequency-Tunable Quantum Chips. arXiv preprint arXiv:2311.18160.

      6. Hanrui Wang et al. 2022. QuEst: Graph Transformer for Quantum Circuit Reliability Estimation. arXiv preprint arXiv:2210.16724.

      7. A. Dey, N. Antony, A. Dhakal, K. Thopalli, J. Thiagarajan, T. Patki, T. Scogland, J. S. Yeom, and T. Z. Islam. 2025. ModelX: A Novel Transfer Learning Approach Across Heterogeneous Datasets. In The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC '25). Accepted.

    A Scalable Software Framework for Distributed Quantum-HPC Integration: Resource-Oriented Orchestration and Workflow Optimization - Kuan-Cheng (Louis) Chen (Imperial College London)

      Co-authors: Felix Burt and Kin K. Leung

      The convergence of quantum computing and high-performance computing (HPC) presents a unique opportunity to accelerate scientific discovery through hybrid quantum-classical workflows. However, the integration of quantum processing units (QPUs) into distributed HPC environments remains hindered by architectural heterogeneity, limited qubit connectivity, and the absence of unified software abstractions for resource coordination. In this work, we propose a scalable software framework for orchestrating distributed quantum-HPC workflows, with a focus on dynamic resource management, entanglement-aware scheduling, and workload decomposition across quantum and classical compute nodes. The framework introduces a unified resource abstraction layer, a containerized microservice architecture, and a policy-driven scheduler capable of adapting to latency, coherence constraints, and QPU topology. We demonstrate the effectiveness of the system on representative hybrid workloads, including variational quantum eigensolvers and quantum approximate optimization algorithms, deployed over simulated and real-world HPC environments. Our results show significant improvements in execution efficiency, resource utilization, and scalability compared to baseline orchestration models. This framework provides a foundational step toward operationally viable quantum-HPC integration and lays the groundwork for future quantum-enabled HPC systems.
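
      As a rough illustration of a unified resource abstraction with policy-driven placement, the sketch below defines generic resource and task records and a placement policy keyed on qubit capacity and control latency. All names and the policy itself are hypothetical simplifications of what such a framework provides.

          from dataclasses import dataclass

          @dataclass
          class Resource:
              name: str
              kind: str             # "QPU", "GPU", or "CPU"
              qubits: int = 0       # QPU capacity; zero for classical nodes
              latency_ms: float = 0.0

          @dataclass
          class Task:
              name: str
              needs_qubits: int = 0

          def place(task: Task, resources, max_latency_ms: float = 10.0) -> Resource:
              """Policy: prefer QPUs with enough qubits and acceptable control
              latency; otherwise fall back to a classical node."""
              qpus = [r for r in resources
                      if r.kind == "QPU" and r.qubits >= task.needs_qubits
                      and r.latency_ms <= max_latency_ms]
              if qpus:
                  return min(qpus, key=lambda r: r.latency_ms)
              return next(r for r in resources if r.kind != "QPU")

          nodes = [Resource("qpu0", "QPU", qubits=27, latency_ms=4.0),
                   Resource("gpu0", "GPU")]
          print(place(Task("vqe-step", needs_qubits=12), nodes).name)  # -> qpu0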

    Hybrid Quantum-Classical Architecture for Large Language Model Fine-Tuning: Toward Hybrid CPU + GPU + QPU - Claudio Girotto (IonQ)

      Co-authors: Erica Stump, Sang Hyub Kim, Jonathan Mei, Masako Yamada, and Martin Roetteler

      We introduce a hybrid quantum-classical architecture for language model fine-tuning and present a screening study exploring the effects of varying hyperparameters. The research investigates how prediction task accuracy scales with both quantum circuit depth and the number of qubits (circuit width). We observe a notable trend of increased accuracy with more qubits, while circuit depth shows a width-dependent optimum that appears to balance expressive power and trainability. To overcome the limitations of simulating complex-valued statevectors on GPUs, which restrict both the number of qubits and the circuit depth, we devised a design-of-experiments approach. This approach aims to reach hyperparameter settings beyond the capabilities of our private cluster by using supercomputers (e.g., ORNL Frontier). The final step is to validate these findings by performing both inference and training on actual quantum hardware at circuit depths and qubit counts beyond what is classically simulable on supercomputers. Because our algorithm is hybrid in nature, tight integration of the QPU with the CPU and GPU for classical processing during end-to-end training would yield speed and energy benefits, especially beyond these simulable limits.
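
      The classical-simulation ceiling motivating this approach is easy to quantify: a dense complex128 statevector doubles in size with every added qubit. A back-of-envelope sketch:

          import numpy as np

          # Memory for a dense double-precision complex statevector; this is
          # what caps the circuit width (and, through runtime, the depth)
          # that a fixed GPU cluster can study.
          for n in (30, 34, 40, 46, 50):
              gib = (2**n) * np.dtype(np.complex128).itemsize / 2**30
              print(f"{n:2d} qubits -> {gib:,.0f} GiB")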

    Assessing the Variational Quantum Linear Solver for Fluid Dynamics on a Hybrid Quantum-HPC Stack - Chao Lu (ORNL)

      Co-authors: Muralikrishnan G. Meena, Eduardo Antonio Coello Pérez, Amir Shehata, Seongmin Kim, and In-Saeng Suh

      Fluid dynamics remains one of the most demanding challenges in scientific computing, with the solution of large linear systems forming a core component of most numerical methods. Recent advances in quantum algorithms for linear systems offer a promising direction for accelerating such computations. However, the deep and complex circuits required by many quantum algorithms limit their practical use on current quantum hardware. The Variational Quantum Linear Solver (VQLS) presents a viable alternative for near-term, noisy intermediate-scale quantum (NISQ) devices, and initial efforts have explored its application to select fluid dynamics problems.

      In this work, we evaluate the use of VQLS for canonical fluid dynamics problems, aiming to identify pathways for generalizing its implementation across a broader class of systems. We analyze the impact of various circuit ansätze and classical optimizers on solution quality and convergence behavior. Furthermore, we assess the algorithm's feasibility within a hybrid quantum-high-performance computing (HPC) framework by porting it to QFw, a state-of-the-art quantum-HPC software stack. Our performance study explores practical scalability, identifies current bottlenecks, and provides insights into integrating VQLS into hybrid workflows. These results establish a foundation for extending VQLS to more complex fluid dynamics applications and advancing hybrid quantum-HPC strategies.
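
      For orientation, the sketch below shows the VQLS variational loop on a toy single-qubit system, with numpy standing in for the quantum estimation of the overlap terms (which VQLS obtains from circuits such as Hadamard tests). The ansatz and optimizer are illustrative choices, not those used in this study.

          import numpy as np
          from scipy.optimize import minimize

          A = np.array([[1.0, 0.2], [0.2, 0.5]])     # toy system A x = b
          b = np.array([1.0, 1.0]) / np.sqrt(2)

          def ansatz(theta):
              # Minimal hardware-efficient ansatz: |x(theta)> = RY(theta)|0>.
              return np.array([np.cos(theta / 2), np.sin(theta / 2)])

          def cost(params):
              # 1 - normalized overlap |<b|A|x>|^2: zero when A|x> is parallel to |b>.
              Ax = A @ ansatz(params[0])
              return 1 - abs(b @ Ax) ** 2 / np.linalg.norm(Ax) ** 2

          res = minimize(cost, x0=[0.1], method="COBYLA")  # classical optimizer
          Ax = A @ ansatz(res.x[0])
          print("residual:", np.linalg.norm(Ax / np.linalg.norm(Ax) - b))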

    An Early Investigation of the HHL Quantum Linear Solver for Scientific Applications - Muqing Zheng (PNNL)

      Co-authors: Chenxu Liu, Samuel Stein, Xiangyu Li, Johannes Mülmenstädt, Yousu Chen, and Ang Li

      We explore using the Harrow-Hassidim-Lloyd (HHL) algorithm to address scientific and engineering problems through quantum computing, utilizing the NWQSim simulation package on high-performance computing systems. Focusing on domains such as power-grid management and heat-transfer problems, we demonstrate how the precision of quantum phase estimation, along with various properties of the coefficient matrices, affects the final solution and the quantum resource cost in iterative and non-iterative numerical methods such as the Newton-Raphson method and the finite difference method, as well as their impact on quantum error correction costs estimated with the Microsoft Azure Quantum resource estimator. We characterize the exponential resource cost of quantum phase estimation before and after quantum error correction and illustrate a potential way to reduce the demands on physical qubits. This work lays a preliminary step for future investigations, urging a closer examination of quantum algorithms' scalability and efficiency in domain applications.
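
      The exponential cost referred to above can be illustrated with the textbook QPE sizing estimate (an assumption for illustration, not a result of this work): resolving the eigenvalues of a matrix with condition number kappa to relative precision eps takes roughly ceil(log2(kappa/eps)) clock qubits, with a controlled-unitary count growing as two to that power.

          import numpy as np

          A = np.array([[2.0, 0.5], [0.5, 1.0]])    # toy coefficient matrix
          kappa = np.linalg.cond(A)                  # condition number

          for eps in (1e-1, 1e-2, 1e-3):
              n_clock = int(np.ceil(np.log2(kappa / eps)))
              print(f"eps={eps:.0e}: ~{n_clock} clock qubits, "
                    f"~{2**n_clock} controlled-U applications")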

    Overview of Distributed Variational Optimization Algorithms on Large-scale Quantum-HPC Ecosystems - Seongmin Kim (ORNL)

      In this tutorial, we provide an overview of distributed variational optimization algorithms designed to harness the power of integrated quantum–HPC ecosystems for solving large-scale combinatorial optimization problems. We focus on the Distributed Quantum Approximate Optimization Algorithm (DQAOA), a scalable quantum-classical hybrid algorithm that distributes quantum workloads across multiple QPUs or simulators, coordinated via classical HPC infrastructure.

      A key focus of the tutorial is the application of DQAOA to materials optimization problems, which are naturally formulated as large, densely connected quadratic unconstrained binary optimization (QUBO) problems. These QUBO instances often exceed the capacity of current quantum hardware and simulators. We will present real-world case studies involving high-dimensional materials design problems, showcasing how distributed quantum resources can accelerate the search for optimal material configurations.
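
      The decomposition idea behind DQAOA can be sketched classically: split the QUBO variables into blocks, optimize each block with the rest held fixed, and sweep until the energy settles. In DQAOA, each block sub-QUBO would be dispatched to QAOA on a QPU or simulator; the brute-force block solver, block size, and sweep count below are arbitrary stand-ins.

          import numpy as np
          from itertools import product

          rng = np.random.default_rng(1)
          N, BLOCK = 12, 4                              # 12 variables, blocks of 4
          Q = rng.normal(size=(N, N))
          Q = (Q + Q.T) / 2                             # dense symmetric QUBO matrix

          def energy(x):
              return x @ Q @ x

          x = rng.integers(0, 2, N)                     # random initial assignment
          for _ in range(5):                            # a few decomposition sweeps
              for start in range(0, N, BLOCK):
                  # Enumerate the 2^BLOCK settings of this block; in DQAOA this
                  # sub-QUBO would instead be solved by QAOA on a QPU.
                  best = min(product([0, 1], repeat=BLOCK),
                             key=lambda bits: energy(np.concatenate(
                                 [x[:start], bits, x[start + BLOCK:]])))
                  x[start:start + BLOCK] = best
          print("final energy:", energy(x))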

    Quantum+HPC Utility-scale Algorithms - Mirko Amico (IBM)

      Co-authors: Vincent Pascuzzi

      The workshop covers the latest quantum+HPC algorithms demonstrated by IBM Quantum, with hands-on material to get started using these algorithms. In particular, Sample-based Quantum Diagonalization (SQD), Krylov Quantum Diagonalization (KQD), and Sample-based Krylov Quantum Diagonalization (SKQD) have been demonstrated as effective quantum algorithms for near-term use on quantum+HPC systems. The algorithms rest on different assumptions and target different use cases, which makes them complementary. In the workshop we will explore both their theory and their implementations.
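
      The projection step shared by these sample-based methods can be conveyed in a few lines: draw bitstring samples (random below, in place of circuit measurements), keep the distinct configurations, and diagonalize the Hamiltonian restricted to the subspace they span. The dense random Hamiltonian is a toy stand-in; the point is only the project-and-diagonalize step performed classically on HPC.

          import numpy as np

          rng = np.random.default_rng(2)
          n = 8                                    # toy 8-qubit Hamiltonian
          dim = 2**n
          H = rng.normal(size=(dim, dim))
          H = (H + H.T) / 2                        # make it symmetric/Hermitian

          # "Samples" stand in for measured bitstrings from a quantum circuit.
          samples = rng.integers(0, dim, size=200)
          basis = np.unique(samples)               # distinct configurations

          H_sub = H[np.ix_(basis, basis)]          # restrict H to the subspace
          estimate = np.linalg.eigvalsh(H_sub)[0]  # variational upper bound
          exact = np.linalg.eigvalsh(H)[0]
          print(f"subspace estimate {estimate:.3f} vs exact {exact:.3f}")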

    ORNL Quantum Software Stacks Overview and Demonstration - Amir Shehata (ORNL)

      We will present an overview of the ORNL Quantum Software Stack (O-QSS), which organizes the quantum software ecosystem into distinct abstraction layers. This layered approach enables quantum applications to interface with the stack without requiring detailed knowledge of the underlying hardware. Key components include the application-facing Quantum Programming Interface (QPI), the platform-facing Quantum Platform Manager API (QPM), and a flexible tool pipeline for integrating circuit transformation and optimization tools.
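
      A minimal illustration of this layering (all class and method names below are hypothetical, not the O-QSS API): applications call a stable application-facing entry point, which delegates to whichever platform-facing backend is configured.

          from abc import ABC, abstractmethod

          class PlatformManager(ABC):
              """Platform-facing layer (QPM role): one implementation per
              backend, e.g. a simulator or a vendor QPU."""
              @abstractmethod
              def execute(self, circuit: str, shots: int) -> dict: ...

          class SimulatorBackend(PlatformManager):
              def execute(self, circuit: str, shots: int) -> dict:
                  return {"00": shots}              # trivial stand-in result

          def run(circuit: str, shots: int, backend: PlatformManager) -> dict:
              """Application-facing entry point (QPI role): applications never
              see backend details, only this stable interface."""
              return backend.execute(circuit, shots)

          print(run("h q[0]; cx q[0],q[1];", 100, SimulatorBackend()))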

    Workshop supported by OLCF, ORNL