Qiaomin Xie | Publications

2026

Optimal Regret for Policy Optimization in Average Reward MDPs Without Mixing

William Powell, Jeongyeol Kwon, Qiaomin Xie, and Hanbaek Lyu

In Reinforcement Learning Conference (RLC) 2026
Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

Tianhao Wu, Matthew Zurek, Weina Wang, and Qiaomin Xie

In Conference on Learning Theory (COLT) 2026
Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Yixuan Zhang and Qiaomin Xie

In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 2026

Best Student Paper
Best Paper Award Finalist

arXiv
Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive Stochastic Approximation

Yixuan Zhang, Dongyan (Lucy) Huo, Yudong Chen, and Qiaomin Xie

Operations Research 2026

arXiv
Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

Dongyan Huo, Yudong Chen, and Qiaomin Xie

Mathematics of Operations Research 2026

arXiv

2025

Offline Actor-Critic for Average Reward MDPs

William Powell, Jeongyeol Kwon, Qiaomin Xie, and Hanbaek Lyu

In Advances in Neural Information Processing Systems (NeurIPS) 2025

URL
Contextual Online Pricing with (Biased) Offline Data

Yixuan Zhang, Ruihao Zhu, and Qiaomin Xie

In Advances in Neural Information Processing Systems (NeurIPS) 2025

arXiv
Unichain and aperiodicity are sufficient for asymptotic optimality of average-reward restless bandits

Yige Hong, Qiaomin Xie, Yudong Chen, and Weina Wang

Mathematics of Operations Research 2025

arXiv URL
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

Subhojyoti Mukherjee, Josiah P Hanna, Qiaomin Xie, and Robert Nowak

In Reinforcement Learning Conference (RLC) 2025

arXiv
Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits

Subhojyoti Mukherjee, Qiaomin Xie, and Robert Nowak

In Reinforcement Learning Conference (RLC) 2025

arXiv
Stable Offline Value Function Learning with Bisimulation-based Representations

Brahma S. Pavse, Yudong Chen, Qiaomin Xie, and Josiah P. Hanna

In International Conference on Machine Learning (ICML) 2025

arXiv
A Piecewise Lyapunov Analysis of Sub-quadratic SGD: Applications to Robust and Quantile Regression

Yixuan Zhang, Dongyan (Lucy) Huo, Yudong Chen, and Qiaomin Xie

In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 2025

arXiv
Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

Jeongyeol Kwon, Luke Dotson, Yudong Chen, and Qiaomin Xie

In International Conference on Artificial Intelligence and Statistics (AISTATS) 2025

arXiv
Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent

Xiang Li and Qiaomin Xie

In AAAI Conference on Artificial Intelligence 2025

arXiv

2024

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

Dongyan (Lucy) Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie

In Advances in Neural Information Processing Systems (NeurIPS), Spotlight 2024

arXiv
Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Yixuan Zhang and Qiaomin Xie

In Reinforcement Learning Conference (RLC) 2024

arXiv
Inception: Efficiently Computable Misinformation Attacks on Markov Games

Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, and Qiaomin Xie

In Reinforcement Learning Conference (RLC) 2024

arXiv
Roping in Uncertainty: Robustness and Regularization in Markov Games

Jeremy McMahan, Giovanni Artiglio, and Qiaomin Xie

In International Conference on Machine Learning (ICML) 2024

arXiv
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, and Qiaomin Xie

In International Conference on Machine Learning (ICML) 2024

arXiv
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, and Josiah P. Hanna

In International Conference on Machine Learning (ICML) 2024

arXiv
Near-Optimal Stochastic Bin-Packing in Large Service Systems with Time-Varying Item Sizes

Yige Hong, Qiaomin Xie, and Weina Wang

In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 2024

arXiv
Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive Stochastic Approximation

Yixuan Zhang, Dongyan (Lucy) Huo, Yudong Chen, and Qiaomin Xie

In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems 2024

arXiv
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, and Robert Nowak

In International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

arXiv
Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, and Qiaomin Xie

In International Conference on Artificial Intelligence and Statistics (AISTATS), Oral 2024

arXiv
Data Poisoning to Fake a Nash Equilibrium in Markov Games

Young Wu, Jeremy McMahan, Xiaojin Zhu, and Qiaomin Xie

In AAAI Conference on Artificial Intelligence 2024

arXiv
Effectiveness of Constant Stepsize in Markovian LSA and Statistical Inference

Dongyan (Lucy) Huo, Yudong Chen, and Qiaomin Xie

In AAAI Conference on Artificial Intelligence 2024

arXiv
Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption

Yiding Chen, Xuezhou Zhang, Qiaomin Xie, and Xiaojin Zhu

In AAAI Conference on Artificial Intelligence 2024

URL
Optimal Attack and Defense for Reinforcement Learning

Jeremy McMahan, Young Wu, Xiaojin Zhu, and Qiaomin Xie

In AAAI Conference on Artificial Intelligence 2024

arXiv

2023

Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, and Robert Nowak

In Advances in Neural Information Processing Systems (NeurIPS) 2023

arXiv
Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Yige Hong, Qiaomin Xie, Yudong Chen, and Weina Wang

In Advances in Neural Information Processing Systems (NeurIPS), Spotlight 2023

arXiv
Distributed Threshold-based Offloading for Heterogeneous Mobile Edge Computing

Xudong Qin, Qiaomin Xie, and Bin Li

In International Conference on Distributed Computing Systems (ICDCS) 2023

URL
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

Zihan Zhang and Qiaomin Xie

In Conference on Learning Theory (COLT) 2023

arXiv
Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

Dongyan (Lucy) Huo, Yudong Chen, and Qiaomin Xie

In ACM Sigmetrics 2023

arXiv
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, and Zhuoran Yang

Mathematics of Operations Research 2023

arXiv URL

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\tilde{O}(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, where d is the linear dimension, H the horizon and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To our best knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right.
Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning

Young Wu, Jeremy McMahan, Xiaojin Zhu, and Qiaomin Xie

In AAAI Conference on Artificial Intelligence 2023

arXiv

2022

RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems

Bai Liu, Qiaomin Xie, and Eytan Modiano

ACM Transactions on Modeling and Performance Evaluation of Computing Systems 2022

URL

With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called RL for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing, and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
ORSuite: Benchmarking Suite for Sequential Operations Models

Christopher Archer, Siddhartha Banerjee, Mayleen Cortez, Carrie Rucker, Sean R. Sinclair, Max Solberg, Qiaomin Xie, and Christina Lee Yu

SIGMETRICS Performance Evaluation Review 2022

URL

Reinforcement learning (RL) has received widespread attention across multiple communities, but the experiments have focused primarily on large-scale game playing and robotics tasks. In this paper we introduce ORSuite, an open-source library containing environments, algorithms, and instrumentation for operational problems. Our package is designed to motivate researchers in the reinforcement learning community to develop and evaluate algorithms on operational tasks, and to consider the true multi-objective nature of these problems by considering metrics beyond cumulative reward.
Nonasymptotic Analysis of Monte Carlo Tree Search

Devavrat Shah, Qiaomin Xie, and Zhi Xu

Operations Research 2022

URL

2021

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, and Andreea Minca

In International Conference on Machine Learning (ICML) 2021

URL
Zero queueing for multi-server jobs

Weina Wang, Qiaomin Xie, and Mor Harchol-Balter

In ACM Sigmetrics 2021

arXiv URL

2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, and Qiaomin Xie

In Advances in Neural Information Processing Systems (NeurIPS) 2020

arXiv URL
POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Weichao Mao, Kaiqing Zhang, Qiaomin Xie, and Tamer Basar

In Advances in Neural Information Processing Systems (NeurIPS) 2020

arXiv URL
Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, and Qiaomin Xie

In Advances in Neural Information Processing Systems (NeurIPS) 2020

arXiv URL
Stable Reinforcement Learning with Unbounded State Space

Devavrat Shah, Qiaomin Xie, and Zhi Xu

In Learning for Dynamics and Control (L4DC) 2020

arXiv
On Reinforcement Learning for Turn-based Zero-sum Markov Games

Devavrat Shah, Varun Somani, Qiaomin Xie, and Zhi Xu

In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference 2020

arXiv
Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, and Zhuoran Yang

In Conference on Learning Theory 2020

URL
Non-asymptotic analysis of Monte Carlo tree search

Devavrat Shah, Qiaomin Xie, and Zhi Xu

In ACM Sigmetrics 2020

URL
Greed works—online algorithms for unrelated machine stochastic scheduling

Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie

Mathematics of Operations Research 2020

URL

2019

Reinforcement learning for optimal control of queueing systems

Bai Liu, Qiaomin Xie, and Eytan Modiano

In 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton) 2019

URL

2018

Q-learning with nearest neighbors

Devavrat Shah and Qiaomin Xie

In Advances in Neural Information Processing Systems (NeurIPS) 2018

arXiv URL

2017

Stochastic online scheduling on unrelated machines

Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie

In International Conference on Integer Programming and Combinatorial Optimization 2017

URL
Centralized Congestion Control and Scheduling in a Datacenter

Devavrat Shah and Qiaomin Xie

arXiv preprint arXiv:1710.02548 2017

arXiv

2016

Scheduling with Multi-level Data Locality: Throughput and Heavy-Traffic Optimality

Qiaomin Xie, Ali Yekkehkhany, and Yi Lu

In 2016 IEEE Conference on Computer Communications (INFOCOM) 2016

URL
Pandas: robust locality-aware scheduling with stochastic delay optimality

Qiaomin Xie, Mayank Pundir, Yi Lu, Cristina L Abad, and Roy H Campbell

IEEE/ACM Transactions on Networking 2016

URL

2015

Power of d Choices for Large-Scale Bin Packing: A Loss Model

Qiaomin Xie, Xiaobo Dong, Yi Lu, and R Srikant

In ACM Sigmetrics 2015

URL
Priority algorithm for near-data scheduling: Throughput and heavy-traffic optimality

Qiaomin Xie and Yi Lu

In 2015 IEEE Conference on Computer Communications (INFOCOM) 2015

URL

2012

Degree-guided map-reduce task assignment with data locality constraint

Qiaomin Xie and Yi Lu

In 2012 IEEE International Symposium on Information Theory Proceedings 2012

URL

2011

Join-idle-queue: A novel load balancing algorithm for dynamically scalable web services

Yi Lu, Qiaomin Xie, Gabriel Kliot, Alan Geller, James R Larus, and Albert Greenberg

Performance Evaluation
International Symposium on Computer Performance, Modeling, Measurements, and Evaluation (IFIP Performance) 2011

Best Paper Award

URL