
Accepted Papers — RLC 2025

123 accepted papers

Friday, August 8

Multi-Agent RL

Reinforcement Learning for Finite Space Mean-Field Type Game

Kai Shao, Jiacheng Shen, Mathieu Lauriere

#25

Collaboration Promotes Group Resilience in Multi-Agent RL

Ilai Shraga, Guy Azran, Matthias Gerstgrasser, Ofir Abu, Jeffrey Rosenschein, Sarah Keren

#26

Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models

Aaron Dharna, Cong Lu, Jeff Clune

#27

Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense

Aditya Vikram Singh, Ethan Rathbun, Emma Graham, Lisa Oakley, Simona Boboila, Peter Chin, Alina Oprea

#28

Efficient Information Sharing for Training Decentralized Multi-Agent World Models

Xiaoling Zeng, Qi Zhang

#29

Adaptive Reward Sharing to Enhance Learning in the Context of Multiagent Teams

Kyle Tilbury, David Radke

#30

Seldonian Reinforcement Learning for Ad Hoc Teamwork

Edoardo Zorzi, Alberto Castellini, Leonidas Bakopoulos, Georgios Chalkiadakis, Alessandro Farinelli

#31

Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control

Justin Turnau, Longchao Da, Khoa Vo, Ferdous Al Rafi, Shreyas Bachiraju, Tiejin Chen, Hua Wei

#32

TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems via Local Trajectory Encoding

Conor Wallace, Umer Siddique, Yongcan Cao

#33

PEnGUiN: Partially Equivariant Graph NeUral Networks for Sample Efficient MARL

Joshua McClellan, Greyson Brothers, Furong Huang, Pratap Tokekar

#34

Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Jake Grigsby, Yuqi Xie, Justin Sasek, Steven Zheng, Yuke Zhu

#35

Thursday, August 7

Exploration

Uncertainty Prioritized Experience Replay

Rodrigo Antonio Carrasco-Davis, Sebastian Lee, Claudia Clopath, Will Dabney

#25

Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget

Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

#26

Quantitative Resilience Modeling for Autonomous Cyber Defense

Xavier Cadet, Simona Boboila, Edward Koh, Peter Chin, Alina Oprea

#27

Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization

Sebastian Griesbach, Carlo D'Eramo

#28

Syllabus: Portable Curricula for Reinforcement Learning Agents

Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson

#29

Exploration-Free Reinforcement Learning with Linear Function Approximation

Luca Civitavecchia, Matteo Papini

#30

Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning

Abdul Wahab, Raksha Kumaraswamy, Martha White

#31

Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World

Akhil Bagaria, Anita De Mello Koch, Rafael Rodriguez-Sanchez, Sam Lobel, George Konidaris

#32

An Optimisation Framework for Unsupervised Environment Design

Nathan Monette, Alistair Letcher, Michael Beukman, Matthew Thomas Jackson, Alexander Rutherford, Alexander David Goldie, Jakob Nicolaus Foerster

#33

Epistemically-guided forward-backward exploration

Núria Armengol Urpí, Marin Vlastelica, Georg Martius, Stelian Coros

#34

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

#35

Wednesday, August 6

RL Algorithms

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Juan Sebastian Rojas, Chi-Guhn Lee

#1

RL³: Boosting Meta Reinforcement Learning via RL inside RL²

Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein

#2

Fast Adaptation with Behavioral Foundation Models

Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta

#3

Understanding Learned Representations and Action Collapse in Visual Reinforcement Learning

Xi Chen, Zhihui Zhu, Andrew Perrault

#4

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions

Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim

#5

ProtoCRL: Prototype-based Network for Continual Reinforcement Learning

Michela Proietti, Peter R. Wurman, Peter Stone, Roberto Capobianco

#6

Offline Reinforcement Learning with Domain-Unlabeled Data

Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama

#7

SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning

Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov

#8

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada

#9

Zero-Shot Reinforcement Learning Under Partial Observability

Scott Jeen, Tom Bewley, Jonathan Cullen

#10

Adaptive Submodular Policy Optimization

Branislav Kveton, Anup Rao, Viet Dac Lai, Nikos Vlassis, David Arbour

#11

RL from Human Feedback, Imitation Learning

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos

#13

Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations

Agustin Castellano, Sohrab Rezaei, Jared Markowitz, Enrique Mallada

#14

DisDP: Robust Imitation Learning via Disentangled Diffusion Policies

Pankhuri Vanjani, Paul Mattes, Xiaogang Jia, Vedant Dave, Rudolf Lioutikov

#15

Mitigating Goal Misgeneralization via Minimax Regret

Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, Michael D Dennis

#16

Modelling human exploration with light-weight meta reinforcement learning algorithms

Thomas D. Ferguson, Alona Fyshe, Adam White

#17

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor

#18

PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning

Ondrej Bajgar, Dewi Sid William Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, Michael A Osborne

#19

Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

Alexander Levine, Peter Stone, Amy Zhang

#20

One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise

Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall

#21

Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms

Septia Rani, Serena Booth, Sarath Sreedharan

#22