Optimal discounting for offline input-driven MDP

Randy Lefebvre, Audrey Durand

Hierarchical RL, Planning Algorithms | Wednesday, August 6 | Poster #28 | Accepted | RLC 2025

Abstract

Offline reinforcement learning has gained popularity for its potential to solve industry challenges. However, real-world environments are often highly stochastic and partially observable, which leads long-term planners to overfit to offline data in model-based settings. Input-driven Markov Decision Processes (IDMDPs) offer a way to handle some of this uncertainty by letting designers separate what the agent controls in the environment (states) from what it cannot control (inputs). These stochastic external inputs are often difficult to model. Under the assumption that the input model will be imperfect, we investigate the bias-variance tradeoff under shallow planning in IDMDPs. Paving the way to input-driven planning horizons, we also investigate how similar the optimal planning horizons are at different inputs, given the structure of the input space.

Paper Details

Conference: RLC 2025
Presentation: Wednesday, August 6
Track: Hierarchical RL, Planning Algorithms
Poster: #28
Status: Accepted