
Epistemically-guided forward-backward exploration

Núria Armengol Urpí, Marin Vlastelica, Georg Martius, Stelian Coros

Exploration · Thursday, August 7 · Poster #34 · Accepted — RLC 2025

Abstract

Zero-shot reinforcement learning aims to extract optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings.

Forward-backward representations ($FB$) have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure.
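As a sketch of this factorization, the standard forward-backward formulation (notation follows the common $FB$ literature and is not necessarily this paper's exact convention) approximates the successor measure of a policy $\pi_z$ by a low-rank product of a forward and a backward embedding:

```latex
M^{\pi_z}(s_0, a_0, \mathrm{d}s^+) \;\approx\; F(s_0, a_0, z)^\top B(s^+)\, \rho(\mathrm{d}s^+),
\qquad
\pi_z(s) = \arg\max_a F(s, a, z)^\top z,
```

where $\rho$ is the data distribution over states. A reward $r$ observed only at test time is then handled zero-shot by setting $z_r = \mathbb{E}_{s \sim \rho}\!\left[r(s)\, B(s)\right]$ and acting with $\pi_{z_r}$.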

However, up until now, $FB$ and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection.

We argue that $FB$ representations should fundamentally be used for exploration in order to learn more efficiently.

With this goal in mind, we design exploration policies that arise naturally from the $FB$ representation and minimize its posterior variance, and hence its epistemic uncertainty.
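To make the idea concrete, here is a minimal, hypothetical sketch of uncertainty-directed action selection: an ensemble of forward maps stands in for the learned $F$ networks, and disagreement across ensemble members serves as a proxy for the posterior variance of the $FB$ representation. All names (`epistemic_bonus`, `explore_action`) and the ensemble construction are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K ensemble members, embedding dimension d, a small
# discrete action space. F_ensemble[k, a] plays the role of F_k(s, a, z)
# for a fixed state s and task vector z; B is the backward embedding B(s+).
K, d, n_actions = 5, 8, 4
F_ensemble = rng.normal(size=(K, n_actions, d))
B = rng.normal(size=d)

def epistemic_bonus(F_ensemble, B):
    """Per-action variance of F_k(s, a, z)^T B(s+) across ensemble members.

    High variance means the ensemble disagrees about the occupancy estimate,
    i.e. high epistemic uncertainty for that action.
    """
    values = F_ensemble @ B        # shape (K, n_actions)
    return values.var(axis=0)      # shape (n_actions,)

def explore_action(F_ensemble, B):
    """Greedy exploration: choose the action with the largest uncertainty."""
    return int(np.argmax(epistemic_bonus(F_ensemble, B)))

a = explore_action(F_ensemble, B)
```

In practice such a bonus would be combined with (or replace) the task-directed objective, steering data collection toward transitions where the $FB$ representation is least certain.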

We empirically demonstrate that such principled exploration strategies considerably reduce the sample complexity of the $FB$ algorithm in comparison to other exploration methods.