In this paper, we consider the problem of allocating resources in restless multi-armed bandits under budget and distribution constraints. Unfortunately, recently developed probabilistic approaches to this problem rely on a convergence assumption to sidestep intractability. To improve on state-of-the-art techniques, it is critical to relax this assumption. To this end, we introduce a novel non-stationary approach that maximizes expected reward provided the epochs are sufficiently long.
Preprint available upon request.