As reinforcement learning plays an increasingly important role in healthcare, there is a pressing need to identify mechanisms for incorporating practitioner expertise. One notable case is improving tuberculosis drug adherence, where a health worker must simultaneously monitor many patients and provide services to them. We find that, without considering domain expertise, state-of-the-art restless multi-armed bandit algorithms allocate all resources to a small number of patients, neglecting most of the population. To avoid this undesirable behavior, we propose a human-in-the-loop model in which domain experts impose constraints that improve the equity of resource allocations. In simulations built from real-world data, our framework enforces these constraints on the distribution of actions without significant loss of utility.
Appeared at the following NeurIPS 2020 Workshops: