Incorporating Healthcare Motivated Constraints in Restless Multi-Armed Bandit Based Resource Allocation


As reinforcement learning plays an increasingly important role in healthcare, there is a pressing need to identify mechanisms for incorporating practitioner expertise. One notable case is improving tuberculosis drug adherence, where a health worker must simultaneously monitor and provide services to many patients. We find that, without considering domain expertise, state-of-the-art restless multi-armed bandit algorithms allocate all resources to a small number of patients, neglecting most of the population. To avoid this undesirable behavior, we propose a human-in-the-loop model in which domain experts impose constraints to improve the equitability of resource allocations. Our framework enforces these constraints on the distribution of actions without significant loss of utility in simulations based on real-world data.

NeurIPS 2020 Workshops

Appeared at the following NeurIPS 2020 workshops:

  • Challenges of Real World Reinforcement Learning
  • Machine Learning in Public Health (Best Lightning Paper)
  • Machine Learning for Health (Best on Theme)
  • Machine Learning for the Developing World