Rishi Kulkarni, PhD

Data Science | Biostatistics | Machine Learning

Bayesian Bandits

December 24, 2023

bayesianbandits is a Python library that provides a simple, elegant interface for implementing Bayesian multi-armed bandit algorithms. Users define their action space, a reward function, and a prior distribution, and the library handles belief updates and action selection.

from bayesianbandits import Bandit, Arm, epsilon_greedy, DirichletClassifier

def reward_func(x):
    return x[..., 0]

class Agent(Bandit,
            learner=DirichletClassifier({"yes": 1.0, "no": 1.0}),
            policy=epsilon_greedy(0.1)):  # exploration rate; 0.1 is an illustrative value
    arm1 = Arm("action 1", reward_func)
    arm2 = Arm("action 2", reward_func)

agent = Agent()

action = agent.pull() # receive an action token

# act on the action token, observe reward

agent.update("yes") # update with observed reward

The library supports contextual bandits, where additional information (or context) is available for decision making; restless bandits, whose reward probabilities change over time; and bandits with delayed rewards, where the outcome of an action may not be immediately available.
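To make the restless case concrete, here is a minimal sketch of one common way to track a drifting reward probability: shrink the accumulated pseudo-counts of a Beta-Bernoulli posterior before each update, so older evidence fades. The class name DecayingBetaArm and the decay parameter are hypothetical and chosen for illustration; this is not the bayesianbandits API, and the library may handle restlessness differently.

```python
from dataclasses import dataclass


@dataclass
class DecayingBetaArm:
    """Beta-Bernoulli arm whose evidence fades over time (hypothetical helper)."""

    alpha: float = 1.0  # pseudo-count of successes
    beta: float = 1.0   # pseudo-count of failures
    decay: float = 0.9  # assumed forgetting factor in (0, 1]; 1.0 means no forgetting

    def update(self, reward: int) -> None:
        # Shrink the existing pseudo-counts, then add the new observation.
        self.alpha = self.decay * self.alpha + reward
        self.beta = self.decay * self.beta + (1 - reward)

    @property
    def mean(self) -> float:
        # Posterior mean estimate of the reward probability.
        return self.alpha / (self.alpha + self.beta)


arm = DecayingBetaArm()
for _ in range(20):  # the arm pays off at first...
    arm.update(1)
early_mean = arm.mean
for _ in range(20):  # ...then stops paying off
    arm.update(0)
late_mean = arm.mean
```

With decay set to 1.0 the same sequence would leave the estimate at 0.5, since all 40 observations would count equally; with forgetting, the estimate follows the recent failures instead.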

bayesianbandits allows users to create sophisticated reinforcement learning agents that accumulate knowledge about different actions, or “arms”, and update their beliefs based on the rewards they receive. The library comes bundled with a number of conjugate Bayesian models, including Bayesian linear regression with either Normal-Normal or Normal-Inverse-Gamma conjugate priors, as well as intercept-only Gamma-Poisson and Dirichlet-Multinomial models.
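As a rough illustration of what a conjugate update involves, the sketch below works through the Dirichlet-Multinomial case by hand, using the same {"yes": 1.0, "no": 1.0} prior as the example above. The helper names dirichlet_update, posterior_mean, and sample_dirichlet are made up for this sketch and are not part of the library.

```python
from __future__ import annotations

import random


def dirichlet_update(prior: dict[str, float], observations: list[str]) -> dict[str, float]:
    """Conjugate update: add one count per observed outcome to the prior."""
    posterior = dict(prior)
    for outcome in observations:
        posterior[outcome] += 1.0
    return posterior


def posterior_mean(alpha: dict[str, float]) -> dict[str, float]:
    """Expected probability of each outcome under a Dirichlet(alpha) belief."""
    total = sum(alpha.values())
    return {k: v / total for k, v in alpha.items()}


def sample_dirichlet(alpha: dict[str, float], rng: random.Random) -> dict[str, float]:
    """Draw one probability vector: independent Gamma draws, normalized."""
    draws = {k: rng.gammavariate(v, 1.0) for k, v in alpha.items()}
    total = sum(draws.values())
    return {k: v / total for k, v in draws.items()}


prior = {"yes": 1.0, "no": 1.0}
posterior = dirichlet_update(prior, ["yes", "yes", "no", "yes"])  # {"yes": 4.0, "no": 2.0}
mean = posterior_mean(posterior)                                  # "yes" has mean 4/6
sample = sample_dirichlet(posterior, random.Random(0))            # one plausible belief
```

Because the posterior stays in the same family as the prior, each update is a simple count increment, and a random draw from it is the kind of sampled belief a Thompson-sampling style policy would act on.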

The library is useful in a variety of applications. For example, it can be leveraged to optimize click-through rates for an email newsletter. bayesianbandits is being used in production at IntelyCare!


The source code for bayesianbandits is available on GitHub.


bayesianbandits is compatible with joblib, a package widely used by scikit-learn to simplify model persistence. This compatibility makes it easy to store and retrieve learning agents for later use. Additionally, the library addresses memory management concerns by allowing the delayed_reward cache to be stored in any dict-like object, so pending rewards can be kept on disk rather than in memory.
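The idea of a pluggable dict-like cache can be sketched with the standard library alone. In the example below, DelayedRewardTracker is a hypothetical stand-in for an agent, not a bayesianbandits class, and its pull and update signatures are assumptions made for illustration. It accepts any mutable mapping, so a shelve file on disk can replace an in-memory dict without changing the tracker.

```python
import os
import shelve
import tempfile
from collections.abc import MutableMapping


class DelayedRewardTracker:
    """Hypothetical stand-in for an agent that waits for delayed rewards."""

    def __init__(self, cache: MutableMapping):
        self.cache = cache  # any dict-like store: a dict, a shelve file, ...
        self.successes = 0

    def pull(self, unique_id: str, action: str) -> str:
        # Remember which action was taken until its reward arrives.
        self.cache[unique_id] = action
        return action

    def update(self, unique_id: str, reward: int) -> str:
        # Look up and discard the pending action, then record the reward.
        action = self.cache.pop(unique_id)
        self.successes += reward
        return action


with tempfile.TemporaryDirectory() as tmp:
    # shelve.open returns a persistent, dict-like object backed by a file.
    with shelve.open(os.path.join(tmp, "pending")) as disk_cache:
        tracker = DelayedRewardTracker(disk_cache)
        tracker.pull("email-001", "action 1")
        pending_before = len(disk_cache)
        resolved = tracker.update("email-001", reward=1)
        pending_after = len(disk_cache)
```

Passing a plain dict instead of the shelve object would give the same behavior in memory, which is the point of accepting any dict-like cache.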



bayesianbandits can be installed from PyPI with pip:

pip install bayesianbandits


A detailed user guide and documentation are available on Read the Docs.