Simple k-armed Bandit UCB Viz
Number of Bandits:
When the visualization starts, one action per bandit has already been taken.
Click on the "Pull" text!
Red is the expected value, a sample from a Gaussian (normal) distribution with mean 0 and standard deviation 1.
Green is the last reward, a sample from a Gaussian (normal) distribution with mean equal to the expected value and standard deviation 1.
Blue is the average reward received, with the light blue interval being the confidence interval.
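The quantities in the legend can be reproduced with a small simulation. The following is a minimal Python sketch, not the code behind the visualization. The class name `UCBBandit`, the exploration weight `c`, and the interval half-width `c * sqrt(ln t / n)` (the usual UCB1-style bonus) are assumptions for illustration; the page does not say how its confidence interval is computed.

```python
import math
import random


class UCBBandit:
    """k-armed Gaussian bandit with a UCB-style selection rule (illustrative)."""

    def __init__(self, k, c=2.0, seed=None):
        self.rng = random.Random(seed)
        self.c = c  # exploration weight: an assumption, not taken from the page
        # Red: each arm's expected value, drawn once from N(0, 1).
        self.expected = [self.rng.gauss(0.0, 1.0) for _ in range(k)]
        self.counts = [0] * k           # pulls per arm
        self.averages = [0.0] * k       # Blue: average reward received
        self.last_reward = [None] * k   # Green: most recent reward
        self.t = 0                      # total pulls so far
        # One action per arm is taken before anything is displayed,
        # so every count is at least 1 and the bonus below is defined.
        for arm in range(k):
            self.pull(arm)

    def pull(self, arm):
        # Reward is drawn from N(expected value of the arm, 1).
        reward = self.rng.gauss(self.expected[arm], 1.0)
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the sample average.
        self.averages[arm] += (reward - self.averages[arm]) / self.counts[arm]
        self.last_reward[arm] = reward
        return reward

    def half_width(self, arm):
        # Assumed UCB1-style bonus; shrinks as the arm is pulled more often.
        return self.c * math.sqrt(math.log(self.t) / self.counts[arm])

    def interval(self, arm):
        # Light blue: assumed symmetric interval around the average.
        w = self.half_width(arm)
        return self.averages[arm] - w, self.averages[arm] + w

    def select(self):
        # Pick the arm with the highest upper confidence bound.
        return max(range(len(self.counts)),
                   key=lambda a: self.averages[a] + self.half_width(a))


bandit = UCBBandit(k=5, seed=0)
for _ in range(200):
    bandit.pull(bandit.select())
```

Under these assumptions, arms that are pulled rarely keep a wide interval and are eventually revisited, while the interval of a frequently pulled arm narrows around its average.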