What is this project?

mdp.ai is a tool that generates MDPs (Markov Decision Processes) for research and education in RL (Reinforcement Learning).

MDPs are used to model decision making and the outcomes of decisions in the real world. RL is an area of Machine Learning focused on having algorithms learn to do things well, like having a robot learn the fastest way out of a maze after repeated attempts to escape.

What is the tool?

It exists as a generator site called the Playground...

The Playground lets you create, save, and visualize MDPs with a few clicks.

...and as a JavaScript library meant to make generating and visualizing MDPs incredibly easy. The MDP on the left is created by the code on the right:

How is this tool useful for RL?

Research

The tool facilitates robust testing of policy evaluation algorithms by making it easy to create, reproduce, and visualize randomly generated MDPs. These finite, fully known environments are quick to solve, easy to benchmark against, and add diversity to a test suite. Currently, the tool can:

  • Generate MDPs with different generators and parameters
  • Visualize any MDP(s) from state-action-state transition matrices and reward matrices entered in a text field
  • Visualize state values, transition probabilities, and optimal policies (each MDP is auto-solved via value iteration)
  • Help debug and benchmark Python agents (using an interface similar to OpenAI Gym) by running and visualizing instances of them through the MDPs
  • Save and load any one or full set of MDPs with a click
  • Show charts of indicators like returns for the agent running through each MDP
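To make the generate-and-solve steps above concrete, here is a minimal sketch in plain Python. It is not the tool's own code: `random_mdp` is a hypothetical generator, and the layout of `P[s][a][s2]` (transition probabilities) and `R[s][a]` (expected rewards) is an assumption, since the exact matrix format the Playground expects isn't described here. The solver is standard value iteration.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical generator: dense random transitions, rewards in [-1, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            # Small offset keeps every weight positive so the row can be normalized.
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            R_s.append(rng.uniform(-1.0, 1.0))
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_values(P, R, V, s, gamma):
    """One-step lookahead: Q(s, a) = R[s][a] + gamma * sum over s2 of P[s][a][s2] * V[s2]."""
    return [
        R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
        for a in range(len(P[s]))
    ]


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Return (state values, greedy policy) for a finite, fully known MDP."""
    V = [0.0] * len(P)
    while True:
        new_V = [max(q_values(P, R, V, s, gamma)) for s in range(len(P))]
        delta = max(abs(n - o) for n, o in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = []
    for s in range(len(P)):
        q = q_values(P, R, V, s, gamma)
        policy.append(max(range(len(q)), key=q.__getitem__))
    return V, policy


# Same seed, same MDP: this is what makes a generated benchmark reproducible.
P, R = random_mdp(n_states=4, n_actions=2, seed=42)
V, policy = value_iteration(P, R)
print("state values:", [round(v, 3) for v in V])
print("greedy policy:", policy)
```

Because the environment is fully known, the values and policy computed this way can serve as ground truth when checking what a learning agent converges to.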

Education

For now, you can compare different generator algorithms and visually inspect their value functions in the Playground.

This tool can help make RL concepts more accessible to the public. Imagine an interactive tutorial that both explains and shows the concept of actions and transition probabilities. You could extend this to show the agent's experience and visualize the decision-making of algorithms like Q-learning.
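As a rough idea of what such a tutorial could animate, here is a small sketch of tabular Q-learning on a two-state MDP. Everything in it is an assumption made for illustration: `TabularEnv` is a hypothetical sampler with a `reset`/`step` shape loosely modeled on OpenAI Gym, not the tool's actual agent interface, and the toy MDP (action 0 stays, action 1 switches, only staying in state 1 pays) is made up for the example.

```python
import random


class TabularEnv:
    """Hypothetical sampler over P[s][a][s2] and R[s][a]; not the tool's API."""

    def __init__(self, P, R, seed=0):
        self.P, self.R = P, R
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        reward = self.R[self.state][action]
        probs = self.P[self.state][action]
        # Sample the next state from the transition distribution.
        self.state = self.rng.choices(range(len(probs)), weights=probs)[0]
        return self.state, reward


def q_learning(env, n_states, n_actions, steps=20000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning on a continuing (non-episodic) task."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = env.reset()
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # explore
        else:
            a = max(range(n_actions), key=Q[s].__getitem__)  # exploit
        s2, r = env.step(a)
        # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s2, a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q


def greedy_policy(Q):
    """Pick the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]


# Toy MDP: action 0 stays put, action 1 switches state; staying in state 1 pays 1.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 0.0]]
Q = q_learning(TabularEnv(P, R, seed=1), n_states=2, n_actions=2, seed=1)
print("learned Q:", [[round(q, 2) for q in row] for row in Q])
print("greedy policy:", greedy_policy(Q))
```

Each `(s, a, r, s2)` step in the loop is one piece of agent experience, and the changing `Q` table is the decision-making a visualization could show over time.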

This is relevant to me: I have some feedback and/or want your help.

Drop me an email at eugene@ideaowl.com

Can I help you?

I'd love your help, thanks! I'm sure we'll find something, small or large, that works. Here are some of the larger next steps:

  • Write a Python library that reproduces the environment from this tool's save files
  • Make the MDPs look more like conventional MDP graphs (curved lines, labels, etc.)
  • Add descriptor pages for generator algorithms, add agent ranks and returns

About the project

This started as an idea for an independent study class during my studies at the University of Alberta. I had been taking a course on Reinforcement Learning taught by Rich Sutton and Adam White, and got the idea of creating this tool towards the end of the term.

There are a lot of people to thank for this project, but I'm going to hold off until it feels a bit more polished before tying it to the amazing folks who were involved.