This demo follows the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning, a DeepMind paper from the NIPS 2013 Deep Learning Workshop. The paper is a nice demonstration of a fairly standard (model-free) Reinforcement Learning algorithm (Q Learning) learning to play Atari games.
In this demo, instead of Atari games, we'll start out with something simpler: a 2D agent with 9 eyes pointing at different angles ahead. Every eye senses 3 values along its direction (up to a certain maximum visibility distance): the distance to a wall, the distance to a green thing, and the distance to a red thing. The agent navigates by using one of 5 actions, each of which turns it by a different angle. The red things are apples and the agent gets a reward for eating them. The green things are poison and the agent gets a negative reward for eating them. The training takes a few tens of minutes with current parameter settings.
Over time, the agent learns to avoid states that lead to states with low rewards, and picks actions that lead to better states instead.
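The way value flows backward from rewarding states to the states that lead to them can be seen in a tabular version of the Q Learning update. This is only a minimal sketch: the demo approximates Q with a neural network rather than a table, and the names qUpdate, alpha (learning rate) and gamma (discount factor) are chosen here for illustration.

```javascript
// Tabular Q Learning update (illustrative; the demo itself uses a neural network).
// Q[state][action] holds the current estimate of long-term reward.
function qUpdate(Q, s, a, r, sNext, alpha, gamma) {
  var bestNext = Math.max.apply(null, Q[sNext]); // value of the best action in the next state
  var target = r + gamma * bestNext;             // one-step bootstrapped target
  Q[s][a] += alpha * (target - Q[s][a]);         // move the estimate toward the target
  return Q[s][a];
}

// Two states, two actions, all estimates start at zero.
var Q = [[0, 0], [0, 0]];

// In state 1, action 0 eats an apple (reward +1) and leads back to state 0.
qUpdate(Q, 1, 0, 1.0, 0, 0.5, 0.9);

// In state 0, action 0 gives no immediate reward but leads to state 1.
// Its estimate still rises, because state 1 is now known to be valuable.
qUpdate(Q, 0, 0, 0.0, 1, 0.5, 0.9);
```

After these two updates, the action that merely leads toward the apple has a positive value even though it earned no reward itself, which is the mechanism behind the behavior described above.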
It's very simple to use deepqlearn.Brain. First, initialize your network:
var brain = new deepqlearn.Brain(num_inputs, num_actions);
To train it, proceed in a loop as follows:
var action = brain.forward(array_with_num_inputs_numbers);
// action is a number in [0, num_actions) telling index of the action the agent chooses
// here, apply the action on environment and observe some reward. Finally, communicate it:
brain.backward(reward); // <-- learning magic happens here
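To make the loop concrete, here is a rough sketch of how the forward/backward cycle might be wired to an environment. The helper runSteps, the toyEnv object with its observe and step methods, and the stubBrain stand-in are all hypothetical names introduced for illustration. The stub only mimics the two methods shown above so that the sketch runs without the library.

```javascript
// Hypothetical helper: runs numSteps of the forward/backward cycle.
function runSteps(brain, env, numSteps) {
  var totalReward = 0;
  for (var i = 0; i < numSteps; i++) {
    var inputs = env.observe();          // array of num_inputs numbers
    var action = brain.forward(inputs);  // index in [0, num_actions)
    var reward = env.step(action);       // apply the action, observe a reward
    brain.backward(reward);              // report the reward so learning can happen
    totalReward += reward;
  }
  return totalReward;
}

// Toy environment (an assumption for illustration): action 1 "eats an apple",
// anything else "eats poison".
var toyEnv = {
  observe: function () { return [0.5]; },
  step: function (action) { return action === 1 ? 1.0 : -1.0; }
};

// Stand-in exposing the same two methods as the brain, always choosing action 1.
// With the library loaded, a real brain would be passed in its place.
var stubBrain = {
  rewardsSeen: 0,
  forward: function (inputs) { return 1; },
  backward: function (reward) { this.rewardsSeen += 1; }
};

var total = runSteps(stubBrain, toyEnv, 10);
```

Keeping the environment behind observe and step means the same loop works whether the brain is a stub, a freshly initialized network, or a pretrained one.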
That's it! Let the agent learn over time (it will take opt.learning_steps_total steps), and it should get better at accumulating reward as it learns. Note that the agent will still take random actions with probability opt.epsilon_min even once it's fully trained. To disable this randomness completely, turn off learning and set epsilon_test_time to 0 (or to another value if you only want to change the rate):
brain.epsilon_test_time = 0.0; // don't make any random choices, ever
brain.learning = false;
var action = brain.forward(array_with_num_inputs_numbers); // get optimal action from learned policy
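For intuition about how the probability of a random action might shrink toward opt.epsilon_min during training, here is a small sketch. It assumes a linear decay from 1.0 that begins after a burn-in period. The function annealedEpsilon, the learning_steps_burnin field, and the example numbers are assumptions made for illustration, and the library's actual schedule may differ.

```javascript
// Assumed schedule: linear decay from 1.0 down to epsilon_min,
// starting once `age` (steps taken so far) passes the burn-in period.
function annealedEpsilon(age, opt) {
  var span = opt.learning_steps_total - opt.learning_steps_burnin;
  var progress = (age - opt.learning_steps_burnin) / span;
  return Math.min(1.0, Math.max(opt.epsilon_min, 1.0 - progress));
}

// Example values, chosen for illustration only.
var opt = { learning_steps_total: 100000, learning_steps_burnin: 3000, epsilon_min: 0.05 };

var atStart = annealedEpsilon(0, opt);      // during burn-in: every action is random
var midway = annealedEpsilon(51500, opt);   // halfway through the decay
var atEnd = annealedEpsilon(100000, opt);   // clamped at epsilon_min
```

Under this assumed schedule the agent explores heavily early on and settles at the epsilon_min floor by the end of training, which is why some randomness remains unless it is switched off as shown above.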
You can save and load a network from JSON here. Note that the textfield is prefilled with a pretrained network that works reasonably well, in case you're too impatient to let yours train long enough. Just hit the load button!
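If you want to do the same from your own code, the following sketch shows one way the round trip through JSON might look. It assumes the brain exposes its network as value_net with toJSON() and fromJSON() methods. Treat those names as assumptions and check them against your copy of the library. The helpers saveBrain, loadBrain and makeStubBrain are hypothetical, and the stub exists only so the sketch runs on its own.

```javascript
// Serialize the brain's network to a JSON string (assumes brain.value_net.toJSON exists).
function saveBrain(brain) {
  return JSON.stringify(brain.value_net.toJSON());
}

// Restore the network from a JSON string (assumes brain.value_net.fromJSON exists).
function loadBrain(brain, text) {
  brain.value_net.fromJSON(JSON.parse(text));
  brain.learning = false; // assumption: a loaded network is used as-is, without further training
}

// Stand-in object with the assumed shape, so the sketch runs without the library.
function makeStubBrain(weights) {
  return {
    learning: true,
    value_net: {
      weights: weights,
      toJSON: function () { return { weights: this.weights }; },
      fromJSON: function (json) { this.weights = json.weights; }
    }
  };
}

var trained = makeStubBrain([0.1, -0.2, 0.3]);
var fresh = makeStubBrain([0, 0, 0]);

var saved = saveBrain(trained); // a plain string, suitable for a textfield or a file
loadBrain(fresh, saved);        // fresh now carries the trained weights
```

Because the saved form is an ordinary string, it can be pasted into a textfield, stored in a file, or sent over the network without any special handling.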