To capture the intuitive aspect of the game, we needed a new approach.
We created AlphaGo, a computer program that combines advanced tree search with deep neural networks. These neural networks take a description of the Go board as input and process it through a number of network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game.
We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved, becoming increasingly strong at learning and decision-making. This process is known as reinforcement learning.
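As a rough illustration only, and not AlphaGo's actual code, the division of labour between the policy network and the value network might be sketched as follows. Everything here is an assumption made for the example: the 9x9 board, the board encoding, the single layer of random weights standing in for trained deep networks, and all function names.

```python
import math
import random

BOARD_SIZE = 9  # assumption: a small board for illustration (full Go uses 19x19)
N_POINTS = BOARD_SIZE * BOARD_SIZE


def make_weights(n_out, n_in, rng):
    """Random weights standing in for trained parameters (hypothetical)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """One dense layer: a weighted sum of the inputs for each output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def policy_network(board, weights):
    """Return a probability for each empty point, i.e. each candidate move."""
    scores = linear(weights, board)
    legal = [i for i, stone in enumerate(board) if stone == 0]
    top = max(scores[i] for i in legal)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - top) for i in legal}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def value_network(board, weights):
    """Return a score between -1 and 1; positive favours the player to move."""
    return math.tanh(linear(weights, board)[0])


rng = random.Random(0)
policy_w = make_weights(N_POINTS, N_POINTS, rng)
value_w = make_weights(1, N_POINTS, rng)

# Board encoding (assumption): 0 = empty, 1 = player to move, -1 = opponent.
board = [0] * N_POINTS
board[40] = 1
board[41] = -1

move_probs = policy_network(board, policy_w)      # "which move next?"
best_move = max(move_probs, key=move_probs.get)
win_estimate = value_network(board, value_w)      # "who is likely to win?"
```

Both functions read the same board description but answer different questions: the policy output is a distribution over moves, while the value output is a single number summarising the expected result. In a real system the weights would come from training on human games and self-play rather than from a random number generator.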