Maze Problem with the conventional reinforcement learning method (help [Japanese])
First, let the agent learn a route to the goal.
Second, choose Emergency 0 or 1 to place a new wall.
With this conventional method, the agent relearns very slowly.
(Compare with the proposed method.)
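
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applet's actual code: it assumes the "conventional method" is tabular Q-learning with epsilon-greedy exploration, uses a hypothetical 5x5 grid with made-up parameters (ALPHA, GAMMA, EPSILON), and stands in for the Emergency 0/1 choice by dropping a wall onto the middle of the route the agent has just learned. Because the Q-table is kept after the wall appears, the agent keeps bumping into the wall until the stale values decay, which is one plausible reason relearning takes many steps.

```python
import random
from collections import defaultdict

# Hypothetical setup: grid size, rewards, and parameters are assumptions.
SIZE = 5
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1


def step(state, action, walls):
    """Move one cell; hitting a wall or the border leaves the agent in place."""
    r, c = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in walls:
        r, c = state
    return (r, c), (1.0 if (r, c) == GOAL else 0.0)


def run_episode(q, walls, rng, max_steps=1000):
    """One epsilon-greedy Q-learning episode; returns the number of steps taken."""
    state = START
    for t in range(1, max_steps + 1):
        if rng.random() < EPSILON:
            action = rng.randrange(4)
        else:  # greedy choice, ties broken at random
            best = max(q[state])
            action = rng.choice([a for a in range(4) if q[state][a] == best])
        nxt, reward = step(state, action, walls)
        target = reward if nxt == GOAL else reward + GAMMA * max(q[nxt])
        q[state][action] += ALPHA * (target - q[state][action])
        state = nxt
        if state == GOAL:
            return t
    return max_steps


def greedy_path(q, walls):
    """Follow the highest-valued action from START; stop at GOAL or on a repeat."""
    path, seen = [START], {START}
    while path[-1] != GOAL:
        state = path[-1]
        action = max(range(4), key=lambda a: q[state][a])
        nxt, _ = step(state, action, walls)
        if nxt in seen:  # bumped into something or started looping
            break
        path.append(nxt)
        seen.add(nxt)
    return path


def steps_until_solved(q, walls, rng, max_episodes=5000):
    """Train until the greedy path reaches GOAL; returns (total steps, path)."""
    total = 0
    for _ in range(max_episodes):
        total += run_episode(q, walls, rng)
        path = greedy_path(q, walls)
        if path[-1] == GOAL:
            break
    return total, path


def demo(seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * 4)  # Q-table, kept across both phases
    walls = set()
    learn_steps, path = steps_until_solved(q, walls, rng)      # step 1: learn a route
    new_wall = path[len(path) // 2]                            # step 2: block that route
    walls.add(new_wall)
    relearn_steps, new_path = steps_until_solved(q, walls, rng)
    return learn_steps, path, new_wall, relearn_steps, new_path


if __name__ == "__main__":
    learn_steps, path, new_wall, relearn_steps, new_path = demo()
    print("steps to learn the first route:", learn_steps)
    print("new wall placed at:", new_wall)
    print("steps to relearn a detour:", relearn_steps)
```

Running the script prints how many steps each phase took for one random seed; the numbers vary with the seed and the assumed parameters, so they should be read as an illustration rather than a measurement of the applet.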

(If this program does not run, please install a Java VM from here or here.)