Maze Problem with the conventional reinforcement learning method

Arm Problem with the conventional reinforcement learning method (help [Japanese])

First, ...
Secondly, ....
In this conventional method, the agent relearns very slowly.
(Compare with the proposed method.)

(If this program does not run, please install a Java VM from here or here.)