Curry is a Sokoban solver based on curriculum learning. It is inspired by the paper “Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning” by Dieqiao Feng, Carla Gomes, and Bart Selman.
At the moment Curry is NOT a strong Sokoban solver. It is an interesting demonstration of the concept of a Sokoban solver with zero Sokoban knowledge.
Curry is very different from other Sokoban solvers because it doesn’t use:
- any human designed features (such as distance to targets)
- any human designed heuristics (such as computation of a packing order)
- any deadlock detectors
- a backward search component
The only thing that Curry knows is the winning condition of the game (all boxes on targets). Yet, it is able to solve several levels, including some tricky ones.
Motivation
Feng’s solver requires a high-end machine with 5 GPUS running for 24 hours (120 GPU hours for solving a Sokoban level).
Curry is designed to run on a single core, at a computational cost that is about 1000 times smaller. In a time limit of one hour Curry can solve 71 XSokoban levels.
In addition, Curry’s source code is available for download.
Curriculum learning
Feng suggests an approach in which Sokoban levels are solved by first solving easier subcases with fewer boxes. The solutions to these easier instances are used to train deep neural networks that predict likely moves and their values. The solver starts by solving subcases with 2 boxes, and gradually increases the difficulty (number of boxes) until it finally tackles the original problem.
While Feng’s solver uses deep neural networks, Curry relies on simpler data structures. When solving a subcase with K boxes, the program is given access to a buffer of episodes solving K-1 subcases. Each episode in this buffer returns the move that was played in a similar position. Moves are then selected with probability proportional to their frequency in the past episodes.
Example
Consider a variant of XSokoban #1 with less boxes:
The moves that solve the 5-boxes variant are often useful when solving the full 6-boxes problem. With several episodes of 5 boxes, we can hope that they share many of the “good” moves and that such moves will have high probabilities.
Additional details
More details about the program and algorithm can be found in this research document.
Can Curry solve hard Sokoban levels?
No.
I still haven’t found a level that Curry solves and Festival misses. In addition, Festival is much, much faster. However, this comparison is unfair: Festival has a LOT of Sokoban knowledge in the form of high-level features, heuristics and deadlock detectors. Curry works without any Sokoban knowledge (*) except for the move generator.
(*) The move generator uses PI-Corral pruning, which involves some Sokoban knowledge. It also prunes moves which are immediate deadlocks, such as pushing boxes to corners.
But the neural networks solve hard levels?
Perhaps the deep neural networks approach is better. Recall, however, that their computational cost is about 1000 times higher. It is possible that with similar conditions Curry would be stronger. Note how Curry scales as more time is added; in a 24 hours time limit it can solve all XSokoban levels!
Download Curry
The zip file includes the executable (for windows) and its source code in C. The source code is mostly based on Festival but without any Sokoban knowledge.
I hope that new improvements to the program can be found.