Control of legged robots is a challenging problem that has been investigated with different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework that combines the strengths of both approaches.
Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and the Raibert heuristic, which serves as an expert policy. We then train a learnable clone of this MPC controller via imitation learning. Finally, we leverage deep reinforcement learning with limited exploration to further fine-tune the policy on more challenging terrains.
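To make the imitation stage concrete, below is a minimal sketch of a DAgger-style loop that clones an MPC expert into a neural-network policy. Everything here is illustrative, not the paper's implementation: `mpc_expert`, the `env` interface (reset/step returning observation and done flag), the network sizes, and the hyperparameters are all assumptions.

```python
# Minimal DAgger-style cloning of an MPC expert (illustrative sketch).
# `env`, `mpc_expert`, and the observation/action interfaces are hypothetical
# placeholders, not the interfaces used in the paper.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP mapping observations to joint-level actions."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def dagger(env, mpc_expert, policy, iters=50, steps_per_iter=2000, epochs=5):
    """Roll out the current policy, label the visited states with the MPC
    expert, and regress the policy onto the expert actions. Rolling out the
    learner (rather than the expert) is what mitigates distribution mismatch."""
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    dataset_obs, dataset_act = [], []
    for _ in range(iters):
        obs = env.reset()
        for _ in range(steps_per_iter):
            expert_act = mpc_expert(obs)  # expert label for the visited state
            dataset_obs.append(torch.as_tensor(obs, dtype=torch.float32))
            dataset_act.append(torch.as_tensor(expert_act, dtype=torch.float32))
            with torch.no_grad():
                act = policy(torch.as_tensor(obs, dtype=torch.float32))
            obs, done = env.step(act.numpy())  # step with the learner's action
            if done:
                obs = env.reset()
        obs_batch = torch.stack(dataset_obs)
        act_batch = torch.stack(dataset_act)
        for _ in range(epochs):  # supervised regression onto expert labels
            loss = nn.functional.mse_loss(policy(obs_batch), act_batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```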
Through comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework significantly improves the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also show that IFM efficiently produces more symmetric, periodic, and energy-efficient gaits than Vanilla RL, with minimal reward shaping.
The IFM framework consists of three stages. In the first stage, we develop a model-based controller using Model Predictive Control (MPC), which serves as the expert in the framework. In the next stage, we employ imitation learning to mimic this expert policy, utilizing DAgger to mitigate the distribution mismatch problem. Finally, we fine-tune this pre-trained policy using reinforcement learning. Since our primary focus in downstream tasks is on robustness and command tracking, we train the policy with command-tracking rewards and expose it to a curriculum of increasingly varied terrains. The advantage of the IFM framework lies in its ability to train a policy that exhibits robust, symmetric, periodic, and energy-efficient motions with minimal reward tuning.
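As an illustration of the kind of command-tracking reward used in the fine-tuning stage, here is a minimal sketch. The functional form (exponential tracking terms for planar linear velocity and yaw rate), the scales, and the relative weights are assumptions for illustration; the paper's exact reward terms differ.

```python
# Illustrative command-tracking reward (assumed form, not the paper's exact reward).
import numpy as np

def command_tracking_reward(base_lin_vel, base_ang_vel,
                            cmd_lin_vel, cmd_yaw_rate,
                            lin_scale=0.25, ang_scale=0.25):
    """Reward tracking of a commanded planar velocity and yaw rate.

    base_lin_vel: (3,) base linear velocity in the body frame
    base_ang_vel: (3,) base angular velocity in the body frame
    cmd_lin_vel:  (2,) commanded forward/lateral velocity
    cmd_yaw_rate: scalar commanded yaw rate
    """
    lin_err = np.sum((cmd_lin_vel - base_lin_vel[:2]) ** 2)
    ang_err = (cmd_yaw_rate - base_ang_vel[2]) ** 2
    r_lin = np.exp(-lin_err / lin_scale)   # reward close to 1 when tracking well
    r_ang = np.exp(-ang_err / ang_scale)
    return r_lin + 0.5 * r_ang             # illustrative weighting
```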
We conducted hardware experiments against five baselines and demonstrated that the IFM policy consistently outperforms them in several key aspects, including command tracking, robustness, energy efficiency, and motion symmetry. In the following section, we provide sample motion videos showcasing command tracking. These videos illustrate how IFM performs better on hardware, even compared to the WBIC controller. Turn your sound on to experience one of the distinct advantages of IFM: the sound of the motors in action.
This section includes all the hardware experiments featured in the paper.
Please note that a few videos had to be omitted due to recording issues.
Don't forget to turn on the sound!
@article{youm2023imitating,
title={Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion},
author={Youm, Donghoon and Jung, Hyunyoung and Kim, Hyeongjun and Hwangbo, Jemin and Park, Hae-Won and Ha, Sehoon},
journal={IEEE Robotics and Automation Letters},
year={2023},
publisher={IEEE}
}