Reinforcement Learning Class Project: Learning Gait Trajectories for a Quadrupedal Robot
This work was completed in Spring 2021 by Alexander Krolicki and Sarang Sutavani in ME8930 Reinforcement Learning, taught by Dr. Umesh Vaidya.
Final Project Report | Final Project Presentation | TRPO Training Video | Code
Abstract:
This paper provides insight into the importance of policy gradient methods applied to continuous control systems. We trace the development of policy gradient algorithms and explain when they should be applied and when they are inadequate for a given problem. We specifically investigate two well-known policy gradient algorithms, Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). To demonstrate how these algorithms are implemented in practice, we use a simulation environment to train a quadrupedal robot to learn a gait planning and control policy. We also demonstrate development tools that help expedite training, hyperparameter tuning, and model analysis. Finally, we compare the results of the two algorithms and provide qualitative commentary on the behaviors the quadrupedal robot learns over the course of training.
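For context, the two algorithms optimize closely related surrogate objectives (the standard forms from the literature, not notation specific to this report): TRPO maximizes the importance-weighted advantage subject to a trust-region constraint on the policy update, while PPO replaces the hard constraint with a clipped objective.

```latex
% TRPO: constrained surrogate objective
\max_{\theta}\; \mathbb{E}_t\!\left[ \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\, \hat{A}_t \right]
\quad \text{s.t.} \quad
\mathbb{E}_t\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\middle\|\,\pi_\theta(\cdot \mid s_t) \right) \right] \le \delta

% PPO: clipped surrogate, with ratio r_t(\theta) = \pi_\theta(a_t \mid s_t)/\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\;
\mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]
```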
Figure 1. “Minitaur” robot: transfer of a control policy from simulation to the physical robot [1].
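As a rough illustration of the training setup described above, the sketch below trains PPO on the PyBullet Minitaur locomotion task (the robot shown in Figure 1). The environment name, the Stable-Baselines3 library, and the hyperparameters are assumptions for illustration only; the project's actual training code is available through the Code link above.

```python
import gym
import pybullet_envs  # registers MinitaurBulletEnv-v0 (assumed environment)
from stable_baselines3 import PPO

# Quadruped locomotion task; the reward encourages forward walking.
env = gym.make("MinitaurBulletEnv-v0")

# PPO with a multilayer-perceptron policy; hyperparameters are illustrative.
model = PPO("MlpPolicy", env, learning_rate=3e-4, n_steps=2048, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_minitaur")

# TRPO exposes the same interface in sb3-contrib and can be swapped in:
#   from sb3_contrib import TRPO
#   model = TRPO("MlpPolicy", env)
```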
References:
[1] Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. arXiv preprint arXiv:1804.10332, 2018.