Abstract

Amidst task-specific learning-based control synthesis frameworks that achieve impressive empirical results, a unified framework that systematically constructs an optimal policy for sufficiently solving a general notion of a task is absent. Hence, we propose a theoretical framework for task-centered control synthesis leveraging two critical ideas: 1) oracle-guided policy optimization, for the non-limiting integration of sub-optimal task-based priors to guide the policy optimization, and 2) task-vital multimodality, to break down solving a task into executing a sequence of behavioral modes. The proposed approach results in highly agile parkour and diving on a 16-DoF dynamic bipedal robot. For the parkour task, the obtained policy advances indefinitely on a track, performing leaps and jumps of varying lengths and heights. For the dive task, the policy demonstrates front, back, and side flips from various initial heights. Finally, we introduce a novel latent mode space reachability analysis to study our policies’ versatility and generalization by computing a feasible mode set function, through which we certify a set of failure-free modes for our policy to perform at any given state.

Proposed Approach

Overview of OGMP: (a) the breakdown of a task into its principal modes, with a mode and mode parameter set; (b) guided exploration by constraining the search space to the local neighborhood of the oracle’s reference; (c) the mode encoder, an LSTM autoencoder trained on a custom modal dataset by minimizing reconstruction loss; (d) the multimodal policy, trained with oracle-guided policy optimization on a task environment; (e) the closed-loop inference pipeline with the high-level oracle and the low-level multimodal policy.
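The guided exploration idea (item b in the overview) can be illustrated with a minimal sketch. It assumes the constraint is a per-dimension bound on the deviation from the oracle's reference and that an episode terminates at the first violation. The names `within_oracle_neighborhood` and `rollout_length`, the bound `epsilon`, and the choice of a max-norm are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def within_oracle_neighborhood(
    state: Sequence[float],
    reference: Sequence[float],
    epsilon: float,
) -> bool:
    """Return True if every state dimension is within epsilon of the reference.

    Assumption: the local neighborhood is a max-norm ball of radius epsilon.
    """
    if len(state) != len(reference):
        raise ValueError("state and reference must have the same dimension")
    return all(abs(s - r) <= epsilon for s, r in zip(state, reference))


def rollout_length(states, references, epsilon):
    """Count the steps taken before the state leaves the neighborhood.

    Hypothetical termination rule: the episode ends at the first violation.
    """
    steps = 0
    for state, reference in zip(states, references):
        if not within_oracle_neighborhood(state, reference, epsilon):
            break
        steps += 1
    return steps
```

Under these assumptions, a smaller `epsilon` keeps exploration closer to the oracle's reference, while a larger one leaves the policy more room to deviate from a sub-optimal prior.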

OGMP: Oracle Guided Multimodal Policies for Agile and Versatile Robot Control

Lokesh Krishna, Nikhil Sobanbabu, Quan Nguyen

Dive Task

Oracle-guided Control

Comparison with reference trajectory

Preview Horizon variation

Horizon length: 0.21 s

A very short horizon leads to failure in solving the task

Horizon length: 0.45 s

Myopic behavior leads to block-ground-block transitions

Horizon length: 0.9 s

A long horizon leads to optimal block-block transitions

Task Parameter variation

Block parameter variations

Gap parameter variations

Latent mode-space test
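As a rough illustration of the feasible mode set idea described in the abstract, the sketch below keeps the candidate latent modes whose rollout from a given state does not fail. The names `latent_grid`, `feasible_mode_set`, and `toy_rollout_fails`, the regular grid of candidate modes, and the toy failure rule are all hypothetical stand-ins; the paper's actual reachability analysis may differ.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Mode = Tuple[float, ...]


def latent_grid(values: Sequence[float], dim: int) -> List[Mode]:
    """Enumerate candidate latent modes on a regular grid (an assumption)."""
    return list(product(values, repeat=dim))


def feasible_mode_set(
    state: Sequence[float],
    candidates: Sequence[Mode],
    rollout_fails: Callable[[Sequence[float], Mode], bool],
) -> List[Mode]:
    """Return the candidate modes whose rollout from `state` does not fail."""
    return [mode for mode in candidates if not rollout_fails(state, mode)]


def toy_rollout_fails(state: Sequence[float], mode: Mode) -> bool:
    """Toy stand-in for a simulator rollout: fail if the mode is far from the state."""
    distance_sq = sum((m - s) ** 2 for m, s in zip(mode, state))
    return distance_sq > 1.0


# Hypothetical usage: a 3x3 grid of 2-D latent modes tested from one state.
candidates = latent_grid([-1.0, 0.0, 1.0], dim=2)
feasible = feasible_mode_set([0.0, 0.0], candidates, toy_rollout_fails)
```

In practice `rollout_fails` would wrap a simulation of the trained policy conditioned on the latent mode; the toy rule here only makes the sketch self-contained.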