ULC: A Unified and Fine-Grained Controller for Humanoid Loco-Manipulation

Abstract

Loco-manipulation for humanoid robots aims to integrate mobility with upper-body tracking capabilities. Most existing approaches adopt hierarchical architectures that decompose control into isolated upper-body (manipulation) and lower-body (locomotion) policies. While this decomposition reduces training complexity, it inherently limits coordination between the subsystems and contradicts the unified whole-body control exhibited by humans.

We demonstrate that a single unified policy can combine tracking accuracy, a large workspace, and robustness for humanoid loco-manipulation. We propose the Unified Loco-Manipulation Controller (ULC), a single-policy framework that simultaneously tracks root velocity, root height, torso rotation, and dual-arm joint positions in an end-to-end manner, showing that unified control is feasible without sacrificing performance. We achieve this unified control through six key techniques: sequential skill acquisition to increase learning complexity progressively, residual action modeling for fine-grained control adjustments, command polynomial interpolation for smooth motion transitions, random delay release for robustness to deployment variations, load randomization for generalization to external disturbances, and center-of-gravity tracking to provide explicit policy gradients for maintaining stability.
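To make command polynomial interpolation concrete, the following is a minimal sketch, not the ULC implementation. It assumes a quintic blend between two command vectors; the polynomial degree, the function name `quintic_interpolate`, and the step count are illustrative assumptions that do not come from the text above.

```python
import numpy as np


def quintic_interpolate(start, goal, num_steps):
    """Return num_steps + 1 commands that move smoothly from start to goal.

    Assumption: a quintic blend s(t) = 10t^3 - 15t^4 + 6t^5, whose first and
    second derivatives vanish at t = 0 and t = 1. The degree actually used by
    ULC is not stated in the text above.
    """
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    t = np.linspace(0.0, 1.0, num_steps + 1)      # normalized time in [0, 1]
    s = 10 * t**3 - 15 * t**4 + 6 * t**5          # blend factor in [0, 1]
    return start + s[:, None] * (goal - start)    # shape: (num_steps + 1, dim)


# Hypothetical usage: blend a 2-D command over 10 control steps.
trajectory = quintic_interpolate([0.0, 1.0], [1.0, -1.0], num_steps=10)
```

Because the blend starts and ends with zero velocity and zero acceleration, consecutive command segments join without jumps in the target, which is one plausible way to obtain the smooth transitions described above.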

We validate our method on the Unitree G1 humanoid robot with a 3-DOF (degrees-of-freedom) waist. Compared with strong baselines, ULC achieves better tracking performance than disentangled methods and demonstrates larger workspace coverage. The unified dual-arm tracking enables precise manipulation under external loads while maintaining coordinated whole-body control for complex loco-manipulation tasks.

Method Overview

Method overview of the Unified Loco-Manipulation Controller (ULC). Our approach employs massively parallel reinforcement learning to train a single unified policy that tracks procedurally sampled commands including root velocity, root height, torso orientation, and arm joint positions. The framework addresses multi-task learning challenges through sequential skill acquisition with adaptive curricula, deployment-realistic command generation with smooth interpolation, and loaded balance optimization with center-of-mass tracking.
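As a rough illustration of procedurally sampled commands under a curriculum, the sketch below widens each command range as a difficulty level grows from 0 to 1. It is a sketch under stated assumptions only: the ranges, nominal values, command names, and the single scalar `level` are hypothetical placeholders, not values or mechanisms taken from ULC, and arm joint positions are omitted for brevity.

```python
import numpy as np

# Hypothetical full-difficulty ranges (low, high); NOT the values used by ULC.
FULL_RANGES = {
    "root_velocity_x": (-1.0, 1.0),  # m/s
    "root_height": (0.40, 0.75),     # m
    "torso_yaw": (-1.0, 1.0),        # rad
}
# Hypothetical nominal (easiest) command that every range shrinks toward.
NOMINAL = {"root_velocity_x": 0.0, "root_height": 0.75, "torso_yaw": 0.0}


def sample_command(rng, level):
    """Sample one command; level in [0, 1] scales each range around its nominal value."""
    level = float(np.clip(level, 0.0, 1.0))
    command = {}
    for name, (low, high) in FULL_RANGES.items():
        nominal = NOMINAL[name]
        lo = nominal + level * (low - nominal)    # collapses to nominal at level 0
        hi = nominal + level * (high - nominal)
        command[name] = float(rng.uniform(lo, hi))
    return command


# Hypothetical usage: an early-curriculum sample and a full-difficulty sample.
rng = np.random.default_rng(0)
easy_command = sample_command(rng, level=0.1)
hard_command = sample_command(rng, level=1.0)
```

An adaptive curriculum would raise `level` when tracking error stays low and hold or lower it otherwise; that update rule is left out here because the text above does not specify it.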

Picking and Placing (Doll)

Picking and Placing (Box)

Operating the Refrigerator

Opening the Door

Playing Ukulele

Pushing Cart

Operating the Microwave

Shoveling Sand

Using Paper Shredder

Wiping Table

Wiping Blackboard

Waist Orientation in the Wild

Pushing 1

Pushing 2

BibTeX

@misc{sun2025ulcunifiedfinegrainedcontroller,
      title={ULC: A Unified and Fine-Grained Controller for Humanoid Loco-Manipulation}, 
      author={Wandong Sun and Luying Feng and Baoshi Cao and Yang Liu and Yaochu Jin and Zongwu Xie},
      year={2025},
      eprint={2507.06905},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2507.06905}, 
}