Introduction

Over the past twenty years, the founding team of Bleu Robotics and their team from Inria/CNRS/Univ. Lorraine (HuCeBot team) have developed algorithms for humanoids that learn and adapt. A guiding principle throughout this work has been that learning should be applied where it makes a concrete difference for robotics, so that both model-based and data-driven approaches complete each other instead of competing.

This work is organized around three questions:

Whole-body control: how to synchronize 20-30 motors to both achieve the task and respect the constraints, like keeping the robot balanced?
Data-efficient learning: how to learn with very few trials or very few demonstrations, like 10-30 trials / a few minutes?
Humanoid learning: how to leverage these algorithms in humanoid robots?

Part I - Whole-Body Control for Humanoid Robots

The first challenge in humanoid robotics is getting the robot to move at all. A humanoid has 20 to 30 actuated degrees of freedom — hips, knees, ankles, spine, shoulders, elbows, wrists — and every motion requires synchronizing all of them simultaneously. Moving a hand to a desired position while keeping the robot balanced is not a simple inverse-kinematics problem; it demands reconciling dozens of coupled constraints in real time.

After many years of research, the humanoids community converged on whole-body control based on quadratic programming (QP). At each control cycle (200–1000 Hz) the controller solves an optimization problem:

$$ (\tau^, \ddot{q}^) = \underset{\tau, \ddot{q}}{\operatorname{argmin}} \sum_{k=0}^{n\_tasks} w_k \| A_k(q, \dot{q}) \ddot{q} - b_k(q, \dot{q}) \|^2 $$

$$ \text{s.t.} \begin{cases} A_{ineq}(q, \dot{q}) \ddot{q} \leq b_{ineq}(q, \dot{q}) \\ A_{eq}(q, \dot{q}) \ddot{q} = b_{eq}(q, \dot{q}) \\ \tau = M(q) \ddot{q} + h(q, \dot{q}) - J(q)^\top f \end{cases} $$

where $\tau^{}$ and $\ddot{q}^{}$ are the optimal joint torques and accelerations. $w_k$ is the weight of the task $k$ described by $A_k(q,\dot{q})$ and $b_k(q,\dot{q})$. $M(q)$ is the joint-space mass matrix, $h(q,\dot{q})$ contains all the non-linear terms (Coriolis, centrifugal and gravity), $J(q)$ is the contact Jacobian and $f$ the contact forces. Equality constraints are described by $A_{eq}(q,\dot{q})$ and $b_{eq}(q,\dot{q})$. Inequality constraints are described by $A_{ineq}(q,\dot{q})$ and $b_{ineq}(q,\dot{q})$. Tasks are weighted on the same level and their priority directly depends on $w_k$.

We get the optimal joint accelerations and integrate them twice to control the robot in position, although we could use the same values and control the robot directly in torque.

This QP architecture is a major achievement of model-based robotics, providing real-time, physically consistent control with hard safety guarantees. Our early humanoid work was concerned with making this framework more capable, more robust, and more practical to deploy.