Isaac Sim · Unitree G1 · Fernando Suarez

§ 01 / Training reel

Gen 0 → 1499.

PROGRESSION REEL · gen 0 random flail → gen 300 balance → gen 700 small steps → gen 1499 shuffle-tracking · 4 parallel envs · headless tiled renderer

§ 02 / Why

Closing the loop myself.

My day job is mechanical design on a humanoid platform. The teams next to mine train policies in simulation, and the load cases I size brackets, fasteners, and bearings against both follow from and constrain what those policies do in the real world. I wanted to walk that loop end to end myself, not just receive it as a hand-off across a wall.

So I set up Isaac Sim and Isaac Lab on my home box, picked the Unitree G1 (open URDF, well-supported in the stack), and trained a PPO velocity-tracking policy from scratch.

§ 03 / Approach

What I built.

Isaac Sim 4.5 + Isaac Lab + rsl_rl PPO, on a single RTX 3060 Ti. The 8 GB of VRAM set the parallelism budget at 1,024 envs (the default is 4,096) and decided how aggressive a curriculum I could run on this box.
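To make the env budget concrete, here is a minimal sketch of how the parallel env count scales the number of samples each PPO update sees. The rollout length of 24 steps per env is an assumption for illustration, not a value read from this run's config; swap in whatever the agent config actually uses.

```python
# Rollout size per PPO iteration as a function of parallel env count.
# STEPS_PER_ENV is an assumed value, not taken from this run's config.
STEPS_PER_ENV = 24  # assumption: rollout length per env per iteration


def transitions_per_iteration(num_envs: int, steps_per_env: int = STEPS_PER_ENV) -> int:
    """Samples collected across all envs before each PPO update."""
    return num_envs * steps_per_env


def total_transitions(num_envs: int, iterations: int,
                      steps_per_env: int = STEPS_PER_ENV) -> int:
    """Samples collected over a full training run."""
    return transitions_per_iteration(num_envs, steps_per_env) * iterations


if __name__ == "__main__":
    # Compare the 1,024-env budget with the 4,096-env default over 1,500 iterations.
    for n in (1024, 4096):
        per_iter = transitions_per_iteration(n)
        total = total_transitions(n, iterations=1500)
        print(f"{n:>5} envs: {per_iter:>7,} samples/iter, {total:,} over 1,500 iters")
```

Under that assumed rollout length, dropping from 4,096 to 1,024 envs cuts the batch per update to a quarter, which is the practical cost of fitting the run into 8 GB.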

The task is the stock Isaac-Velocity-Flat-G1-v0: 23-DOF G1, 56-dim observation, 23-dim relative joint-position targets, physics at 200 Hz. I ran the Isaac Lab default reward terms and weights rather than retuning them, then captured intermediate checkpoints every 50 iterations across a 1,500-iteration run.
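The checkpoint cadence and the reel milestones line up as in the sketch below. It assumes iterations are counted from 0 and that the final iteration is always saved, which would explain a "gen 1499" capture on a 50-iteration interval. The decimation value (physics steps per policy action) is likewise an assumption used only to show how a control rate follows from the 200 Hz physics rate.

```python
# Checkpoint schedule for a 1,500-iteration run saved every 50 iterations.
# Assumptions (not confirmed by the run): iterations are 0-indexed and the
# final iteration is always saved; DECIMATION is an illustrative value.
TOTAL_ITERS = 1500
SAVE_INTERVAL = 50
PHYSICS_HZ = 200
DECIMATION = 4  # assumption: physics steps per policy action


def checkpoint_iterations(total_iters: int, save_interval: int) -> list[int]:
    """Iterations at which a checkpoint exists, including the last one."""
    if total_iters < 1 or save_interval < 1:
        raise ValueError("total_iters and save_interval must be positive")
    its = list(range(0, total_iters, save_interval))
    last = total_iters - 1
    if its[-1] != last:
        its.append(last)
    return its


def nearest_checkpoint(target: int, saved: list[int]) -> int:
    """Saved iteration closest to a requested milestone."""
    return min(saved, key=lambda it: abs(it - target))


saved = checkpoint_iterations(TOTAL_ITERS, SAVE_INTERVAL)
milestones = [0, 300, 700, 1499]  # the four reel milestones
reel = {m: nearest_checkpoint(m, saved) for m in milestones}
control_hz = PHYSICS_HZ / DECIMATION

if __name__ == "__main__":
    print(f"{len(saved)} checkpoints, reel uses {reel}")
    print(f"policy rate under assumed decimation: {control_hz} Hz")
```

With these assumptions every reel milestone lands exactly on a saved checkpoint, so no interpolation between policies is needed to build the progression.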

Alongside the training run I wrote a short memo on which Isaac Sim outputs are actually useful to a mechanical team (joint reaction wrenches at the physics step, applied vs. measured effort, contact data with a metadata sidecar) and what derating to apply before any of it touches motor sizing or FEA. The simulator gives you data; the engineer is the one who decides what it means.
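As one way to picture the step between "simulator output" and "design input", the sketch below reduces a logged joint-effort trace to peak and RMS values and then scales them by a margin factor before they are treated as a load case. The function names, the sample trace, and the factor of 1.5 are all hypothetical placeholders; none of them come from the memo or from the run.

```python
import math


def peak_and_rms(torques_nm: list[float]) -> tuple[float, float]:
    """Peak absolute value and RMS of a torque trace in N·m."""
    if not torques_nm:
        raise ValueError("empty torque trace")
    peak = max(abs(t) for t in torques_nm)
    rms = math.sqrt(sum(t * t for t in torques_nm) / len(torques_nm))
    return peak, rms


def design_load(sim_value: float, margin_factor: float) -> float:
    """Scale a simulated load by a factor >= 1 before using it for sizing."""
    if margin_factor < 1.0:
        raise ValueError("margin factor should not reduce the simulated load")
    return sim_value * margin_factor


# Made-up samples for illustration only; not data from the run.
applied = [3.0, -4.0, 0.0, 4.0, -3.0]
peak, rms = peak_and_rms(applied)

# Hypothetical margin; the real value is an engineering judgment call.
MARGIN = 1.5
peak_design = design_load(peak, MARGIN)
rms_design = design_load(rms, MARGIN)

if __name__ == "__main__":
    print(f"sim peak {peak:.2f} N·m -> design peak {peak_design:.2f} N·m")
    print(f"sim RMS  {rms:.2f} N·m -> design RMS  {rms_design:.2f} N·m")
```

Peak values tend to drive static strength checks and RMS values tend to drive thermal or continuous-duty checks, which is why the sketch keeps them separate rather than collapsing the trace to a single number.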

§ 04 / Outcome

What shipped, and what hasn't.

The pipeline stands on its own, the 1,500-iteration run completed, and the four-milestone reel above is a real capture from that run. The gait is upright and tracking the velocity command by gen 1499, but it's a shuffle, not an athletic walk. Flat ground only; no rough-terrain curriculum, no quantitative gait metrics, and no export of the policy to a hardware target. That's the work for the next pass.

Isaac Sim · Isaac Lab · PPO · rsl_rl · Unitree G1 · PyTorch · CUDA