My day job is mechanical design on a humanoid platform. The teams next to mine train policies in simulation, and the load cases I size brackets, fasteners, and bearings against either feed off what those policies do in the real world or push back on it. I wanted to walk that loop end-to-end myself rather than only see it as a hand-off across a wall.
So I set up Isaac Sim and Isaac Lab on my home box, picked the Unitree G1 (open URDF, well-supported in the stack), and trained a PPO velocity-tracking policy from scratch.
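"Velocity tracking" here means the policy is rewarded for matching a commanded base velocity. Isaac Lab's locomotion tasks score this with an exponential kernel on the squared tracking error, so the reward is 1.0 at perfect tracking and decays smoothly as the error grows. The snippet below is a minimal sketch of that kernel, assuming σ = 0.5; that value and the name `tracking_reward` are illustrative and not read from the G1 config.

```python
import math


def tracking_reward(cmd_xy, vel_xy, std=0.5):
    """Exponential tracking kernel: exp(-||cmd - vel||^2 / std^2).

    ASSUMPTION: std=0.5 is an illustrative value, not quoted from the stock task.
    """
    err_sq = sum((c - v) ** 2 for c, v in zip(cmd_xy, vel_xy))
    return math.exp(-err_sq / std ** 2)


print(tracking_reward((0.5, 0.0), (0.5, 0.0)))   # perfect tracking -> 1.0
print(tracking_reward((0.5, 0.0), (0.25, 0.0)))  # half the commanded speed -> exp(-0.25)
```

The smooth kernel gives PPO a useful gradient even when tracking is poor, which a hard "within tolerance" bonus would not.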
Isaac Sim 4.5 + Isaac Lab + rsl_rl PPO, on a single RTX 3060 Ti. The 8 GB of VRAM set the parallelism budget at 1,024 envs (the default is 4,096) and decided how aggressive a curriculum I could run on this box.
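To put the 1,024-env budget in numbers, here is a back-of-envelope sketch of how much experience PPO collects per iteration and over the whole run. It assumes 24 rollout steps per env per iteration and a 50 Hz policy rate (200 Hz physics with 4 physics steps per action); both are assumptions for illustration, not values confirmed from my config.

```python
# Back-of-envelope PPO batch arithmetic for an 8 GB VRAM budget.
# ASSUMPTIONS: 24 rollout steps per env per iteration, and a 50 Hz
# policy rate (200 Hz physics, 4 physics steps per policy action).
STEPS_PER_ENV = 24
POLICY_DT_S = 0.02
ITERATIONS = 1_500


def samples_per_iteration(num_envs: int) -> int:
    """Transitions collected per PPO iteration across all parallel envs."""
    return num_envs * STEPS_PER_ENV


def experience_hours(num_envs: int, iterations: int = ITERATIONS) -> float:
    """Simulated time summed over all envs for the whole run, in hours."""
    return samples_per_iteration(num_envs) * iterations * POLICY_DT_S / 3600.0


for envs in (1_024, 4_096):
    print(f"{envs:>5} envs: {samples_per_iteration(envs):>6} samples/iter, "
          f"{experience_hours(envs):.1f} h of experience over {ITERATIONS} iters")
```

Under these assumptions, quartering the env count quarters the batch: at a fixed 1,500 iterations the policy sees about 205 simulated hours instead of about 819.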
The task is the stock Isaac-Velocity-Flat-G1-v0: 23-DOF G1, 56-dim observation, 23-dim relative joint-position targets, physics at 200 Hz. I ran the Isaac Lab default reward shape rather than retuning it, and saved intermediate checkpoints every 50 iterations across a 1,500-iteration run.
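"Relative joint-position targets" means the policy outputs offsets from a default pose, and a PD drive turns each target into joint effort at the physics rate, holding the same target across several physics steps. The sketch below illustrates that chain on a toy 3-joint model. The action scale of 0.5, the gains, effort limit, inertia, and the 4:1 decimation are all assumptions for illustration. This is not Isaac Lab's actuator code; with implicit actuators, PhysX runs the PD drive internally.

```python
import numpy as np

# ASSUMPTIONS (illustrative, not read from the G1 config): action scale 0.5
# about the default pose, made-up PD gains / effort limit / inertia, and
# 4 physics substeps per policy step (200 Hz physics, 50 Hz policy).
ACTION_SCALE = 0.5
DECIMATION = 4
PHYSICS_DT = 1.0 / 200.0


def joint_targets(action, default_pos, scale=ACTION_SCALE):
    """Map a relative (offset) action to absolute joint-position targets."""
    return default_pos + scale * action


def pd_effort(target, q, qd, kp, kd, effort_limit):
    """PD drive: return the raw computed effort and the limit-clipped applied effort."""
    computed = kp * (target - q) - kd * qd
    applied = np.clip(computed, -effort_limit, effort_limit)
    return computed, applied


def policy_step(action, q, qd, default_pos, kp=100.0, kd=2.0,
                effort_limit=25.0, inertia=0.1):
    """Hold one action's target for DECIMATION substeps of a toy joint model."""
    target = joint_targets(action, default_pos)
    log = []
    for _ in range(DECIMATION):
        computed, applied = pd_effort(target, q, qd, kp, kd, effort_limit)
        log.append((computed, applied))
        # Toy dynamics: pure inertia, semi-implicit Euler. Real joints also
        # see gravity, contact, and coupling through the rest of the body.
        qd = qd + applied / inertia * PHYSICS_DT
        q = q + qd * PHYSICS_DT
    return q, qd, log


q, qd, log = policy_step(np.array([0.2, -0.4, 1.0]), np.zeros(3), np.zeros(3), np.zeros(3))
computed0, applied0 = log[0]
print(computed0, applied0)  # third joint: computed 50.0 vs applied 25.0 (saturated)
```

The gap between computed and clipped effort on the first substep is one concrete reason a torque logged from simulation needs interpreting before it becomes a load case.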
Alongside the training run I wrote a short memo on which Isaac Sim outputs are actually useful to a mechanical team — joint reaction wrenches at the physics step, applied vs. measured effort, contact data with a metadata sidecar — and what derating to apply before any of it touches motor sizing or FEA. The simulator gives you data; the engineer is the one who decides what it means.
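As a hypothetical illustration of where derating slots into motor sizing (this is not the memo's procedure), the sketch below scales a simulated joint-torque trace by a derating factor and then runs the two standard screens: RMS torque against the continuous rating (thermal) and peak torque against the peak rating. The trace, ratings, 1.5x factor, and function names are all made up.

```python
import math

# Hypothetical motor-sizing screen on a simulated joint-torque trace.
# ASSUMPTIONS: trace values, ratings, and the 1.5x derating factor are
# illustrative only. RMS assumes the trace is uniformly sampled.


def rms(values):
    """Root-mean-square of a uniformly sampled trace."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def screen_joint(torque_trace_nm, cont_rating_nm, peak_rating_nm, derate=1.5):
    """Scale sim torques by a derating factor, then compare RMS and peak to ratings."""
    scaled = [derate * t for t in torque_trace_nm]
    rms_nm = rms(scaled)
    peak_nm = max(abs(t) for t in scaled)
    return {
        "rms_nm": rms_nm,
        "peak_nm": peak_nm,
        "rms_ok": rms_nm <= cont_rating_nm,    # thermal / continuous check
        "peak_ok": peak_nm <= peak_rating_nm,  # short-duration peak check
    }


result = screen_joint([10.0, -20.0, 30.0, -10.0], cont_rating_nm=25.0, peak_rating_nm=60.0)
print(result)  # peak passes, RMS does not
```

Running the two checks separately matters: a joint can clear its peak rating and still overheat on a sustained gait cycle.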
The pipeline runs end to end, the 1,500-iteration run completed, and the four-milestone reel above is a real capture from that run. By iteration 1499, the last of the run, the gait is upright and tracks the velocity command, but it's a shuffle, not an athletic walk. Flat ground only: no rough-terrain curriculum, no quantitative gait metrics, and no export of the policy to a hardware target. That's the work for the next pass.
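Why the final checkpoint is 1499 and not 1500: training iterations are zero-indexed, so a save every 50 iterations plus one at the end yields the set below. The sketch assumes rsl_rl-style behavior (save when the iteration is a multiple of the interval, then once more at the last iteration). The helper names are hypothetical, and the evenly spaced picks are an example, not necessarily the milestones in the reel.

```python
def checkpoint_iterations(max_iterations=1_500, save_interval=50):
    """Iterations with a saved checkpoint, assuming a save every
    `save_interval` iterations (zero-indexed) plus one at the final iteration."""
    its = list(range(0, max_iterations, save_interval))
    last = max_iterations - 1
    if its[-1] != last:
        its.append(last)
    return its


def pick_milestones(its, n=4):
    """Evenly spaced picks from the checkpoint list, including first and last."""
    if n < 2:
        raise ValueError("need at least two milestones")
    idx = [round(k * (len(its) - 1) / (n - 1)) for k in range(n)]
    return [its[i] for i in idx]


its = checkpoint_iterations()
print(len(its), its[:3], its[-2:])  # 31 checkpoints: 0, 50, 100, ..., 1450, 1499
print(pick_milestones(its))         # example picks: [0, 500, 1000, 1499]
```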