[2] Data Engineering: Why and How to Convert LeRobot (Parquet/MP4) to HDF5
![[2] Data Engineering: Why and How to Convert LeRobot (Parquet/MP4) to HDF5](/_next/image?url=https%3A%2F%2Fcdn.hashnode.com%2Fuploads%2Fcovers%2F69df4f9b74b22138755e755f%2F10d091b6-683c-4293-93d1-2eac404d553d.png&w=3840&q=75)
In my previous post, I explained why the JEPA architecture is such a promising lead for robotics. But between Yann LeCun’s theory and the first `loss.backward()`, there is a massive wall: the data.
For my POC on the Koch arm (SO-ARM101), I’m using the LeRobot ecosystem. It’s a goldmine of data, but its default storage format isn't built for the intensive training cycles required by World Models. Here is why I had to build a technical "bridge" to the HDF5 format.
1. The Format War: Storage vs. Training
| Feature | LeRobot Format (Parquet + MP4) | HDF5 Format (LeWM Optimized) |
| --- | --- | --- |
| Ideal Use | Lightweight distribution and archiving | Intensive training (GPU datasets) |
| Pros | Highly compressed, easy to visualize, HF standard | Ultra-fast random access to frames and actions |
| Cons | Heavy CPU video decoding for every batch, potential desync | Large file size, less "standard" for sharing |
The Problem: Training a World Model (JEPA) requires sampling random time windows across thousands of episodes. Attempting a seek in an MP4 file for every single frame in a batch of 256 is performance suicide. HDF5 allows us to treat the dataset as one massive tensor living on the disk.
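The difference is easy to show in code. Here is a minimal sketch of the sampling pattern HDF5 enables; the layout (one global `pixels`/`action` tensor per file, frames indexed by time) and the names and shapes are illustrative assumptions, not the actual LeWM schema:

```python
import numpy as np
import h5py

# Build a tiny stand-in file so the sketch runs end-to-end. Assumed layout:
# one big on-disk tensor per field, frames indexed by a global time axis.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("pixels", data=np.zeros((200, 224, 224, 3), dtype=np.uint8),
                     chunks=(64, 224, 224, 3), compression="lzf")
    f.create_dataset("action", data=np.zeros((200, 6), dtype=np.float32))

def sample_window(f, window=16, rng=np.random.default_rng(0)):
    """Read one random time window straight off disk - no video decoding."""
    n = f["pixels"].shape[0]
    t0 = int(rng.integers(0, n - window))
    # h5py only reads the chunks overlapping [t0, t0 + window)
    return f["pixels"][t0:t0 + window], f["action"][t0:t0 + window]

with h5py.File("demo.h5", "r") as f:
    frames, actions = sample_window(f)

print(frames.shape, actions.shape)  # (16, 224, 224, 3) (16, 6)
```

The equivalent with MP4 means one codec seek plus a decode of every intervening keyframe per sample; here it is a single strided read.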
2. The "Little Mac" Challenge: Optimize or Die
Fetching datasets via the lerobot library is a breeze. The real challenge began during conversion. My Mac, with its limited resources, suffered several Kernel Panics before I got it right.
To succeed without saturating the RAM, I had to implement a "Lean & Mean" pipeline:
The "Low-Memory" Conversion Strategy
- Linear Pipeline: I abandoned aggressive parallelism. We process one episode at a time, one camera at a time. It’s slower, but it’s predictable.
- Micro-Batching (64 frames): Instead of loading an entire episode, we decode and write to the HDF5 in small chunks.
- Video Streaming: Using iterators (PyAV/OpenCV) so that a full video never materializes in RAM.
- LZF Compression: The perfect compromise: ultra-fast, CPU-light, and it significantly reduces the final file size.
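Put together, the pipeline above looks roughly like this. It's a sketch, not my actual converter: the synthetic `frame_iterator` stands in for the PyAV/OpenCV streaming decoder, and the dataset name and shapes are assumptions:

```python
import numpy as np
import h5py
from itertools import islice

MICRO_BATCH = 64  # frames decoded and flushed per write

def frame_iterator(n_frames, h=224, w=224):
    """Stand-in for a PyAV/OpenCV streaming decoder: yields one frame at a
    time, so the full video never materializes in RAM."""
    for _ in range(n_frames):
        yield np.zeros((h, w, 3), dtype=np.uint8)

def convert_episode(out_path, frames, n_frames):
    with h5py.File(out_path, "w") as f:
        dset = f.create_dataset(
            "pixels", shape=(n_frames, 224, 224, 3), dtype=np.uint8,
            chunks=(MICRO_BATCH, 224, 224, 3), compression="lzf")
        written = 0
        while written < n_frames:
            # Pull at most MICRO_BATCH frames off the decoder, then flush.
            batch = list(islice(frames, MICRO_BATCH))
            if not batch:
                break
            dset[written:written + len(batch)] = np.stack(batch)
            written += len(batch)
    return written

print(convert_episode("episode_000.h5", frame_iterator(150), 150))  # 150
```

Peak memory is bounded by one micro-batch (64 × 224 × 224 × 3 bytes ≈ 9.6 MB), regardless of episode length.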
Safety and Reliability (Diagnostic Mode)
Because a 2-hour conversion crashing at 99% is unacceptable, I integrated several safeguards:
- Pre-validation: We check the integrity of the metadata (episodes, frames, flags) before touching the video files.
- Watchdog & Heartbeat: If the script shows no progress for 120 seconds, it fails fast rather than hanging and burning CPU for nothing.
- RAM Estimation: The script computes and displays the estimated memory footprint of a batch before starting.
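The watchdog idea fits in a few lines. A minimal sketch, assuming the converter calls `beat()` after each micro-batch it writes (the class and method names are mine, not from the actual script):

```python
import os
import sys
import threading
import time

class Watchdog:
    """Abort the process if no progress heartbeat arrives within `timeout` s."""

    def __init__(self, timeout=120.0):
        self.timeout = timeout
        self._last = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def beat(self):
        """Call this after every micro-batch flush to signal progress."""
        self._last = time.monotonic()

    def stop(self):
        self._stop.set()

    def _watch(self):
        # Poll once per second; fail fast instead of hanging forever.
        while not self._stop.wait(1.0):
            if time.monotonic() - self._last > self.timeout:
                print("watchdog: no progress, aborting", file=sys.stderr)
                os._exit(1)

wd = Watchdog(timeout=120.0)
wd.start()
wd.beat()  # ... called from the conversion loop ...
wd.stop()
```

`os._exit(1)` is deliberate: a hung decoder thread can survive a normal `sys.exit`, and a hard exit is exactly the "fail fast" behavior we want here.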
3. Alignment for LeWM (World Model)
The LeWM model is demanding. Conversion isn't enough; we need adaptation:
- Resize to 224x224: The standard input size for modern vision backbones. Resizing is done on the fly during conversion.
- Key Normalization: LeRobot names its columns one way, while LeWM expects another (e.g., `pixels`, `action`, `state`, `done`). My bridge handles the translation automatically.
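The translation itself is just a mapping plus a strict check. A sketch, assuming LeRobot-style column names on the left (the exact keys vary per dataset; the camera key here is illustrative):

```python
# Hypothetical mapping - the actual left-hand keys depend on the dataset.
LEROBOT_TO_LEWM = {
    "observation.images.front": "pixels",
    "observation.state": "state",
    "action": "action",
    "next.done": "done",
}

def normalize_keys(episode: dict) -> dict:
    """Rename LeRobot columns to the keys LeWM expects; refuse incomplete
    episodes instead of silently producing partial data."""
    missing = [k for k in LEROBOT_TO_LEWM if k not in episode]
    if missing:
        raise KeyError(f"dirty episode, missing columns: {missing}")
    return {new: episode[old] for old, new in LEROBOT_TO_LEWM.items()}

sample = {"observation.images.front": ..., "observation.state": ...,
          "action": ..., "next.done": ..., "timestamp": ...}
print(sorted(normalize_keys(sample)))  # ['action', 'done', 'pixels', 'state']
```

Note that unknown columns (like `timestamp` above) are dropped rather than passed through, so the output schema is always exactly what the model expects.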
4. The Golden Rule: No "Dirty" Data
In low-cost robotics, datasets are often imperfect: truncated episodes or missing done flags are common.
My Policy: `dirty_episode_policy=fail`
On small datasets, the model is extremely sensitive to overfitting. Introducing inconsistent trajectories or ill-defined episode endings condemns the model to learn nonsense. I would rather have a converter that refuses to work than one that produces toxic data.
Tanguy's Advice
If you attempt this: watch your HDF5 chunks. A mismatch between your chunk size and your conversion micro-batches can turn your hard drive into a massive bottleneck. Yes, I learned this the hard way—remember, I only have a little Mac!
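To make the advice concrete, here is a sketch of the two cases (sizes chosen for illustration; `64 × 64` frames keep the demo small). When the chunk length matches the micro-batch, every write fills whole chunks; when it doesn't, HDF5 has to read back, decompress, patch, and re-compress chunks it already wrote:

```python
import numpy as np
import h5py

FRAMES, BATCH = 512, 64

def write_batches(path, chunk_len):
    """Write FRAMES frames in BATCH-sized slices with a given chunk length."""
    with h5py.File(path, "w") as f:
        d = f.create_dataset(
            "pixels", shape=(FRAMES, 64, 64, 3), dtype=np.uint8,
            chunks=(chunk_len, 64, 64, 3), compression="lzf")
        for t in range(0, FRAMES, BATCH):
            d[t:t + BATCH] = np.random.randint(
                0, 255, (BATCH, 64, 64, 3), dtype=np.uint8)

# Aligned: each 64-frame write fills whole chunks -> compress once, append.
write_batches("aligned.h5", chunk_len=BATCH)

# Misaligned: each write straddles chunk boundaries -> read-modify-write
# cycles on already-compressed chunks, which hammers the disk.
write_batches("misaligned.h5", chunk_len=48)
```

The rule of thumb: make the time dimension of your chunk shape equal to (or an integer divisor of) your write batch size.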
Next Step: We launch training on an RTX 4090 RunPod instance and see if our latent space survives physical reality.

