Shipping a Trillion Parameters: Delta Weight Sync in TRL

Amine Dirhoussi et al., Hugging Face Blog, 2026-06

A novel technique for asynchronous reinforcement learning that cuts per-step model weight transfer from terabytes to megabytes by exploiting the sparsity of bf16 weight updates. Only the weight elements that changed are routed through Hugging Face Buckets to the inference servers, enabling fully disaggregated training across distributed machines.

Delta Weight Sync

Delta Weight Sync exploits the sparsity of bf16 weight updates during reinforcement learning. Approximately 99% of the bf16 weights are unchanged between optimizer steps. Because bf16 keeps only 8 bits of significand precision, a small optimizer update usually does not move a weight across a bf16 rounding boundary, so its bf16 value stays the same. Instead of shipping a full model checkpoint (terabytes) after every step, the system identifies only the changed weight elements and routes them (megabytes) through Hugging Face Buckets to the inference servers. This enables fully disaggregated training across distributed machines without shared infrastructure.
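The idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not TRL's implementation. It assumes the trainer keeps fp32 master weights while the inference server holds a bf16 copy. It emulates bf16 with bit manipulation because NumPy has no native bfloat16 dtype. The helpers `to_bf16_bits`, `bf16_bits_to_f32`, `compute_delta` and `apply_delta` are hypothetical names, and the learning rate of 1e-6 and the random gradients are illustrative choices. The delta is encoded as int32 flat indices plus the new 16-bit values. The upload through Hugging Face Buckets is not shown.

```python
import numpy as np


def to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 (round-to-nearest-even); return raw 16-bit patterns.

    NaN/Inf handling is omitted, which is fine for the finite weights used here.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Round-to-nearest-even on the 16 low bits that bf16 drops.
    bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + bias) >> 16).astype(np.uint16)


def bf16_bits_to_f32(b: np.ndarray) -> np.ndarray:
    """Widen bf16 bit patterns back to float32 (exact, no rounding)."""
    return (b.astype(np.uint32) << 16).view(np.float32)


def compute_delta(old: np.ndarray, new: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: flat indices and new bf16 values of elements whose bits changed."""
    idx = np.flatnonzero(old != new).astype(np.int32)
    return idx, new.ravel()[idx]


def apply_delta(weights: np.ndarray, idx: np.ndarray, vals: np.ndarray) -> None:
    """Hypothetical helper: patch the inference-side bf16 buffer in place."""
    np.put(weights, idx, vals)


# Assumed setup: a 1M-element fp32 master-weight shard with N(0, 0.02) values and
# one plain gradient step with lr = 1e-6 (illustrative values, not TRL defaults).
rng = np.random.default_rng(0)
master = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)
grad = rng.normal(0.0, 1.0, size=master.shape).astype(np.float32)

before = to_bf16_bits(master)           # bf16 weights the inference server currently holds
master -= np.float32(1e-6) * grad        # optimizer step on the fp32 master weights
after = to_bf16_bits(master)            # fresh bf16 cast after the step

idx, vals = compute_delta(before, after)
changed = idx.size / before.size
full_bytes = before.nbytes               # 2 bytes per bf16 element
delta_bytes = idx.nbytes + vals.nbytes   # 4-byte index + 2-byte value per changed element
print(f"changed: {changed:.2%}  full: {full_bytes / 1e6:.2f} MB  delta: {delta_bytes / 1e6:.3f} MB")

# Inference side: patch a copy of the old weights; it must match a full bf16 resync.
server = before.copy()
apply_delta(server, idx, vals)
assert np.array_equal(server, after)
weights_for_inference = bf16_bits_to_f32(server)  # float32 view of the patched bf16 weights
```

Comparing raw bf16 bit patterns rather than float values keeps the check exact: an element is shipped only if the value the inference server would load actually differs.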