HiPolicy

Hierarchical Multi-Frequency Action Chunking
for Policy Learning

1 CFCS, School of CS, Peking University 2 National Key Lab for Multimedia Info Processing, PKU 3 Xi'an Jiaotong University 4 Tsinghua University 5 Beihang University 6 JD Technology

Abstract

Robotic imitation learning faces a fundamental trade-off between modeling long-horizon dependencies and enabling fine-grained closed-loop control; existing fixed-frequency action chunking approaches struggle to achieve both. Motivated by this, we propose HiPolicy, a hierarchical multi-frequency action chunking framework that jointly predicts action sequences at different frequencies to capture both coarse high-level plans and precise reactive motions. We extract and fuse hierarchical features from the observation history, aligned to each frequency, for multi-frequency chunk generation, and introduce an entropy-guided execution mechanism that adaptively balances long-horizon planning with fine-grained control based on action uncertainty. Experiments on diverse simulated benchmarks and real-world manipulation tasks show that HiPolicy integrates seamlessly into existing 2D and 3D generative policies, delivering consistent performance improvements while significantly enhancing execution efficiency.

Manipulation Imitation Learning Hierarchical Policy Action Chunking Diffusion Policy
Teaser: comparison between fixed-frequency chunking and HiPolicy.
Existing imitation learning methods typically predict an action chunk at a single fixed frequency, forcing a trade-off between long-horizon dependency modeling and precise closed-loop control. In contrast, HiPolicy performs hierarchical multi-frequency action chunking, capturing both long-term intentions and precise closed-loop adjustments. In addition, the proposed entropy-guided adaptive execution mechanism selects the execution frequency based on action uncertainty, balancing robustness and efficiency during task execution.

Method

HiPolicy architecture overview
Overview of HiPolicy architecture. We propose HiPolicy, a hierarchical multi-frequency action chunk policy with an entropy-guided adaptive execution strategy. Given a hierarchical observation history, HiPolicy predicts multi-frequency action chunks simultaneously through a diffusion-based model. During inference, HiPolicy estimates the action entropy through multiple samplings and adaptively chooses the execution frequency according to the estimated entropy.
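To make the "hierarchical observation history" concrete, the sketch below builds one observation window per frequency level by subsampling the same history at different temporal strides, so each branch sees a context matched to its prediction rate. The stride and window values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def build_frequency_aligned_histories(obs_history, strides=(1, 4), window=4):
    """Subsample an observation history at one stride per frequency level.

    obs_history: (T, D) array of past observations, oldest to newest.
    strides: temporal stride per level (hypothetical values; stride 1
             feeds the high-frequency branch, stride 4 the low-frequency one).
    window: number of observations kept per level.
    Returns one (window, D) array per level, each ending at the newest frame.
    """
    T, _ = obs_history.shape
    histories = []
    for s in strides:
        # Take `window` frames spaced `s` steps apart, anchored at the present.
        idx = np.arange(T - 1, T - 1 - s * window, -s)[::-1]
        idx = np.clip(idx, 0, T - 1)  # pad by repeating the oldest frame
        histories.append(obs_history[idx])
    return histories

obs = np.random.randn(32, 8)          # 32 past observations, 8-dim each
hf_hist, lf_hist = build_frequency_aligned_histories(obs)
print(hf_hist.shape, lf_hist.shape)   # (4, 8) (4, 8)
```

Both windows end at the current observation; the low-frequency window simply reaches further back in time, which is one simple way to align context length with chunk horizon.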

Hierarchical Multi-Frequency Action Chunking

HiPolicy's core innovation is the hierarchical multi-frequency action chunking architecture that jointly predicts action sequences at multiple frequencies:

HF
High-frequency branch

Generates short, fine-grained action chunks for precise reactive control and closed-loop adjustments.

LF
Low-frequency branch

Produces longer action sequences capturing high-level goals, stage transitions, and long-horizon intent.

Both branches are trained jointly with a combined loss, enabling the policy to learn complementary representations that capture both the "what" and the "how" of manipulation.
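The joint training described above can be sketched as slicing two targets from one demonstration trajectory (a short full-rate chunk and a longer subsampled chunk) and summing per-branch regression losses. Chunk lengths, the stride, and the weight `lam` are placeholder hyperparameters, not values from the paper:

```python
import numpy as np

def make_chunk_targets(actions, t, hf_len=8, lf_len=8, lf_stride=4):
    """Slice ground-truth targets for both branches at timestep t.

    actions: (T, D) demonstration action sequence.
    High-frequency target: the next `hf_len` actions at full rate.
    Low-frequency target: `lf_len` actions subsampled every `lf_stride`
    steps, covering a longer horizon from the same start time.
    """
    hf = actions[t : t + hf_len]
    lf = actions[t : t + lf_len * lf_stride : lf_stride]
    return hf, lf

def combined_loss(hf_pred, lf_pred, hf_tgt, lf_tgt, lam=1.0):
    """Joint objective: sum of per-branch MSE, weighted by `lam`
    (an assumed hyperparameter; the paper's weighting may differ)."""
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return mse(hf_pred, hf_tgt) + lam * mse(lf_pred, lf_tgt)

acts = np.random.randn(100, 7)                   # 7-DoF action trajectory
hf_tgt, lf_tgt = make_chunk_targets(acts, t=10)  # both (8, 7); lf spans 4x longer
loss = combined_loss(hf_tgt + 0.1, lf_tgt, hf_tgt, lf_tgt)
```

In the actual method the two predictions come from one diffusion-based model conditioned on the fused hierarchical features; the point here is only that both targets are cut from the same trajectory and supervised together.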

Entropy-Guided Adaptive Execution

Another key innovation is the confidence-aware execution mechanism that dynamically selects which frequency to execute based on the policy's action distribution entropy:

Entropy-guided adaptive execution
Entropy-guided adaptive execution mechanism. The policy dynamically selects high-frequency or low-frequency action chunks based on action distribution entropy.
LE
Low entropy (concentrated distribution)

Stable predictions — execute high-frequency chunks for fine-grained and closed-loop control.

HE
High entropy (dispersed distribution)

High uncertainty, indicating a phase transition — execute low-frequency chunks for broader planning, while increasing execution speed.

This adaptive mechanism ensures the robot acts precisely when confident and plans more broadly when uncertain, naturally balancing reactivity and deliberation without manual tuning.
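The selection rule above can be sketched as follows: draw several chunks from the policy (e.g. multiple diffusion sampling passes), estimate entropy from their spread, and pick the branch by a threshold. The diagonal-Gaussian entropy proxy and the threshold value are assumptions for illustration; the paper's exact estimator may differ:

```python
import numpy as np

def estimate_action_entropy(sample_fn, k=8):
    """Approximate action entropy from k policy samples.

    sample_fn: callable returning one (H, D) action chunk per call.
    Uses a diagonal-Gaussian proxy: the mean per-dimension differential
    entropy, 0.5 * log(2*pi*e*var), computed from sample variance.
    """
    samples = np.stack([sample_fn() for _ in range(k)])  # (k, H, D)
    var = samples.var(axis=0) + 1e-8
    return float(np.mean(0.5 * np.log(2 * np.pi * np.e * var)))

def select_chunk(hf_chunk, lf_chunk, entropy, threshold=0.0):
    """Low entropy -> high-frequency chunk (fine closed-loop control);
    high entropy -> low-frequency chunk (broader planning).
    `threshold` is a tunable placeholder, not a published value."""
    return hf_chunk if entropy < threshold else lf_chunk

rng = np.random.default_rng(0)
confident = lambda: 0.01 * rng.standard_normal((8, 7))  # tightly clustered samples
chunk = select_chunk("HF", "LF", estimate_action_entropy(confident))
print(chunk)  # low entropy -> "HF"
```

A dispersed sampler (large per-dimension variance) drives the estimate above the threshold and selects the low-frequency chunk, matching the high-entropy case described above.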

Simulation Results

Overall Performance Gain on RoboTwin 1.0 & 2.0

  • High-precision tasks: +105% relative improvement over DP baseline
  • Easy tasks: +34% relative improvement over DP baseline
  • Total average success rate: DP 37% → 60%, DP3 41% → 59%

100 evaluation episodes per task.

Module Contribution (Ablation)

  • Full HiPolicy average: 60%
  • w/o hierarchical freq. structure: 37%
  • w/o feature fusion: 54%
  • Low-only / High-only conditioning: 58% / 49%

Hierarchical structure is the core contributor; fusion and multi-frequency conditioning both provide additive gains.

Entropy-Guided Execution Effect

  • Average success rate: 24% (DP) → 41% (w/ EG), a +71% relative improvement
  • Average execution steps: 133 (DP) → 100 (w/ EG), ~25% fewer steps
  • Improves both reliability and execution efficiency

Tested on 5 tasks from RoboTwin 2.0.

Experiments

Real-world manipulation tasks

Eight real-world manipulation tasks: stacking cubes, sweeping a board, packing packages, placing vegetables, placing bowls, opening/closing microwave doors, and pressing toaster buttons.

Citation

@article{zhang2026hipolicy,
  title={HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning},
  author={Zhang, Jiyao and Han, Zimu and Wang, Junhan and Wu, Xionghao and Lin, Shihong and Li, Jinzhou and Fan, Hongwei and Wu, Ruihai and Li, Dongjiang and Dong, Hao},
  journal={arXiv preprint arXiv:2604.06067},
  year={2026}
}