HiPolicy

Hierarchical Multi-Frequency Action Chunking
for Policy Learning

1 CFCS, School of CS, Peking University 2 National Key Lab for Multimedia Info Processing, PKU 3 Xi'an Jiaotong University 4 Tsinghua University 5 Beihang University 6 JD Technology

Abstract

Robotic imitation learning faces a fundamental trade-off between modeling long-horizon dependencies and enabling fine-grained closed-loop control; existing fixed-frequency action chunking approaches struggle to achieve both. Motivated by this, we propose HiPolicy, a hierarchical multi-frequency action chunking framework that jointly predicts action sequences at different frequencies to capture both coarse high-level plans and precise reactive motions. We extract and fuse hierarchical features from the observation history, aligned to each frequency, for multi-frequency chunk generation, and introduce an entropy-guided execution mechanism that adaptively balances long-horizon planning with fine-grained control based on action uncertainty. Experiments on diverse simulated benchmarks and real-world manipulation tasks show that HiPolicy integrates seamlessly into existing 2D and 3D generative policies, delivering consistent performance improvements while significantly enhancing execution efficiency.

Manipulation Imitation Learning Hierarchical Policy Action Chunking Diffusion Policy
Teaser: comparison between fixed-frequency chunking and HiPolicy.
Existing imitation learning methods typically predict an action chunk at a single fixed frequency, forcing a trade-off between long-horizon dependency modeling and precise closed-loop control. In contrast, HiPolicy performs hierarchical multi-frequency action chunking, capturing both long-term intentions and precise closed-loop adjustments. In addition, the proposed entropy-guided adaptive execution mechanism selects the execution frequency based on action uncertainty, balancing robustness and efficiency during task execution.

Method

HiPolicy architecture overview
Overview of HiPolicy architecture. We propose HiPolicy, a hierarchical multi-frequency action chunk policy with an entropy-guided adaptive execution strategy. Given a hierarchical observation history, HiPolicy predicts multi-frequency action chunks simultaneously through a diffusion-based model. During inference, HiPolicy estimates the action entropy through multiple samplings and adaptively chooses the execution frequency according to the estimated entropy.
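To make the "hierarchical observation history" concrete, the sketch below builds one observation window per frequency level by subsampling the same history at different temporal strides, so each branch sees a context matched to its prediction rate. The stride and window values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def build_frequency_aligned_histories(obs_history, strides=(1, 4), window=4):
    """Subsample an observation history at one stride per frequency level.

    obs_history: (T, D) array of past observations, oldest to newest.
    strides: temporal stride per level (hypothetical values; stride 1
             feeds the high-frequency branch, stride 4 the low-frequency one).
    window: number of observations kept per level.
    Returns one (window, D) array per level, each ending at the newest frame.
    """
    T, _ = obs_history.shape
    histories = []
    for s in strides:
        # Take `window` frames spaced `s` steps apart, anchored at the present.
        idx = np.arange(T - 1, T - 1 - s * window, -s)[::-1]
        idx = np.clip(idx, 0, T - 1)  # pad by repeating the oldest frame
        histories.append(obs_history[idx])
    return histories

obs = np.random.randn(32, 8)          # 32 past observations, 8-dim each
hf_hist, lf_hist = build_frequency_aligned_histories(obs)
print(hf_hist.shape, lf_hist.shape)   # (4, 8) (4, 8)
```

Both windows end at the current observation; the low-frequency window simply reaches further back in time, which is one simple way to align context length with chunk horizon.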

Hierarchical Multi-Frequency Action Chunking

HiPolicy's core innovation is the hierarchical multi-frequency action chunking architecture that jointly predicts action sequences at multiple frequencies:

HF
High-frequency branch

Generates short, fine-grained action chunks for precise reactive control and closed-loop adjustments.

LF
Low-frequency branch

Produces longer action sequences capturing high-level goals, stage transitions, and long-horizon intent.

Both branches are trained jointly with a combined loss, enabling the policy to learn complementary representations that capture both the "what" and the "how" of manipulation.
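The joint training described above can be sketched as slicing two targets from one demonstration trajectory (a short full-rate chunk and a longer subsampled chunk) and summing per-branch regression losses. Chunk lengths, the stride, and the weight `lam` are placeholder hyperparameters, not values from the paper:

```python
import numpy as np

def make_chunk_targets(actions, t, hf_len=8, lf_len=8, lf_stride=4):
    """Slice ground-truth targets for both branches at timestep t.

    actions: (T, D) demonstration action sequence.
    High-frequency target: the next `hf_len` actions at full rate.
    Low-frequency target: `lf_len` actions subsampled every `lf_stride`
    steps, covering a longer horizon from the same start time.
    """
    hf = actions[t : t + hf_len]
    lf = actions[t : t + lf_len * lf_stride : lf_stride]
    return hf, lf

def combined_loss(hf_pred, lf_pred, hf_tgt, lf_tgt, lam=1.0):
    """Joint objective: sum of per-branch MSE, weighted by `lam`
    (an assumed hyperparameter; the paper's weighting may differ)."""
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return mse(hf_pred, hf_tgt) + lam * mse(lf_pred, lf_tgt)

acts = np.random.randn(100, 7)                   # 7-DoF action trajectory
hf_tgt, lf_tgt = make_chunk_targets(acts, t=10)  # both (8, 7); lf spans 4x longer
loss = combined_loss(hf_tgt + 0.1, lf_tgt, hf_tgt, lf_tgt)
```

In the actual method the two predictions come from one diffusion-based model conditioned on the fused hierarchical features; the point here is only that both targets are cut from the same trajectory and supervised together.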

Entropy-Guided Adaptive Execution

Another key innovation is the confidence-aware execution mechanism that dynamically selects which frequency to execute based on the policy's action distribution entropy:

Entropy-guided adaptive execution
Entropy-guided adaptive execution mechanism. The policy dynamically selects high-frequency or low-frequency action chunks based on action distribution entropy.
LE
Low entropy (concentrated distribution)

Stable predictions — execute high-frequency chunks for fine-grained and closed-loop control.

HE
High entropy (dispersed distribution)

High uncertainty, indicating a phase transition — execute low-frequency chunks for broader planning, while increasing execution speed.

This adaptive mechanism ensures the robot acts precisely when confident and plans more broadly when uncertain, naturally balancing reactivity and deliberation without manual tuning.
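The selection rule above can be sketched as follows: draw several chunks from the policy (e.g. multiple diffusion sampling passes), estimate entropy from their spread, and pick the branch by a threshold. The diagonal-Gaussian entropy proxy and the threshold value are assumptions for illustration; the paper's exact estimator may differ:

```python
import numpy as np

def estimate_action_entropy(sample_fn, k=8):
    """Approximate action entropy from k policy samples.

    sample_fn: callable returning one (H, D) action chunk per call.
    Uses a diagonal-Gaussian proxy: the mean per-dimension differential
    entropy, 0.5 * log(2*pi*e*var), computed from sample variance.
    """
    samples = np.stack([sample_fn() for _ in range(k)])  # (k, H, D)
    var = samples.var(axis=0) + 1e-8
    return float(np.mean(0.5 * np.log(2 * np.pi * np.e * var)))

def select_chunk(hf_chunk, lf_chunk, entropy, threshold=0.0):
    """Low entropy -> high-frequency chunk (fine closed-loop control);
    high entropy -> low-frequency chunk (broader planning).
    `threshold` is a tunable placeholder, not a published value."""
    return hf_chunk if entropy < threshold else lf_chunk

rng = np.random.default_rng(0)
confident = lambda: 0.01 * rng.standard_normal((8, 7))  # tightly clustered samples
chunk = select_chunk("HF", "LF", estimate_action_entropy(confident))
print(chunk)  # low entropy -> "HF"
```

A dispersed sampler (large per-dimension variance) drives the estimate above the threshold and selects the low-frequency chunk, matching the high-entropy case described above.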

Simulation Results

Overall Performance Gain on RoboTwin 1.0 & 2.0

  • High-precision tasks: +105% relative improvement over DP baseline
  • Easy tasks: +34% relative improvement over DP baseline
  • Total average success rate: DP 37% → 60%, DP3 41% → 59%

100 evaluation episodes per task.

Module Contribution (Ablation)

  • Full HiPolicy average: 60%
  • w/o hierarchical freq. structure: 37%
  • w/o feature fusion: 54%
  • Low-only / High-only conditioning: 58% / 49%

Hierarchical structure is the core contributor; fusion and multi-frequency conditioning both provide additive gains.

Entropy-Guided Execution Effect

  • Average success rate: 24% (DP) → 41% (w/ EG), a +71% relative improvement
  • Average execution steps: 133 (DP) → 100 (w/ EG), ~25% fewer steps
  • Improves both reliability and execution efficiency

Tested on 5 tasks from RoboTwin 2.0.

Experiments

Real-world manipulation tasks

Eight real-world manipulation tasks: stacking cubes, sweeping a board, packing packages, placing vegetables, placing bowls, opening/closing microwave doors, and pressing toaster buttons.

Citation

@article{zhang2026hipolicy,
  title={HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning},
  author={Zhang, Jiyao and Han, Zimu and Wang, Junhan and Wu, Xionghao and Lin, Shihong and Li, Jinzhou and Fan, Hongwei and Wu, Ruihai and Li, Dongjiang and Dong, Hao},
  journal={arXiv preprint arXiv:2604.06067},
  year={2026}
}