Robotics paper index

SSI-Policy: Learning Structured Scene Interfaces for Vision-Language Robotic Manipulation

2026-06-25 · arXiv: 2606.26800

One-line summary

A robotics research paper on SSI-Policy: Learning Structured Scene Interfaces for Vision-Language Robotic Manipulation.

Engineering notes

Engineering notes will be added by the Robot Papers editorial team.

Chinese explanation / 中文解读

中文解读待补充:本站会优先为 VLA、具身智能、人形机器人控制、机器人操作等高价值论文补充中文说明。

Original abstract

Real-world robotic manipulation demands spatial grounding, task-aware reasoning, and precise control. Learning such capabilities becomes particularly challenging in the low-data regime. Prior methods often trade off scalable task-level reasoning and explicit physical structure: video-based approaches can drift geometrically over long horizons, 3D approaches often require depth sensing, and many flow/trajectory interfaces emphasize motion without an explicit RGB-only geometric representation. We introduce SSI-Policy, a modular framework built around a Structured Scene Interface (SSI) -- a unified, RGB-only intermediate representation that jointly encodes monocular depth features, language-grounded object layouts, and instruction-conditioned 2D motion trajectories. Critically, SSI is robot-agnostic and trainable from action-free video, decoupling perception from control so that the downstream policy can learn from few demonstrations. On the LIBERO benchmark with only 10 demonstrations per task, SSI-Policy improves over the strongest prior method by nearly 15\% and remains competitive with 50-demo methods that leverage large-scale external pretraining. Ablations show that geometric and motion cues provide complementary benefits within the shared interface. We further validate on 13 real-world tasks spanning spatial reasoning, cross-embodiment transfer, and contact-rich manipulation.

5.0Engineering value
7.0Research novelty
4.0Business relevance

Links and sources

Need this topic turned into a technical roadmap?

Robot Papers can prepare a custom robotics literature review, code map, dataset map, and B2B technology assessment.

Request B2B research

Comments

No comments yet. Be the first to share your thoughts on this paper.
Login or register to leave a comment