Robotics paper index
RMTL: Reinforced Micro-task Learning for Long-Horizon Manipulation with VLM Rewards
One-line summary
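RMTL decomposes a long-horizon manipulation task into a small set of language-described micro-tasks and rewards the agent with a multi-view VLM score computed from the prompt of the currently active micro-task, which the authors report gives a more informative learning signal than a single global prompt.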
Engineering notes
Engineering notes will be added by the Robot Papers editorial team.
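Chinese explanation
Chinese explanation pending: this site prioritizes adding Chinese-language explanations for high-value papers on VLA, embodied intelligence, humanoid robot control, and robot manipulation.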
Original abstract
Reinforcement learning (RL) for robotic manipulation often requires manually designing a dense reward function, which is difficult to tune and often fragile, or learning a reward from human demonstrations or preferences, which can be expensive. A recent line of work uses pretrained vision-language models (VLMs) as zero-shot reward models, replacing these costs with a single text prompt. However, we argue that a single global prompt is too coarse for long-horizon manipulation tasks with randomized initial conditions. The single-prompt VLM reward is near-flat for much of the trajectory, making early progress hard for the agent to detect. We propose Reinforced Micro-Task Learning (RMTL), an approach that decomposes a manipulation task into a small set of language-described micro-tasks and trains the agent to switch between them. At each step, the agent receives a multi-view VLM reward computed using the prompt of the currently active micro-task and averaged across multiple camera views to reduce the effect of view-specific occlusions. A reverse curriculum gradually exposes the agent to harder initial conditions, while a PPO worker is first trained with a fixed distance-based rule that selects the active micro-task. We then replace this rule with a learned hierarchical manager, turning rule-based phase selection into a fully learned hierarchical policy. We instantiate RMTL on the Fetch manipulation environment using three short stage-specific prompts and without additional prompt tuning. Experiments show that RMTL provides more informative reward signals than single-prompt VLM rewards, enabling faster learning. These results suggest that decomposing VLM rewards into micro-task-specific language prompts can substantially improve the scalability of language-guided reinforcement learning for robotic manipulation.
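The per-step reward described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the thresholds `reach_tol` and `goal_tol`, the function names, and the `score_fn` callable (a stand-in for whatever VLM image-text scorer is used) are all assumptions introduced here. The abstract only states that a fixed distance-based rule selects the active micro-task and that the VLM reward is averaged across camera views.

```python
from typing import Any, Callable, Sequence

# Hypothetical stage prompts. The paper uses three short stage-specific
# prompts, but their exact wording is not given in the abstract.
MICRO_TASK_PROMPTS = [
    "the robot gripper is close to the object",    # stage 0: reach
    "the robot is carrying the object",            # stage 1: carry
    "the object is at the target location",        # stage 2: place
]


def select_micro_task(
    gripper_to_object: float,
    object_to_goal: float,
    reach_tol: float = 0.05,  # assumed threshold (metres), not from the paper
    goal_tol: float = 0.05,   # assumed threshold (metres), not from the paper
) -> int:
    """Pick the active micro-task index with a fixed distance-based rule."""
    if gripper_to_object > reach_tol:
        return 0  # gripper still far from the object: keep reaching
    if object_to_goal > goal_tol:
        return 1  # object reached but not yet at the goal: carry it
    return 2      # object is at the goal: hold the placement


def multi_view_reward(
    views: Sequence[Any],
    prompt: str,
    score_fn: Callable[[Any, str], float],
) -> float:
    """Average a VLM image-text score over all camera views.

    `score_fn(image, prompt)` is a placeholder for a real VLM scorer;
    averaging reduces the effect of an occlusion in any single view.
    """
    if not views:
        raise ValueError("at least one camera view is required")
    return sum(score_fn(view, prompt) for view in views) / len(views)


def rmtl_step_reward(
    views: Sequence[Any],
    gripper_to_object: float,
    object_to_goal: float,
    score_fn: Callable[[Any, str], float],
) -> float:
    """Reward for one step: score the views against the active prompt."""
    stage = select_micro_task(gripper_to_object, object_to_goal)
    return multi_view_reward(views, MICRO_TASK_PROMPTS[stage], score_fn)
```

In the full method the abstract describes, the hand-written `select_micro_task` rule would later be replaced by a learned hierarchical manager, while the multi-view reward computation stays the same.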