
MMSP 2025 Embodied AI Challenge:
Long-horizon Vision-Language Navigation Challenge

September 21 to September 23, 2025
Beijing, China

The 1st Long-horizon Vision-Language Navigation Challenge, based on the insights presented in LH-VLN and hosted as the “Embodied AI Challenge” track of the IEEE 27th International Workshop on Multimedia Signal Processing (MMSP 2025), focuses on complicated long-horizon VLN tasks. Our LHPR-VLN benchmark defines a complex task composed of multiple single-stage subtasks. The basic format of an LHPR-VLN task is “Find something somewhere, and take it to something somewhere, then ...”. Each complex task involves locating an object at a specified initial location and transporting it to a designated target location, potentially encompassing two to four sequential navigation subtasks. The embodied agent needs to complete these single-stage navigation tasks in sequence to ultimately fulfill the instruction. These tasks emphasize long-term planning and decision consistency across consecutive subtasks. The goal is to push agents beyond simple, short-term navigation by requiring them to deeply comprehend complex task instructions, maintain continuous navigation, and handle sequential subtasks seamlessly across a dynamic environment.

Figure 1: Environment where agents execute navigation tasks.

Challenge


Video 1: Agent executing the LH-VLN task.

Benchmark: LHPR-VLN.
The tasks within this benchmark all consist of multiple single-stage subtasks. Throughout navigation, the agent acquires observational data from three perspectives (+60°, 0°, −60°) and is permitted to execute fundamental actions: turn left, move forward, turn right, and stop. When the agent selects the “stop” action, the subtask is deemed complete, and task success is evaluated based on the agent’s final positional state relative to the target.
For each single-stage navigation task, the agent must approach within a 1-meter geodesic distance of the target object, ensuring the object is positioned within a 60-degree horizontal field of view to maintain task fidelity.
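The success criterion above can be sketched in code. This is a hypothetical illustration, not the official evaluation script: the function name, the (x, y) position convention, and the use of straight-line (Euclidean) distance in place of the benchmark's geodesic distance are all assumptions made for clarity.

```python
import math

def subtask_success(agent_pos, agent_heading_deg, target_pos,
                    dist_threshold=1.0, fov_deg=60.0):
    """Illustrative check of the LHPR-VLN single-stage success criterion.

    The agent must stop within `dist_threshold` meters of the target
    object, with the target inside a `fov_deg` horizontal field of view.
    Positions are (x, y) tuples in meters; heading is in degrees.
    Note: the benchmark uses geodesic (shortest-path) distance; plain
    Euclidean distance is used here as a simplification.
    """
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    if math.hypot(dx, dy) > dist_threshold:
        return False
    # Bearing from agent to target, relative to the agent's heading,
    # normalized to (-180, 180] so the FOV comparison is symmetric.
    bearing = math.degrees(math.atan2(dy, dx))
    rel = (bearing - agent_heading_deg + 180.0) % 360.0 - 180.0
    return abs(rel) <= fov_deg / 2.0
```

For example, an agent at the origin facing +x succeeds for a target 0.5 m straight ahead, but fails if the target is 0.5 m directly to its side (outside the 60° field of view) or 2 m ahead (beyond the 1 m threshold).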

Schedule details

Task Overview

The 1st MMSP Embodied AI Challenge focuses on complicated long-horizon VLN tasks. The LHPR-VLN benchmark defines a complex task composed of multiple single-stage subtasks, with the basic format “Find something somewhere, and take it to something somewhere, then ...”. Each complex task involves locating an object at a specified initial location and transporting it to a designated target location, potentially encompassing two to four sequential navigation subtasks. The embodied agent needs to complete these single-stage navigation tasks in sequence to ultimately fulfill the instruction. These tasks emphasize long-term planning and decision consistency across consecutive subtasks. The goal is to push agents beyond simple, short-term navigation by requiring them to deeply comprehend complex task instructions, maintain continuous navigation, and handle sequential subtasks seamlessly across a dynamic environment.

Submission evaluation

The competition evaluation consists of two stages. In the first stage, we will release the training data along with some test data. Participants will train their models on the training data and self-assess the results, which will serve as the basis for the first-stage ranking. Participants advancing to the second stage are required to publicly release all model code and weights and to submit a corresponding technical report as part of their entry. The technical report accounts for 30% of the final award evaluation.

In the second stage, participants selected in the first stage must submit a container that meets the requirements. The competition organizers will test it on an undisclosed test set, and the final ranking will be based on these results.

Detailed requirements are as follows:

Timeline

coming soon...

Registration

coming soon...

Challenge guide

coming soon...

Organizers

Yang Liu
Associate Professor at SYSU
Liang Lin
Professor at SYSU
Weixing Chen
PhD Student at SYSU
Xinshuai Song
MSc Student at SYSU
Kaixuan Jiang
MSc Student at SYSU
Yexin Zhang
Undergraduate at SYSU