Q: What could I do to enhance the lighting in the bedroom?
A: You could turn on the lamp next to the bed.
Q: Where did I place the vase in the dining room?
A: It's placed in the center of the dining table.
Q: I invited four friends over. Are there enough seats for everyone in the living room?
A: Yes, there are enough seats for everyone.
Q: How can I practice drumming?
A: You can use the drum set in the corner of the practice room.
Q: How many lab coats are hanging in the laboratory?
A: There are two.
Q: What color is the front door in the entry hallway?
A: The front door is green.
Q: Is there a plant on the window sill in my workspace?
A: No, there isn't a plant.
Q: What exercise equipment can I use in the living room?
A: You can use the treadmill.
Q: Is the curtain in the living room drawn all the way?
A: No, the curtain is partially drawn.
Q: Where did I put the plates?
A: They're on a shelf in the storage room.
Q: What is the black appliance near the window in my kitchen?
A: It's a microwave.
Q: What is the composition of the kitchen floor's surface?
A: The kitchen floor is made up of tiles.
Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions. However, current EQA approaches suffer from critical limitations in exploration efficiency, dataset design, and evaluation metrics. Moreover, existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning, while frontier-based exploration strategies struggle in cluttered environments and fail to ensure fine-grained exploration of task-relevant areas. To address these challenges, we construct the EXPloration-awaRe Embodied queStion anSwering Benchmark (EXPRESS-Bench), the largest dataset designed specifically to evaluate both exploration and reasoning capabilities. EXPRESS-Bench consists of 777 exploration trajectories and 2,044 question-trajectory pairs. To improve exploration efficiency, we propose Fine-EQA, a hybrid exploration model that integrates frontier-based and goal-oriented navigation to guide agents toward task-relevant regions more effectively. Additionally, we introduce a novel evaluation metric, Exploration-Answer Consistency (EAC), which ensures faithful assessment by measuring the alignment between answer grounding and exploration reliability. Extensive experimental comparisons with state-of-the-art EQA models demonstrate the effectiveness of our EXPRESS-Bench in advancing embodied exploration and question reasoning.
@article{EXPRESSBench,
title={Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering},
author={Jiang, Kaixuan and Liu, Yang and Chen, Weixing and Luo, Jingzhou and Chen, Ziliang and Pan, Ling and Li, Guanbin and Lin, Liang},
year={2025},
journal={arXiv preprint arXiv:2503.11117}}