3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians

Zeming Wei1*, Junyi Lin1*, Yang Liu1,3†, Weixing Chen1, Jingzhou Luo1, Guanbin Li1,2,3, Liang Lin1,2,3
1Sun Yat-sen University 2Peng Cheng Laboratory 3Guangdong Key Laboratory of Big Data Analysis and Processing
*Equal contribution †Corresponding author

Abstract

3D affordance reasoning plays a critical role in associating human instructions with the functional regions of 3D objects, facilitating precise, task-oriented manipulations in embodied AI. However, current methods, which predominantly depend on sparse 3D point clouds, exhibit limited generalizability and robustness due to their sensitivity to coordinate variations and the inherent sparsity of the data. By contrast, 3D Gaussian Splatting (3DGS) delivers high-fidelity, real-time rendering with minimal computational overhead by representing scenes as dense, continuous distributions. This positions 3DGS as a highly effective approach for capturing fine-grained affordance details and improving recognition accuracy. Nevertheless, its full potential remains largely untapped due to the absence of large-scale, 3DGS-specific affordance datasets. To overcome these limitations, we present 3DAffordSplat, the first large-scale, multi-modal dataset tailored for 3DGS-based affordance reasoning. This dataset includes 23k Gaussian instances, 8.3k point cloud instances, and 6.6k manually annotated affordance labels, encompassing 21 object categories and 18 affordance types. Building upon this dataset, we introduce AffordSplatNet, a novel model specifically designed for affordance reasoning using 3DGS representations. AffordSplatNet features an innovative cross-modal structure alignment module that exploits structural consistency priors to align 3D point cloud and 3DGS representations, resulting in enhanced affordance recognition accuracy. Extensive experiments demonstrate that the 3DAffordSplat dataset significantly advances affordance learning within the 3DGS domain, while AffordSplatNet consistently outperforms existing methods across both seen and unseen settings, highlighting its robust generalization capabilities.

Dataset: 3DAffordSplat

Comparison of our 3DAffordSplat with other 3D affordance datasets.

Dataset overview. (a) Category distribution in 3DAffordSplat. (b) Number of 3DGS annotations in each affordance category. (c) Representative data examples from 3DAffordSplat (3DGS and point cloud, with affordance annotations and questions); the colored regions in the point clouds and 3DGS denote the affordance annotations. (d) Examples of affordance reasoning.

Construction Pipeline. The 3DAffordSplat dataset integrates data from LASO and ShapeSplat: the point cloud and textual data are sourced from LASO, while the 3D Gaussian data is derived from ShapeSplat. Following the annotation standard of 3DAffordanceNet, we manually annotated a subset of the Gaussian data. Each object instance includes three modalities (point cloud, 3D Gaussian, and text), supporting applications such as affordance prediction, embodied question answering, and interactive grasping.
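The three-modality layout described above can be pictured as one record per object instance. The sketch below is a hypothetical schema, not the released file format: the class and field names (`GaussianPrimitive`, `AffordSplatSample`, `gaussian_scores`) are illustrative assumptions. The per-Gaussian parameters follow the usual 3DGS parameterization (center, scale, rotation quaternion, opacity, color), with spherical-harmonic color simplified to RGB, and the affordance label is assumed to be a per-Gaussian score in [0, 1] in the style of 3DAffordanceNet.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GaussianPrimitive:
    """One 3D Gaussian with the usual 3DGS parameters (hypothetical layout)."""
    mean: Vec3                                   # center position
    scale: Vec3                                  # per-axis standard deviation
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    opacity: float                               # in [0, 1]
    color: Vec3                                  # RGB; SH coefficients simplified away


@dataclass
class AffordSplatSample:
    """Hypothetical per-instance record combining the three modalities."""
    object_id: str
    category: str                        # one of the 21 object categories
    point_cloud: List[Vec3]              # sourced from LASO
    gaussians: List[GaussianPrimitive]   # sourced from ShapeSplat
    question: str                        # text query, sourced from LASO
    affordance: str                      # one of the 18 affordance types
    # Per-Gaussian affordance score in [0, 1] (assumption); None for instances
    # that were not manually annotated, since only a subset is labeled.
    gaussian_scores: Optional[List[float]] = None

    def is_annotated(self) -> bool:
        return self.gaussian_scores is not None

    def validate(self) -> None:
        """Check that the scores, if present, have one entry per Gaussian in [0, 1]."""
        if self.gaussian_scores is None:
            return
        if len(self.gaussian_scores) != len(self.gaussians):
            raise ValueError("gaussian_scores must have one entry per Gaussian")
        if any(not 0.0 <= s <= 1.0 for s in self.gaussian_scores):
            raise ValueError("gaussian_scores must lie in [0, 1]")
```

Keeping the score list optional mirrors the fact that only part of the Gaussian data carries manual labels; a loader could filter on `is_annotated()` when building a supervised training split.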

Detailed statistics of 3DAffordSplat. The dataset includes 23,672 Gaussian instances, 8,231 point cloud instances, and 6,631 manually annotated affordance labels, encompassing 21 object categories and 18 affordance types.

Model

Architecture Overview. AffordSplatNet (a) processes 3D Gaussians and human instructions through a hierarchical pipeline. It extracts multi-granularity features from the Gaussians, while a pre-trained language model infers an ⟨Aff⟩ token from the text query that represents an intermediate segmentation result. These modalities are fused through attention mechanisms, and granularity selection prioritizes task-relevant spatial scales. The selected features are decoded into dynamic kernels for efficient affordance mask generation. To enhance 3D structural learning, the Cross-Modal Structure Alignment (CMSA) module (b) aligns the affordance regions and the overall structural relations between the Gaussian and point cloud data.
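To make the pipeline above more concrete, here is a toy, pure-Python sketch of three of its ideas: soft granularity selection guided by the ⟨Aff⟩ token, a dynamic kernel decoded from the selected features and applied to every Gaussian to produce a mask, and a relation-level alignment loss between Gaussian and point-cloud features in the spirit of CMSA. This is not AffordSplatNet's implementation. The scoring rule, the attention pooling used to decode the kernel, the 0.5 threshold, and the cosine-similarity relation loss are all assumptions chosen for illustration, and the loss assumes the two modalities have already been put in one-to-one correspondence (for example, by sampling matched anchor points).

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_granularity(levels: List[List[Vec]], aff_token: Vec) -> List[Vec]:
    """Hypothetical soft granularity selection.

    Each level holds one feature per Gaussian (all levels assumed to share
    length and dimension). A level is scored by the mean similarity of its
    features to the <Aff> token; levels are then blended with softmax weights.
    """
    scores = [sum(dot(f, aff_token) for f in level) / len(level) for level in levels]
    weights = softmax(scores)
    n, d = len(levels[0]), len(levels[0][0])
    return [
        [sum(w * level[i][k] for w, level in zip(weights, levels)) for k in range(d)]
        for i in range(n)
    ]


def decode_kernel(features: List[Vec], aff_token: Vec) -> Vec:
    """Hypothetical kernel decoder: attention-pool the per-Gaussian features,
    using the <Aff> token as the query."""
    attn = softmax([dot(f, aff_token) for f in features])
    d = len(features[0])
    return [sum(a * f[k] for a, f in zip(attn, features)) for k in range(d)]


def dynamic_kernel_mask(features: List[Vec], kernel: Vec, threshold: float = 0.5) -> List[int]:
    """Apply the dynamic kernel to every Gaussian feature (a dot product, i.e. a
    1x1 convolution with one output channel), then sigmoid and threshold."""
    probs = [1.0 / (1.0 + math.exp(-dot(f, kernel))) for f in features]
    return [1 if p >= threshold else 0 for p in probs]


def cosine(a: Vec, b: Vec) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def relation_alignment_loss(gauss_feats: List[Vec], point_feats: List[Vec]) -> float:
    """Hypothetical structure-level alignment: mean squared difference between
    the pairwise cosine-similarity matrices of matched Gaussian and point features."""
    if len(gauss_feats) != len(point_feats):
        raise ValueError("expects one-to-one matched Gaussian/point features")
    n = len(gauss_feats)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = cosine(gauss_feats[i], gauss_feats[j]) - cosine(point_feats[i], point_feats[j])
            total += diff * diff
    return total / (n * n)


if __name__ == "__main__":
    aff = [1.0, -1.0]  # toy <Aff> token
    levels = [
        [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]],  # fine level
        [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],     # coarse level (uninformative)
    ]
    feats = select_granularity(levels, aff)
    kernel = decode_kernel(feats, aff)
    print(dynamic_kernel_mask(feats, kernel))  # [1, 0, 1]
    print(relation_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.5
```

Because the relation loss compares similarity matrices computed within each modality, the Gaussian and point-cloud features do not need to share a dimension; only the number of matched elements must agree.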

Experiments

We compare AffordSplatNet with state-of-the-art point cloud models on 3DAffordSplat. AffordSplatNet outperforms these models on 3DGS affordance reasoning.
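The metrics behind this comparison are not listed on this page. As a hedged illustration only, the snippet below computes intersection-over-union (IoU) between binary per-Gaussian masks (for instance, predictions and ground-truth scores after thresholding) and averages it separately over the seen and unseen settings mentioned in the abstract. The metric choice, the `split` labels, and the function names are assumptions, not the paper's evaluation protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def iou(pred: List[int], gt: List[int]) -> float:
    """IoU of two binary masks defined over the same set of Gaussians."""
    if len(pred) != len(gt):
        raise ValueError("masks must cover the same Gaussians")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Convention (assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou_by_split(results: Iterable[Tuple[str, List[int], List[int]]]) -> Dict[str, float]:
    """results: (split, predicted mask, ground-truth mask) triples, where split
    is a hypothetical label such as "seen" or "unseen"."""
    per_split: Dict[str, List[float]] = defaultdict(list)
    for split, pred, gt in results:
        per_split[split].append(iou(pred, gt))
    return {split: sum(v) / len(v) for split, v in per_split.items()}


if __name__ == "__main__":
    results = [
        ("seen", [1, 0, 1, 1], [1, 1, 1, 0]),    # IoU = 2/4
        ("seen", [1, 1, 0, 0], [1, 1, 0, 0]),    # IoU = 1
        ("unseen", [0, 1, 0, 0], [1, 1, 1, 0]),  # IoU = 1/3
    ]
    print(mean_iou_by_split(results))
```

Reporting the seen and unseen averages separately keeps a strong result on familiar categories from masking weaker generalization to new ones.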

Results Visualization

Visualization results of AffordSplatNet. Each example includes one query, one answer, and four object shapes, illustrating how well the model generalizes affordance knowledge. The identified affordance regions are marked in red.

Statement And Contact

This project is for research purposes only; please contact us for a commercial-use licence. For any other questions, please contact weizm6@mail2.sysu.edu.cn, linjy279@mail2.sysu.edu.cn, or liuy856@mail.sysu.edu.cn.

BibTeX

@misc{wei20253daffordsplatefficientaffordancereasoning,
        title={3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians}, 
        author={Zeming Wei and Junyi Lin and Yang Liu and Weixing Chen and Jingzhou Luo and Guanbin Li and Liang Lin},
        year={2025},
        eprint={2504.11218},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2504.11218}, 
  }