Abstract
Multi-agent collaboration enhances the perception capabilities of individual agents through information sharing.
However, in real-world applications, differences in sensors and models across heterogeneous agents inevitably lead to domain gaps during collaboration.
Existing approaches based on adaptation and reconstruction fail to support \textit{pragmatic heterogeneous collaboration} due to two key limitations:
(1) intrusive retraining of the encoder or core modules disrupts the semantic consistency already established among agents; and
(2) accommodating each new agent incurs a high computational cost, limiting scalability. To address these challenges,
we present GenComm, a novel Generative Communication mechanism that enables seamless perception across heterogeneous multi-agent systems through feature generation,
without altering the original network, and that integrates new agents at minimal cost via lightweight numerical alignment of spatial information. Specifically,
a tailored Deformable Message Extractor is designed to extract spatial information for each collaborator; this information is then transmitted in place of intermediate features.
The Spatial-Aware Feature Generator, built on a conditional diffusion model,
generates features aligned with the ego agent's semantic space while preserving each collaborator's spatial information. These generated features are then refined by a Channel Enhancer before fusion.
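For concreteness, the following minimal PyTorch sketch illustrates this generate-then-refine pipeline. It is an illustrative assumption rather than the actual implementation: the module names mirror the text, but the $1\times 1$-convolution message extractor (a stand-in for the deformable design), the channel sizes, and the simplified reverse-diffusion update are all placeholders.
\begin{verbatim}
import torch
import torch.nn as nn

class DeformableMessageExtractor(nn.Module):
    # Distills a compact spatial message from a collaborator's feature map.
    # A plain 1x1 convolution stands in for the deformable design here.
    def __init__(self, feat_ch=64, msg_ch=8):
        super().__init__()
        self.proj = nn.Conv2d(feat_ch, msg_ch, 1)

    def forward(self, collab_feat):
        return self.proj(collab_feat)  # transmitted instead of features

class ConditionalDenoiser(nn.Module):
    # Predicts noise from a noisy ego-space feature map, conditioned on
    # the received spatial message (channel-concatenated).
    def __init__(self, feat_ch=64, msg_ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + msg_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_ch, 3, padding=1))

    def forward(self, noisy_feat, spatial_msg):
        return self.net(torch.cat([noisy_feat, spatial_msg], dim=1))

class ChannelEnhancer(nn.Module):
    # Channel-wise gating (squeeze-and-excitation style) that refines
    # the generated features before fusion.
    def __init__(self, feat_ch=64, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(feat_ch, feat_ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(feat_ch // reduction, feat_ch, 1), nn.Sigmoid())

    def forward(self, feat):
        return feat * self.gate(feat)

@torch.no_grad()
def generate_collaborator_feature(denoiser, enhancer, spatial_msg,
                                  steps=4):
    # Runs a few reverse-diffusion steps (deliberately simplified update
    # rule) to synthesize a feature map in the ego agent's semantic space.
    b, _, h, w = spatial_msg.shape
    x = torch.randn(b, 64, h, w)       # start from pure noise
    for _ in range(steps):
        x = x - denoiser(x, spatial_msg) / steps
    return enhancer(x)                 # refine before fusion

collab_feat = torch.randn(1, 64, 32, 32)
msg = DeformableMessageExtractor()(collab_feat)
feat = generate_collaborator_feature(ConditionalDenoiser(),
                                     ChannelEnhancer(), msg)
print(feat.shape)  # torch.Size([1, 64, 32, 32])
\end{verbatim}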
Experiments on the OPV2V-H, DAIR-V2X, and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods, reducing both computational cost and parameter count by 81\% when incorporating new agents.