HOI-Synth Benchmark

Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?

We investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction (HOI) detection. Through extensive experiments and comparative analyses on three egocentric datasets, EPIC-KITCHENS VISOR, EgoHOS, and ENIGMA-51, we show how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10% of the real labeled data, we achieve the following improvements in Overall AP over baselines trained exclusively on real data: +5.67% on EPIC-KITCHENS VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Our analysis is supported by a novel data generation pipeline and the newly introduced HOI-Synth benchmark, which augments existing datasets with synthetic images of hand-object interactions automatically labeled with hand-object contact states, bounding boxes, and pixel-wise segmentation masks.

GitHub Paper

Data Generation Pipeline

Our pipeline relies on state-of-the-art datasets and components to enable accurate generation of egocentric images of hand-object interactions. We first select a random hand-object grasp from the DexGraspNet dataset, fit it to a randomly generated human model, and attach the object mesh specified by the grasp. We then select a random environment from the HM3D dataset and place the human-object model in it. Finally, we place a virtual camera at human eye level to capture the scene from a first-person point of view.
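The four steps above can be sketched as a small sampling routine. This is only an illustrative outline, not the authors' simulator: `SceneSpec`, `sample_scene`, the body-height range, and the eye-to-height ratio are all hypothetical names and values chosen for the example, and the grasp and environment identifiers are plain strings standing in for DexGraspNet and HM3D entries.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneSpec:
    """Hypothetical description of one synthetic scene to render."""
    grasp_id: str            # hand-object grasp (stand-in for a DexGraspNet entry)
    object_mesh: str         # object mesh specified by that grasp
    body_height_m: float     # height of the randomly generated human model
    environment_id: str      # indoor environment (stand-in for an HM3D scene)
    camera_height_m: float   # virtual camera height, placed at eye level


def sample_scene(grasps, environments, rng, eye_to_height_ratio=0.93):
    """Sample one scene following the four steps described above.

    grasps: list of (grasp_id, object_mesh) pairs.
    environments: list of environment ids.
    eye_to_height_ratio: assumed ratio of eye height to body height.
    """
    # 1. Random hand-object grasp, together with the object it refers to.
    grasp_id, object_mesh = rng.choice(grasps)
    # 2. Random human model (here reduced to a height in an assumed range).
    body_height_m = rng.uniform(1.5, 1.9)
    # 3. Random environment in which the human-object model is placed.
    environment_id = rng.choice(environments)
    # 4. Virtual camera at eye level for a first-person view.
    camera_height_m = body_height_m * eye_to_height_ratio
    return SceneSpec(grasp_id, object_mesh, body_height_m,
                     environment_id, camera_height_m)


if __name__ == "__main__":
    rng = random.Random(0)
    grasps = [("grasp_001", "mug.obj"), ("grasp_002", "bottle.obj")]
    environments = ["env_a", "env_b", "env_c"]
    print(sample_scene(grasps, environments, rng))
```

Keeping the sampled choices in a single record makes each rendered image reproducible from its seed, which is convenient when labels are generated automatically alongside the image.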

HOI-Synth Benchmark

The HOI-Synth benchmark extends three established datasets of egocentric images designed to study hand-object interaction detection, EPIC-KITCHENS VISOR, EgoHOS, and ENIGMA-51, with automatically labeled synthetic data obtained through the proposed HOI generation pipeline.

RGB images: 75460
Hand annotations: 141778
Object annotations: 101525
Interaction frames: 101625
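The abstract reports results obtained with only 10% of the real labeled data. A minimal sketch of one way to assemble such a training set is shown below, assuming the simplest possible setup: a fixed random subset of the real samples is pooled with all synthetic samples. The function name `build_training_set` and its parameters are hypothetical, and this is not necessarily the training protocol used in the paper.

```python
import random


def build_training_set(real_samples, synthetic_samples,
                       real_fraction=0.10, seed=0):
    """Pool a random fraction of real samples with all synthetic samples.

    Returns a shuffled list of (domain, sample) pairs, where domain is
    either "real" or "synthetic".
    """
    rng = random.Random(seed)
    # Number of real samples to keep (at least one if any are available).
    n_real = max(1, round(len(real_samples) * real_fraction)) if real_samples else 0
    real_subset = rng.sample(real_samples, n_real)
    mixed = [("real", s) for s in real_subset]
    mixed += [("synthetic", s) for s in synthetic_samples]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real = [f"real_{i:03d}.jpg" for i in range(100)]
    synthetic = [f"synth_{i:03d}.jpg" for i in range(50)]
    train = build_training_set(real, synthetic)
    print(len(train), sum(1 for domain, _ in train if domain == "real"))
```

Tagging each sample with its domain keeps the pooled set usable both for plain joint training and for methods that treat real and synthetic images differently.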

Download

Data (Frames): Download
Data (Annotations): Download
Baselines: GitHub
Pipeline (Simulator): GitHub

Paper

Leonardi, R., Furnari, A., Ragusa, F., & Farinella, G. M. (2023). Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? An Investigation and the HOI-Synth Domain Adaptation Benchmark. arXiv preprint arXiv:2312.02672. Cite our paper: ArXiv.
[01/07/2024] Accepted at European Conference on Computer Vision (ECCV) 2024!

@inproceedings{leonardi2025synthetic,
  title={Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?},
  author={Leonardi, Rosario and Furnari, Antonino and Ragusa, Francesco and Farinella, Giovanni Maria},
  booktitle={European Conference on Computer Vision},
  pages={36--54},
  year={2025},
  organization={Springer}
}

Visit our page dedicated to First Person Vision Research for other related publications.

People

Rosario Leonardi
FPV@IPLAB, Next Vision s.r.l.

Antonino Furnari
FPV@IPLAB, Next Vision s.r.l.

Francesco Ragusa
FPV@IPLAB, Next Vision s.r.l.

Giovanni Maria Farinella
FPV@IPLAB, Next Vision s.r.l.

This research has been supported by the project Future Artificial Intelligence Research (FAIR) – PNRR MUR Cod. PE0000013 - CUP: E63C22001940006.

This research has been partially supported by the project EXTRA-EYE - PRIN 2022 - CUP E53D23008280006 - Funded by the European Union - Next Generation EU.
