Object detection and forecasting are fundamental components of embodied perception, yet these problems are largely studied in isolation. We propose a joint detection, tracking, and multi-agent forecasting benchmark from sensor data. Although prior works have studied end-to-end perception, no large-scale dataset or challenge exists to facilitate standardized evaluation for this problem. In addition, self-driving benchmarks have historically focused on a few common classes, such as cars, pedestrians, and bicycles, while neglecting the many rare classes in the tail of the distribution. However, in the real open world, self-driving vehicles must still detect rare classes to ensure safe operation.
To this end, our proposed benchmark will be the first to evaluate end-to-end perception on 26 classes defined by the AV2 ontology. Specifically, we will repurpose the AV2 sensor dataset, which has track annotations for 26 object categories, for end-to-end perception: for each timestep in a given sensor sequence, algorithms will have access to all prior frames and must produce tracks for all past sensor sweeps, detections for the current timestep, and forecasted trajectories for the next 3 seconds. This challenge differs from the Motion Forecasting challenge because we do not provide ground-truth tracks as input, requiring algorithms to process raw sensor data. Our primary evaluation metric is Forecasting Average Precision, a joint detection and forecasting metric that computes performance averaged over static, linear, and non-linearly moving cohorts. Unlike standard motion forecasting evaluation, end-to-end perception must account for both true-positive and false-positive predictions.
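For intuition, below is a minimal sketch of how a ground-truth trajectory might be bucketed into the static, linear, and non-linear cohorts before averaging. The thresholds and the helper name `assign_cohort` are illustrative assumptions, not the official evaluation code.

```python
import numpy as np

def assign_cohort(future_xy: np.ndarray,
                  static_thresh_m: float = 0.5,
                  nonlinear_thresh_m: float = 1.0) -> str:
    """Bucket a ground-truth future trajectory into a motion cohort.

    future_xy: (T, 2) future BEV waypoints relative to the object's
    current position (e.g., waypoints sampled over the next 3 seconds).
    Thresholds here are illustrative, not the official values.
    """
    # Near-zero total displacement -> static cohort.
    if np.linalg.norm(future_xy[-1]) < static_thresh_m:
        return "static"

    # Otherwise, compare against a constant-velocity extrapolation of the
    # first future step; large deviation indicates non-linear motion.
    num_steps = future_xy.shape[0]
    linear_traj = np.arange(1, num_steps + 1)[:, None] * future_xy[0]
    deviation = np.linalg.norm(future_xy - linear_traj, axis=1).max()
    return "linear" if deviation < nonlinear_thresh_m else "nonlinear"
```

Averaging precision over these cohorts keeps the abundant static and linear objects from dominating the score, so improvements on non-linear motion show up directly in the metric.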
The focus of this year’s challenge is to improve long-range detection (e.g., 100 m to 150 m) and non-linear motion forecasting. Methods that demonstrate significant improvement will be highlighted at the Workshop on Autonomous Driving at CVPR 2024.
Please see the Argoverse User Guide for detailed instructions on how to download the sensor dataset.
Please see our baselines to get started.
Please see the end-to-end perception submission tutorial for a guide on preparing your submission.
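As a rough sketch of how per-frame outputs could be packaged for submission, the snippet below builds a nested dictionary keyed by log ID and timestamp and pickles it to disk. All field names, the trajectory shapes, and the file name here are assumptions for illustration; defer to the submission tutorial for the authoritative schema.

```python
import pickle
from pathlib import Path

import numpy as np

# Hypothetical layout: {log_id: {timestamp_ns: [prediction, ...]}}, where each
# prediction carries a detection plus K forecasted trajectories with scores.
# Field names are illustrative; see the submission tutorial for the required schema.
submission = {
    "example-log-id": {
        315967376859506000: [
            {
                "category": "REGULAR_VEHICLE",
                "detection_score": 0.87,
                "size": np.array([4.6, 1.9, 1.7]),           # length, width, height (m)
                "current_translation": np.array([25.1, -3.4, 0.2]),
                "prediction": np.random.rand(5, 6, 2),        # K=5 trajectories, 6 future (x, y) waypoints
                "score": np.random.rand(5),                   # per-trajectory confidence
            }
        ]
    }
}

Path("submission.pkl").write_bytes(pickle.dumps(submission))
```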
Citing
@INPROCEEDINGS{Argoverse2,
  author    = {Benjamin Wilson and William Qi and Tanmay Agarwal and John Lambert and Jagjeet Singh and Siddhesh Khandelwal and Bowen Pan and Ratnesh Kumar and Andrew Hartnett and Jhony Kaesemodel Pontes and Deva Ramanan and Peter Carr and James Hays},
  title     = {Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting},
  booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks 2021)},
  year      = {2021}
}
@INPROCEEDINGS{peri22futuredet,
  author    = {Neehar Peri and Jonathon Luiten and Mengtian Li and Aljoša Ošep and Laura Leal-Taixé and Deva Ramanan},
  title     = {Forecasting from LiDAR via Future Object Detection},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}
@INPROCEEDINGS{peri2022lt3d,
  author    = {Neehar Peri and Achal Dave and Deva Ramanan and Shu Kong},
  title     = {Towards Long-Tailed 3D Detection},
  booktitle = {Conference on Robot Learning},
  year      = {2022}
}