PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation
PoET is a transformer-based framework that takes a single RGB image as input and simultaneously estimates the 6D pose, i.e. translation and rotation, of every object present in the image. It takes the detections and feature maps of an object detector backbone and feeds this additional information into an attention-based transformer. Our framework can be trained on top of any object detector. No information beyond the raw RGB image, e.g. depth maps or 3D models, is required. We achieve state-of-the-art results on challenging 6D object pose estimation datasets. Moreover, PoET can be utilized as a pose sensor in 6D localization tasks.
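The data flow described above can be sketched as follows. This is a minimal, hypothetical outline, not the actual PoET API: the function names, shapes, and placeholder poses are all illustrative assumptions; only the overall structure (detector outputs feeding an attention-based pose head that returns one translation and rotation per detection) follows the description.

```python
# Illustrative sketch of the PoET inference pipeline; names and shapes are
# assumptions, not the real API.
import numpy as np

def detect(image):
    """Stand-in for any object detector backbone (e.g. Scaled-YOLOv4 or
    Mask R-CNN): returns bounding boxes, class ids, and a feature map."""
    boxes = np.array([[50.0, 60.0, 120.0, 140.0],
                      [200.0, 30.0, 280.0, 110.0]])   # two detections
    class_ids = np.array([3, 7])
    features = np.zeros((256, 32, 32))                 # C x H x W feature map
    return boxes, class_ids, features

def pose_transformer(boxes, class_ids, features):
    """Stand-in for the attention-based transformer: predicts one 6D pose
    (3x3 rotation matrix + 3-vector translation) per detected object.
    Here it just emits placeholder identity poses."""
    n = len(boxes)
    rotations = np.stack([np.eye(3) for _ in range(n)])  # (n, 3, 3)
    translations = np.zeros((n, 3))                       # (n, 3)
    return rotations, translations

image = np.zeros((480, 640, 3))                # single RGB input frame
boxes, class_ids, features = detect(image)     # backbone stage
R, t = pose_transformer(boxes, class_ids, features)  # pose estimation stage
print(R.shape, t.shape)                        # one pose per detected object
```

Note that the detector is treated as a black box here, which mirrors the claim that PoET can be trained on top of any object detector framework.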
For more details, we kindly refer you to our [publication] and the project page on our [GitHub]. If you find our work helpful, please consider citing us.
Citation:
Thomas Jantos, Mohamed Amin Hamdad, Wolfgang Granig, Stephan Weiss and Jan Steinbrener: PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation. 6th Annual Conference on Robot Learning (CoRL 2022), Auckland, 2022
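For convenience, the citation above as a BibTeX entry (the entry key and field layout are our choice; please verify against the official CoRL proceedings):

```bibtex
@inproceedings{jantos2022poet,
  title     = {{PoET}: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation},
  author    = {Jantos, Thomas and Hamdad, Mohamed Amin and Granig, Wolfgang and Weiss, Stephan and Steinbrener, Jan},
  booktitle = {6th Annual Conference on Robot Learning (CoRL)},
  year      = {2022},
  address   = {Auckland, New Zealand}
}
```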
Pre-Trained Models
We provide the pre-trained object detector backbones as well as fully trained PoET models (including object detector backbone weights) and the corresponding hyperparameter configurations.
Description | Dataset | Downloads |
---|---|---|
PoET with a Scaled-YOLOv4 Object Detector Backbone | YCB-V | PoET, Hyperparameters, YOLO Backbone |
PoET with a Mask R-CNN Object Detector Backbone | YCB-V | PoET, Hyperparameters, Mask R-CNN Backbone |
PoET with a Mask R-CNN Object Detector Backbone | LM-O | PoET, Hyperparameters, Mask R-CNN Backbone |