Welcome to MMDetection3D’s documentation!¶
Get Started¶
Prerequisites¶
In this section, we demonstrate how to prepare an environment with PyTorch.
MMDetection3D works on Linux, Windows (experimental support) and macOS. It requires Python 3.7+, CUDA 9.2+, and PyTorch 1.6+.
Note
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the next section. Otherwise, you can follow these steps for the preparation.
Step 0. Download and install Miniconda from the official website.
Step 1. Create a conda environment and activate it.
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
Step 2. Install PyTorch following official instructions, e.g.
On GPU platforms:
conda install pytorch torchvision -c pytorch
On CPU platforms:
conda install pytorch torchvision cpuonly -c pytorch
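You can optionally verify the PyTorch installation with a quick check in a Python shell:
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible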
Installation¶
We recommend that users follow our best practices to install MMDetection3D. However, the whole process is highly customizable. See Customize Installation section for more information.
Best Practices¶
Step 0. Install MMEngine, MMCV and MMDetection using MIM.
pip install -U openmim
mim install mmengine
mim install 'mmcv>=2.0.0rc4'
mim install 'mmdet>=3.0.0'
Note: In MMCV-v2.x, mmcv-full is renamed to mmcv. If you want to install mmcv without CUDA ops, you can use mim install "mmcv-lite>=2.0.0rc4" to install the lite version.
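As an optional sanity check after this step, you can confirm that the packages import correctly and print their versions:
import mmengine
import mmcv
import mmdet

print(mmengine.__version__, mmcv.__version__, mmdet.__version__)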
Step 1. Install MMDetection3D.
Case a: If you develop and run mmdet3d directly, install it from source:
git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
# "-b dev-1.x" means checkout to the `dev-1.x` branch.
cd mmdetection3d
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in edtiable mode,
# thus any local modifications made to the code will take effect without reinstallation.
Case b: If you use mmdet3d as a dependency or third-party package, install it with MIM:
mim install "mmdet3d>=1.1.0rc0"
Note:

If you would like to use opencv-python-headless instead of opencv-python, you can install it before installing MMCV.

Some dependencies are optional. Simply running pip install -v -e . will only install the minimum runtime requirements. To use optional dependencies like albumentations and imagecorruptions, either install them manually with pip install -r requirements/optional.txt or specify desired extras when calling pip (e.g. pip install -v -e .[optional]). Valid keys for the extras field are: all, tests, build, and optional.

We have supported spconv 2.0. If the user has installed spconv 2.0, the code will use spconv 2.0 first, which will take up less GPU memory than the default mmcv spconv. Users can use the following commands to install spconv 2.0:

pip install cumm-cuxxx
pip install spconv-cuxxx

where xxx is the CUDA version in the environment. For example, using CUDA 10.2, the command will be pip install cumm-cu102 && pip install spconv-cu102. Supported CUDA versions include 10.2, 11.1, 11.3, and 11.4. Users can also install it by building from source. For more details please refer to spconv v2.x.

We also support Minkowski Engine as a sparse convolution backend. If necessary, please follow the original installation guide or use pip to install it:

conda install openblas-devel -c anaconda
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps --install-option="--blas_include_dirs=/opt/conda/include" --install-option="--blas=openblas"

We also support Torchsparse as a sparse convolution backend. If necessary, please follow the original installation guide or use pip to install it:

sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0

or omit the sudo install with the following commands:

conda install -c bioconda sparsehash
export CPLUS_INCLUDE_PATH=CPLUS_INCLUDE_PATH:${YOUR_CONDA_ENVS_DIR}/include
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0

The code cannot be built in a CPU-only environment (where CUDA isn’t available) for now.
Verify the Installation¶
To verify whether MMDetection3D is installed correctly, we provide some sample codes to run an inference demo.
Step 1. We need to download config and checkpoint files.
mim download mmdet3d --config pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car --dest .
The downloading will take several seconds or more, depending on your network environment. When it is done, you will find two files pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py and hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth in your current folder.
Step 2. Verify the inference demo.
Case a: If you install MMDetection3D from source, just run the following command.
python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth --show
You will see a visualizer interface with point cloud, where bounding boxes are plotted on cars.
Note:
If you want to input a .ply file, you can use the following function to convert it to .bin format. Then you can use the converted .bin file to run the demo.
Note that you need to install pandas and plyfile before using this script. This function can also be used for data preprocessing when training on ply data.
import numpy as np
import pandas as pd
from plyfile import PlyData
def convert_ply(input_path, output_path):
plydata = PlyData.read(input_path) # read file
data = plydata.elements[0].data # read data
data_pd = pd.DataFrame(data) # convert to DataFrame
    data_np = np.zeros(data_pd.shape, dtype=np.float64)  # initialize array to store data
property_names = data[0].dtype.names # read names of properties
for i, name in enumerate(
property_names): # read data by property
data_np[:, i] = data_pd[name]
data_np.astype(np.float32).tofile(output_path)
Examples:
convert_ply('./test.ply', './test.bin')
If you have point clouds in another format (.off, .obj, etc.), you can use trimesh to convert them into .ply.
import trimesh
def to_ply(input_path, output_path, original_type):
mesh = trimesh.load(input_path, file_type=original_type) # read file
mesh.export(output_path, file_type='ply') # convert to ply
Examples:
to_ply('./test.obj', './test.ply', 'obj')
Case b: If you install MMDetection3D with MIM, open your python interpreter and copy&paste the following codes.
from mmdet3d.apis import init_model, inference_detector
config_file = 'pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py'
checkpoint_file = 'hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth'
model = init_model(config_file, checkpoint_file)
inference_detector(model, 'demo/data/kitti/000008.bin')
You will see a list of Det3DDataSample, and the predictions are in the pred_instances_3d, indicating the detected bounding boxes, labels, and scores.
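As a rough sketch of inspecting the returned predictions (field names follow the description above; the exact return structure may vary slightly across versions):
# A sketch of inspecting predictions; in some versions inference_detector
# returns (results, data), in others a Det3DDataSample or a list of them.
result = inference_detector(model, 'demo/data/kitti/000008.bin')
if isinstance(result, tuple):
    result = result[0]
sample = result[0] if isinstance(result, list) else result
print(sample.pred_instances_3d.bboxes_3d)  # detected 3D boxes
print(sample.pred_instances_3d.scores_3d)  # confidence scores
print(sample.pred_instances_3d.labels_3d)  # class labels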
Customize Installation¶
CUDA Versions¶
When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
Please make sure the GPU driver satisfies the minimum version requirements. See this table for more information.
Note
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However, if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA’s website, and its version should match the CUDA version of PyTorch, i.e. the specified version of cudatoolkit in the conda install command.
Install MMEngine without MIM¶
To install MMEngine with pip instead of MIM, please follow MMEngine installation guides.
For example, you can install MMEngine by the following command:
pip install mmengine
Install MMCV without MIM¶
MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
To install MMCV with pip instead of MIM, please follow MMCV installation guides. This requires manually specifying a find-url based on PyTorch version and its CUDA version.
For example, the following command installs MMCV built for PyTorch 1.12.x and CUDA 11.6:
pip install "mmcv>=2.0.0rc4" -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
Install on Google Colab¶
Google Colab usually has PyTorch installed, thus we only need to install MMEngine, MMCV, MMDetection, and MMDetection3D with the following commands.
Step 1. Install MMEngine, MMCV and MMDetection using MIM.
!pip3 install openmim
!mim install mmengine
!mim install "mmcv>=2.0.0rc4,<2.1.0"
!mim install "mmdet>=3.0.0,<3.1.0"
Step 2. Install MMDetection3D from source.
!git clone https://github.com/open-mmlab/mmdetection3d.git -b dev-1.x
%cd mmdetection3d
!pip install -e .
Step 3. Verification.
import mmdet3d
print(mmdet3d.__version__)
# Example output: 1.1.0rc0, or another version.
Note
Within Jupyter, the exclamation mark ! is used to call external executables, and %cd is a magic command to change the current working directory of Python.
Using MMDetection3D with Docker¶
We provide a Dockerfile to build an image. Ensure that your docker version >= 19.03.
# build an image with PyTorch 1.9, CUDA 11.1
# If you prefer other versions, just modify the Dockerfile
docker build -t mmdetection3d docker/
Run it with:
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection3d/data mmdetection3d
Troubleshooting¶
If you have some issues during the installation, please first view the FAQ page. You may open an issue on GitHub if no solution is found.
Use Multiple Versions of MMDetection3D in Development¶
Training and testing scripts have already modified the PYTHONPATH to make sure the scripts use their own version of MMDetection3D.
To use the default version of MMDetection3D installed in your environment instead, you can remove the following line from the related scripts:
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
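A quick way to check which copy of MMDetection3D a script actually imports is to print the package location:
import mmdet3d

# __file__ shows whether the local checkout or the installed copy is in use
print(mmdet3d.__version__, mmdet3d.__file__)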
2: Train with customized datasets¶
In this note, you will know how to train and test predefined models with customized datasets. We use the Waymo dataset as an example to describe the whole process.
The basic steps are as below:
Prepare the customized dataset
Prepare a config
Train, test, inference models on the customized dataset.
Prepare the customized dataset¶
There are three ways to support a new dataset in MMDetection3D:
reorganize the dataset into existing format.
reorganize the dataset into a standard format.
implement a new dataset.
Usually we recommend using the first two methods, which are easier than the third.
In this note, we give an example of converting the data into the KITTI format; you can refer to it to reorganize your dataset into the KITTI format. For the standard-format approach, you can refer to customize_dataset.md.
Note: We take Waymo as the example here because its format is totally different from other existing formats. For datasets that organize data similarly to an existing one, like Lyft compared to nuScenes, it is easier to directly implement a new data converter (the second approach above) instead of converting the data to another format (the first approach above).
KITTI dataset format¶
Firstly, the raw data for 3D object detection from KITTI are typically organized as follows, where ImageSets contains split files indicating which files belong to the training/validation/testing set, calib contains calibration information files, image_2 and velodyne include image data and point cloud data, and label_2 includes label files for 3D detection.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
Specific annotation format is described in the official object development kit. For example, it consists of the following labels:
#Values Name Description
----------------------------------------------------------------------------
1 type Describes the type of object: 'Car', 'Van', 'Truck',
'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
'Misc' or 'DontCare'
1 truncated Float from 0 (non-truncated) to 1 (truncated), where
truncated refers to the object leaving image boundaries
1 occluded Integer (0,1,2,3) indicating occlusion state:
0 = fully visible, 1 = partly occluded
2 = largely occluded, 3 = unknown
1 alpha Observation angle of object, ranging [-pi..pi]
4 bbox 2D bounding box of object in the image (0-based index):
contains left, top, right, bottom pixel coordinates
3 dimensions 3D object dimensions: height, width, length (in meters)
3 location 3D object location x,y,z in camera coordinates (in meters)
1 rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi]
1 score Only for results: Float, indicating confidence in
detection, needed for p/r curves, higher is better.
Assume we use the Waymo dataset.
After downloading the data, we need to implement a function to convert both the input data and annotation format into the KITTI style. Then we can implement WaymoDataset, inherited from KittiDataset, to load the data and perform training, and implement WaymoMetric, inherited from KittiMetric, for evaluation.
Specifically, we implement a Waymo converter to convert Waymo data into the KITTI format, a Waymo dataset class to process it, and a Waymo metric to evaluate the results. Because we preprocess the raw data and reorganize it like KITTI, the dataset class can be implemented easily by inheriting from KittiDataset. Regarding the evaluation metric, because Waymo has its own evaluation approach, we further need to implement a new Waymo metric; refer to metric_and_evaluator.md for more about metrics. Afterwards, users can convert the data format and use WaymoDataset to train and evaluate the model with WaymoMetric.
For more details about the intermediate results of preprocessing the Waymo dataset, please refer to waymo_det.md.
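As a minimal, hypothetical sketch (the class name and class tuple below are illustrative, not the actual WaymoDataset implementation), a KITTI-style dataset class can be registered like this:
# A minimal sketch of registering a custom KITTI-style dataset class.
from mmdet3d.datasets import KittiDataset
from mmdet3d.registry import DATASETS


@DATASETS.register_module()
class MyWaymoStyleDataset(KittiDataset):
    # class names here are an assumption for the 3-class Waymo setting
    METAINFO = {'classes': ('Car', 'Pedestrian', 'Cyclist')}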
Prepare a config¶
The second step is to prepare configs such that the dataset could be successfully loaded. In addition, adjusting hyperparameters is usually necessary to obtain decent performance in 3D detection.
Suppose we would like to train PointPillars on Waymo to achieve 3D detection for 3 classes (vehicle, cyclist and pedestrian), we need to prepare a dataset config like this, a model config like this, and combine them like this, compared with the KITTI dataset config, model config and overall config.
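Such a top-level config typically just combines the dataset, model and schedule configs through _base_; here is a hedged sketch (the exact file names may differ in your version):
# A sketch of a top-level Waymo PointPillars config combining the pieces
# via _base_ (file names are illustrative and may differ across versions).
_base_ = [
    '../_base_/models/pointpillars_hv_secfpn_waymo.py',
    '../_base_/datasets/waymoD5-3d-3class.py',
    '../_base_/schedules/schedule-2x.py',
    '../_base_/default_runtime.py'
]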
Train a new model¶
To train a model with the new config, you can simply run
python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py
For more detailed usages, please refer to the Case 1.
Test and inference¶
To test the trained model, you can simply run
python tools/test.py configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class.py work_dirs/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymoD5-3d-3class/latest.pth
Note: To use the Waymo evaluation protocol, you need to follow the tutorial and prepare the files related to metrics computation as the official instructions describe.
For more detailed usages for test and inference, please refer to the Case 1.
Backends Support¶
We support different file client backends: Disk, Ceph and LMDB, etc. Here is an example of how to modify configs for Ceph-based data loading and saving.
Load data and annotations from Ceph¶
We support loading data and generated annotation info files (pkl and json) from Ceph:
# set file client backends as Ceph
backend_args = dict(
backend='petrel',
path_mapping=dict({
'./data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/', # replace the path with your data path on Ceph
'data/nuscenes/':
's3://openmmlab/datasets/detection3d/nuscenes/' # replace the path with your data path on Ceph
}))
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(filter_by_difficulty=[-1], filter_by_min_points=dict(Car=5)),
sample_groups=dict(Car=15),
classes=class_names,
# set file client for points loader to load training data
points_loader=dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4,
use_dim=4,
backend_args=backend_args),
# set file client for data base sampler to load db info file
backend_args=backend_args)
train_pipeline = [
# set file client for loading training data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
# set file client for loading training data annotations
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, backend_args=backend_args),
dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[0.25, 0.25, 0.25],
global_rot_range=[0.0, 0.0],
rot_range=[-0.15707963267, 0.15707963267]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
# set file client for loading validation/testing data
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4, backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points'])
])
]
data = dict(
# set file client for loading training info files (.pkl)
train=dict(
type='RepeatDataset',
times=2,
dataset=dict(pipeline=train_pipeline, classes=class_names, backend_args=backend_args)),
# set file client for loading validation info files (.pkl)
val=dict(pipeline=test_pipeline, classes=class_names,backend_args=backend_args),
# set file client for loading testing info files (.pkl)
test=dict(pipeline=test_pipeline, classes=class_names, backend_args=backend_args))
Load pretrained model from Ceph¶
model = dict(
pts_backbone=dict(
_delete_=True,
type='NoStemRegNet',
arch='regnetx_1.6gf',
init_cfg=dict(
type='Pretrained', checkpoint='s3://openmmlab/checkpoints/mmdetection3d/regnetx_1.6gf'), # replace the path with your pretrained model path on Ceph
...
Load checkpoint from Ceph¶
# replace the path with your checkpoint path on Ceph
load_from = 's3://openmmlab/checkpoints/mmdetection3d/v0.1.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20200620_230614-77663cd6.pth'
resume_from = None
workflow = [('train', 1)]
Save checkpoint into Ceph¶
# checkpoint saving
# replace the path with your checkpoint saving path on Ceph
checkpoint_config = dict(interval=1, max_keep_ckpts=2, out_dir='s3://openmmlab/mmdetection3d')
EvalHook saves the best checkpoint into Ceph¶
# replace the path with your checkpoint saving path on Ceph
evaluation = dict(interval=1, save_best='bbox', out_dir='s3://openmmlab/mmdetection3d')
Save the training log into Ceph¶
The training log will be backed up to the specified Ceph path after training.
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d'),
])
You can also delete the local training log after backing up to the specified Ceph path by setting keep_local = False
.
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook', out_dir='s3://openmmlab/mmdetection3d', keep_local=False),
])
Learn about Configs¶
MMDetection3D and other OpenMMLab repositories use MMEngine’s config system. It has a modular and inheritance design, which is convenient to conduct various experiments.
Config file content¶
MMDetection3D uses a modular design, all modules with different functions can be configured through the config. Taking PointPillars as an example, we will introduce each field in the config according to different function modules.
Model config¶
In MMDetection3D’s config, we use model to set up detection algorithm components. In addition to neural network components such as voxel_encoder, backbone, etc., it also requires data_preprocessor, train_cfg, and test_cfg. data_preprocessor is responsible for processing a batch of data output by the dataloader. train_cfg and test_cfg in the model config are training and testing hyperparameters of the components.
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
voxel_size=[0.16, 0.16, 4],
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=[0.16, 0.16, 4],
point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1]),
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=0.1111111111111111,
loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
train_cfg=dict(
assigner=[
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict(
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1)
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
Dataset and evaluator config¶
Dataloaders are required for the training, validation, and testing of the runner. Dataset and data pipeline need to be set to build the dataloader. Due to the complexity of this part, we use intermediate variables to simplify the writing of dataloader configs.
dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
db_sampler = dict(
data_root=data_root,
info_path=data_root + 'kitti_dbinfos_train.pkl',
rate=1.0,
prepare=dict(
filter_by_difficulty=[-1],
filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
classes=class_names,
sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
points_loader=dict(
type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4))
train_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='kitti_infos_train.pkl',
data_prefix=dict(pts='training/velodyne_reduced'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='training/velodyne_reduced'),
ann_file='kitti_infos_val.pkl',
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
Evaluators are used to compute the metrics of the trained model on the validation and testing datasets. The config of evaluators consists of one or a list of metric configs:
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_val.pkl',
metric='bbox')
test_evaluator = val_evaluator
Since the test dataset has no annotation files, the test_dataloader and test_evaluator config in MMDetection3D are generally equal to the val’s. If you want to save the detection results on the test dataset, you can write the config like this:
# inference on test dataset and
# format the output results for submission.
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='testing/velodyne_reduced'),
ann_file='kitti_infos_test.pkl',
load_eval_anns=False,
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
test_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'kitti_infos_test.pkl',
metric='bbox',
format_only=True,
submission_prefix='results/kitti-3class/kitti_results')
Training and testing config¶
MMEngine’s runner uses Loop to control the training, validation, and testing processes. Users can set the maximum training epochs and validation intervals with these fields:
train_cfg = dict(
type='EpochBasedTrainLoop',
max_epochs=80,
val_interval=2)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
Optimization config¶
optim_wrapper is the field to configure optimization-related settings. The optimizer wrapper not only provides the functions of the optimizer, but also supports functions such as gradient clipping, mixed precision training, etc. Find more in the optimizer wrapper tutorial.
optim_wrapper = dict( # Optimizer wrapper config
type='OptimWrapper', # Optimizer wrapper type, switch to AmpOptimWrapper to enable mixed precision training.
optimizer=dict( # Optimizer config. Support all kinds of optimizers in PyTorch. Refer to https://pytorch.org/docs/stable/optim.html#algorithms
type='AdamW', lr=0.001, betas=(0.95, 0.99), weight_decay=0.01),
clip_grad=dict(max_norm=35, norm_type=2)) # Gradient clip option. Set None to disable gradient clip. Find usage in https://mmengine.readthedocs.io/en/latest/tutorials/optim_wrapper.html
param_scheduler is a field that configures methods of adjusting optimization hyperparameters such as learning rate and momentum. Users can combine multiple schedulers to create a desired parameter adjustment strategy. Find more in the parameter scheduler tutorial and parameter scheduler API documents.
param_scheduler = [
dict(
type='CosineAnnealingLR',
T_max=32,
eta_min=0.01,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=48,
eta_min=1.0000000000000001e-07,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=32,
eta_min=0.8947368421052632,
begin=0,
end=32,
by_epoch=True,
convert_to_iter_based=True),
dict(
type='CosineAnnealingMomentum',
T_max=48,
eta_min=1,
begin=32,
end=80,
by_epoch=True,
convert_to_iter_based=True),
]
Hook config¶
Users can attach Hooks to training, validation, and testing loops to insert some operations during running. There are two different hook fields: one is default_hooks and the other is custom_hooks.
default_hooks is a dict of hook configs for the hooks that are required at runtime. They have default priorities which should not be modified. If not set, the runner will use the default values. To disable a default hook, users can set its config to None.
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=-1),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='Det3DVisualizationHook'))
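For example, a child config could disable the default visualization hook like this (a minimal sketch based on the rule above):
# Disable the default visualization hook by overriding it with None
default_hooks = dict(visualization=None)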
custom_hooks is a list of all other hook configs. Users can develop their own hooks and insert them in this field.
custom_hooks = []
Runtime config¶
default_scope = 'mmdet3d' # The default registry scope to find modules. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/registry.html
env_cfg = dict(
cudnn_benchmark=False, # Whether to enable cudnn benchmark
mp_cfg=dict( # Multi-processing config
        mp_start_method='fork',  # Use fork to start multi-processing threads. 'fork' is usually faster than 'spawn' but may be unsafe. See discussion in https://github.com/pytorch/pytorch/issues/1355
opencv_num_threads=0), # Disable opencv multi-threads to avoid system being overloaded
dist_cfg=dict(backend='nccl')) # Distribution configs
vis_backends = [dict(type='LocalVisBackend')] # Visualization backends. Refer to https://mmengine.readthedocs.io/en/latest/advanced_tutorials/visualization.html
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(
type='LogProcessor', # Log processor to process runtime logs
window_size=50, # Smooth interval of log values
by_epoch=True) # Whether to format logs with epoch type. Should be consistent with the train loop's type.
log_level = 'INFO' # The level of logging.
load_from = None # Load model checkpoint as a pre-trained model from a given path. This will not resume training.
resume = False # Whether to resume from the checkpoint defined in `load_from`. If `load_from` is None, it will resume the latest checkpoint in the `work_dir`.
Config file inheritance¶
There are 4 basic component types under configs/_base_: dataset, model, schedule, default_runtime.
Many methods can be easily constructed with one of each, like SECOND, PointPillars, PartA2, VoteNet.
The configs that are composed of components from _base_ are called primitive.
For all configs under the same folder, it is recommended to have only one primitive config. All other configs should inherit from the primitive config. In this way, the maximum inheritance level is 3.
For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on PointPillars, users may first inherit the basic PointPillars structure by specifying _base_ = '../pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py', then modify the necessary fields in the config files.
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder xxx_rcnn under configs.
Please refer to MMEngine config tutorial for detailed documentation.
By setting the _base_ field, we can specify which files the current configuration file inherits from.
When _base_ is a string of a file path, it means inheriting the contents from one config file.
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
When _base_ is a list of multiple file paths, it means inheriting from multiple files.
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_kitti.py',
'../_base_/datasets/kitti-3d-3class.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
If you wish to inspect the config file, you may run python tools/misc/print_config.py /PATH/TO/CONFIG to see the complete config.
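Equivalently, you can inspect a fully resolved config from Python with MMEngine (the config path below is just an example):
from mmengine.config import Config

# Load a config and print its fully-resolved content.
cfg = Config.fromfile('configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py')
print(cfg.pretty_text)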
Ignore some fields in the base configs¶
Sometimes, you may set _delete_=True to ignore some of the fields in the base configs.
You may refer to MMEngine config tutorial for a simple illustration.
In MMDetection3D, for example, to change the neck of PointPillars with the following config:
model = dict(
type='MVXFasterRCNN',
data_preprocessor=dict(voxel_layer=dict(...)),
pts_voxel_encoder=dict(...),
pts_middle_encoder=dict(...),
pts_backbone=dict(...),
pts_neck=dict(
type='FPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
act_cfg=dict(type='ReLU'),
in_channels=[64, 128, 256],
out_channels=256,
start_level=0,
num_outs=3),
pts_bbox_head=dict(...))
FPN and SECONDFPN use different keywords to construct:
_base_ = '../_base_/models/pointpillars_hv_fpn_nus.py'
model = dict(
pts_neck=dict(
_delete_=True,
type='SECONDFPN',
norm_cfg=dict(type='naiveSyncBN2d', eps=1e-3, momentum=0.01),
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
pts_bbox_head=dict(...))
Setting _delete_=True replaces all old keys in the pts_neck field with the new keys.
Use intermediate variables in configs¶
Some intermediate variables are used in the config files, like train_pipeline/test_pipeline in datasets.
It’s worth noting that when modifying intermediate variables in the child configs, users need to pass the intermediate variables into the corresponding fields again.
For example, we would like to use a multi-scale strategy to train and test PointPillars; train_pipeline/test_pipeline are the intermediate variables we would like to modify.
_base_ = './nus-3d.py'
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_labels_3d', 'gt_bboxes_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=[0.95, 1.0, 1.05],
flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range)
]),
dict(type='Pack3DDetInputs', keys=['points'])
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
We first define the new train_pipeline/test_pipeline and pass them into the dataloader fields.
Reuse variables in _base_ file¶
If the users want to reuse the variables in the base file, they can get a copy of the corresponding variable by using {{_base_.xxx}}. E.g.:
_base_ = './pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py'
a = {{_base_.model}} # variable `a` is equal to the `model` defined in `_base_`
Modify config through script arguments¶
When submitting jobs using tools/train.py or tools/test.py, you may specify --cfg-options to modify the config in place.
Update config keys of dict chains.
The config options can be specified following the order of the dict keys in the original config. For example, --cfg-options model.backbone.norm_eval=False changes all the BN modules in the model backbone to train mode.
Update keys inside a list of configs.
Some config dicts are composed as a list in your config. For example, the training pipeline train_dataloader.dataset.pipeline is normally a list, e.g. [dict(type='LoadPointsFromFile'), ...]. If you want to change 'LoadPointsFromFile' to 'LoadPointsFromDict' in the pipeline, you may specify --cfg-options data.train.pipeline.0.type=LoadPointsFromDict.
Update values of lists/tuples.
Sometimes the value to be updated is a list or a tuple. For example, the config file normally sets model.data_preprocessor.mean=[123.675, 116.28, 103.53]. If you want to change the mean values, you may specify --cfg-options model.data_preprocessor.mean="[127,127,127]". Note that the quotation mark " is necessary to support list/tuple data types, and that NO white space is allowed inside the quotation marks in the specified value.
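The same dotted-key overrides can also be applied programmatically with MMEngine's Config.merge_from_dict, which is roughly what --cfg-options does (the config path and key below are just examples):
from mmengine.config import Config

cfg = Config.fromfile('configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py')
# Apply a dotted-key override, mirroring --cfg-options on the command line.
cfg.merge_from_dict({'train_dataloader.batch_size': 4})
print(cfg.train_dataloader.batch_size)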
Config Name Style¶
We follow the below style to name config files. Contributors are advised to follow the same style.
{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}.py
The file name is divided into five parts. All parts and components are connected with _ and words of each part or component should be connected with -.
{algorithm name}: the name of the algorithm. It can be a detector name such as pointpillars, fcos3d, etc.
{model component names}: names of the components used in the algorithm, such as voxel_encoder, backbone, neck, etc. For example, second_secfpn_head-dcn-circlenms means using SECOND’s SparseEncoder, SECONDFPN and a detection head with DCN and circle NMS.
{training settings}: information of training settings such as batch size, augmentations, loss trick, scheduler, and epochs/iterations. For example, 8xb4-tta-cyclic-20e means using 8 GPUs x 4 samples per GPU, test-time augmentation, cyclic annealing learning rate, and training for 20 epochs. Some abbreviations:
{gpu x batch_per_gpu}: GPUs and samples per GPU. bN indicates N samples per GPU. E.g. 4xb4 is the short term of 4 GPUs x 4 samples per GPU.
{schedule}: training schedule, options are schedule-2x, schedule-3x, cyclic-20e, etc. schedule-2x and schedule-3x mean 24 epochs and 36 epochs respectively, and cyclic-20e means 20 epochs.
{training dataset information}: training dataset names like kitti-3d-3class, nus-3d, s3dis-seg, scannet-seg, waymoD5-3d-car. Here 3d means the dataset is used for 3D object detection, and seg means the dataset is used for point cloud segmentation.
{testing dataset information} (optional): testing dataset name for models trained on one dataset but tested on another. If not mentioned, it means the model was trained and tested on the same dataset type.
Coordinate System¶
Overview¶
MMDetection3D uses three different coordinate systems. The existence of different coordinate systems in the field of 3D object detection is necessary, because for various 3D data collection devices, such as LiDAR, depth camera, etc., the coordinate systems are not consistent, and different 3D datasets also follow different data formats. Early works, such as SECOND and VoteNet, convert the raw data to another format, forming conventions that some later works also follow, making the conversion between coordinate systems even more complicated.
Despite the variety of datasets and equipment, by summarizing the line of works on 3D object detection we can roughly categorize coordinate systems into three:
Camera coordinate system – the coordinate system of most cameras, in which the positive direction of the y-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the z-axis points to the front.
          up  z front
           |    ^
           |   /
           |  /
           | /
           |/
left ------ 0 ------> x right
           |
           |
           |
           |
           v
        y down
LiDAR coordinate system – the coordinate system of many LiDARs, in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the front, and the positive direction of the y-axis points to the left.
             z up  x front
              ^    ^
              |   /
              |  /
              | /
              |/
y left <------ 0 ------ right
Depth coordinate system – the coordinate system used by VoteNet, H3DNet, etc., in which the negative direction of the z-axis points to the ground, the positive direction of the x-axis points to the right, and the positive direction of the y-axis points to the front.
            z up  y front
             ^    ^
             |   /
             |  /
             | /
             |/
left ------ 0 ------> x right
The definition of coordinate systems in this tutorial is actually more than just defining the three axes. For a box in the form of \((x, y, z, dx, dy, dz, r)\), our coordinate systems also define how to interpret the box dimensions \((dx, dy, dz)\) and the yaw angle \(r\).
The illustration of the three coordinate systems is shown below:
The three figures above are the 3D coordinate systems while the three figures below are the bird’s eye view.
We will stick to the three coordinate systems defined in this tutorial in the future.
Definition of the yaw angle¶
Please refer to wikipedia for the standard definition of the yaw angle. In object detection, we choose an axis as the gravity axis, and a reference direction on the plane \(\Pi\) perpendicular to the gravity axis, then the reference direction has a yaw angle of 0, and other directions on \(\Pi\) have non-zero yaw angles depending on its angle with the reference direction.
Currently, for all supported datasets, annotations do not include pitch angle and roll angle, which means we need only consider the yaw angle when predicting boxes and calculating overlap between boxes.
In MMDetection3D, all three coordinate systems are right-handed coordinate systems, which means the ascending direction of the yaw angle is counter-clockwise if viewed from the negative direction of the gravity axis (the axis is pointing at one’s eyes).
The figure below shows that, in this right-handed coordinate system, if we set the positive direction of the x-axis as a reference direction, then the positive direction of the y-axis has a yaw angle of \(\frac{\pi}{2}\).
z up y front (yaw=0.5*pi)
^ ^
| /
| /
| /
|/
left (yaw=pi) ------ 0 ------> x right (yaw=0)
For a box, the value of its yaw angle equals its direction minus a reference direction. In all three coordinate systems in MMDetection3D, the reference direction is always the positive direction of the x-axis, while the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. The definition of the yaw angle of a box is illustrated in the figure below.
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | |
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
|
Definition of the box dimensions¶
The definition of the box dimensions cannot be disentangled from the definition of the yaw angle. In the previous section, we said that the direction of a box is defined to be parallel with the x-axis if its yaw angle is 0. Then naturally, the dimension of a box which corresponds to the x-axis should be \(dx\). However, this is not always the case in some datasets (we will address that later).
The following figures show the meaning of the correspondence between the x-axis and \(dx\), and between the y-axis and \(dy\).
y front
^ box direction (yaw=0.5*pi)
/|\ ^
| /|\
| ____|____
| | | |
| | | | dx
__|____|____|____|______\ x right
| | | | /
| | | |
| |____|____|
| dy
Note that the box direction is always parallel with the edge \(dx\).
y front
^ _________
/|\ | | |
| | | |
| | | | dy
| |____|____|____\ box direction (yaw=0)
| | | | /
__|____|____|____|_________\ x right
| | | | /
| |____|____|
| dx
|
Relation with raw coordinate systems of supported datasets¶
KITTI¶
The raw annotation of KITTI is under camera coordinate system, see get_label_anno. In MMDetection3D, to train LiDAR-based models on KITTI, the data is first converted from camera coordinate system to LiDAR coordinate system, see get_ann_info. For training vision-based models, the data is kept in the camera coordinate system.
In SECOND, the LiDAR coordinate system for a box is defined as follows (a bird’s eye view):
For each box, the dimensions are \((w, l, h)\), and the reference direction for the yaw angle is the positive direction of the y axis. For more details, refer to the repo.
Our LiDAR coordinate system has two changes:
The yaw angle is defined to be right-handed instead of left-handed for consistency;
The box dimensions are \((l, w, h)\) instead of \((w, l, h)\), since \(w\) corresponds to \(dy\) and \(l\) corresponds to \(dx\) in KITTI.
Waymo¶
We use the KITTI-format data of the Waymo dataset. Therefore, KITTI and Waymo also share the same coordinate system in our implementation.
NuScenes¶
NuScenes provides a toolkit for evaluation, in which each box is wrapped into a Box instance. The coordinate system of Box is different from our LiDAR coordinate system in that the first two elements of the box dimension correspond to \((dy, dx)\), or \((w, l)\), respectively, instead of the reverse. For more details, please refer to the NuScenes tutorial.
Readers may refer to the NuScenes development kit for the definition of a NuScenes box and implementation of NuScenes evaluation.
Lyft¶
Lyft shares the same data format with NuScenes as far as the coordinate system is concerned.
Please refer to the official website for more information.
ScanNet¶
The raw data of ScanNet is not point cloud but mesh. The sampled point cloud data is under our depth coordinate system. For ScanNet detection task, the box annotations are axis-aligned, and the yaw angle is always zero. Therefore the direction of the yaw angle in our depth coordinate system makes no difference regarding ScanNet.
SUN RGB-D¶
The raw data of SUN RGB-D is not point cloud but RGB-D image. By back projection, we obtain the corresponding point cloud for each image, which is under our Depth coordinate system. However, the annotation is not under our system and thus needs conversion.
For the conversion from raw annotation to annotation under our Depth coordinate system, please refer to sunrgbd_data_utils.py.
S3DIS¶
S3DIS shares the same coordinate system as ScanNet in our implementation. However, S3DIS is a segmentation-task-only dataset, and thus no annotation is coordinate system sensitive.
Examples¶
Box conversion (between different coordinate systems)¶
Take the conversion between our Camera coordinate system and LiDAR coordinate system as an example:
First, for points and box centers, the coordinates before and after the conversion satisfy the following relationship:
\(x_{LiDAR}=z_{camera}\)
\(y_{LiDAR}=-x_{camera}\)
\(z_{LiDAR}=-y_{camera}\)
Then, the box dimensions before and after the conversion satisfy the following relationship:
\(dx_{LiDAR}=dx_{camera}\)
\(dy_{LiDAR}=dz_{camera}\)
\(dz_{LiDAR}=dy_{camera}\)
Finally, the yaw angle should also be converted:
\(r_{LiDAR}=-\frac{\pi}{2}-r_{camera}\)
See the code here for more details.
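For concreteness, the relations above can be spelled out for a single box as follows (a plain sketch of the formulas; MMDetection3D's box structures perform this conversion internally):
import numpy as np


def cam_box_to_lidar(box):
    """Convert one (x, y, z, dx, dy, dz, r) box from camera to LiDAR coords,
    following the relations listed above."""
    x, y, z, dx, dy, dz, r = box
    return np.array([z, -x, -y, dx, dz, dy, -np.pi / 2 - r])


print(cam_box_to_lidar(np.array([1.0, 0.5, 10.0, 3.9, 1.6, 1.56, 0.3])))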
Bird’s Eye View¶
The BEV of a camera coordinate system box is \((x, z, dx, dz, -r)\) if the 3D box is \((x, y, z, dx, dy, dz, r)\). The inversion of the sign of the yaw angle is because the positive direction of the gravity axis of the Camera coordinate system points to the ground.
See the code here for more details.
Common FAQ¶
Q2: In every coordinate system, do the three axes point exactly to the right, the front, and the ground, respectively?¶
No. For example, in KITTI, we need a calibration matrix when converting from Camera coordinate system to LiDAR coordinate system.
Q3: How does a phase difference of \(2\pi\) in the yaw angle of a box affect evaluation?¶
For IoU calculation, a phase difference of \(2\pi\) in the yaw angle will result in the same box, thus not affecting evaluation.
For angle prediction evaluation such as the NDS metric in NuScenes and the AOS metric in KITTI, the angle of predicted boxes will be first standardized, so the phase difference of \(2\pi\) will not change the result.
Q4: How does a phase difference of \(\pi\) in the yaw angle of a box affect evaluation?¶
For IoU calculation, a phase difference of \(\pi\) in the yaw angle will result in the same box, thus not affecting evaluation.
However, for angle prediction evaluation, this will result in the exact opposite direction.
Just think about a car. The yaw angle is the angle between the direction of the car front and the positive direction of the x-axis. If we add \(\pi\) to this angle, the car front will become the car rear.
For categories such as barrier, the front and the rear have no difference, therefore a phase difference of \(\pi\) will not affect the angle prediction score.
Customize Data Pipelines¶
Design of Data pipelines¶
Following typical conventions, we use Dataset and DataLoader for data loading with multiple workers. Dataset returns a dict of data items corresponding to the arguments of the model’s forward method. Since the data in object detection may not be the same size (point number, gt bbox size, etc.), we introduce a new DataContainer type in MMCV to help collect and distribute data of different sizes. See here for more details.
The data preparation pipeline and the dataset are decoupled. Usually a dataset defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict. A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
We present a classical pipeline in the following figure. The blue blocks are pipeline operations. With the pipeline going on, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).
The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
Here is a pipeline example for PointPillars.
train_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
load_dim=5,
use_dim=5,
backend_args=backend_args),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10,
backend_args=backend_args),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
pts_scale_ratio=1.0,
flip=False,
pcd_horizontal_flip=False,
pcd_vertical_flip=False,
transforms=[
dict(
type='GlobalRotScaleTrans',
rot_range=[0, 0],
scale_ratio_range=[1., 1.],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D'),
dict(
type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points'])
])
]
For each operation, we list the related dict fields that are added/updated/removed.
Data loading¶
LoadPointsFromFile
add: points
LoadPointsFromMultiSweeps
update: points
LoadAnnotations3D
add: gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels, pts_instance_mask, pts_semantic_mask, bbox3d_fields, pts_mask_fields, pts_seg_fields
Pre-processing¶
GlobalRotScaleTrans
add: pcd_trans, pcd_rotation, pcd_scale_factor
update: points, *bbox3d_fields
RandomFlip3D
add: flip, pcd_horizontal_flip, pcd_vertical_flip
update: points, *bbox3d_fields
PointsRangeFilter
update: points
ObjectRangeFilter
update: gt_bboxes_3d, gt_labels_3d
ObjectNameFilter
update: gt_bboxes_3d, gt_labels_3d
PointShuffle
update: points
PointsRangeFilter
update: points
Formatting¶
DefaultFormatBundle3D
update: points, gt_bboxes_3d, gt_labels_3d, gt_bboxes, gt_labels
Collect3D
add: img_meta (the keys of img_meta are specified by meta_keys)
remove: all other keys except for those specified by keys
Test time augmentation¶
MultiScaleFlipAug
update: scale, pcd_scale_factor, flip, flip_direction, pcd_horizontal_flip, pcd_vertical_flip with a list of augmented data using these specific parameters
Extend and use custom pipelines¶
Write a new pipeline in any file, e.g., my_pipeline.py. It takes a dict as input and returns a dict.
from mmdet.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        results['dummy'] = True
        return results
Import the new class.
from .my_pipeline import MyTransform
Use it in config files.
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        backend_args=backend_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='MyTransform'),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
Dataset Preparation¶
Before Preparation¶
It is recommended to symlink the dataset root to $MMDETECTION3D/data.
If your folder structure is different from the following, you may need to change the corresponding paths in config files.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
| | ├── v1.0-trainval
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
│ ├── s3dis
│ │ ├── meta_data
│ │ ├── Stanford3dDataset_v1.2_Aligned_Version
│ │ ├── collect_indoor3d_data.py
│ │ ├── indoor3d_util.py
│ │ ├── README.md
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
│ ├── sunrgbd
│ │ ├── OFFICIAL_SUNRGBD
│ │ ├── matlab
│ │ ├── sunrgbd_data.py
│ │ ├── sunrgbd_utils.py
│ │ ├── README.md
Download and Data Preparation¶
KITTI¶
Download KITTI 3D detection data HERE. Prepare KITTI data splits by running:
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
# Download data split
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
Then generate info files by running:
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
In an environment using slurm, users may run the following command instead:
sh tools/create_data.sh <partition> kitti
Waymo¶
Download Waymo open dataset V1.2 HERE and its data split HERE. Then put the .tfrecord files into the corresponding folders in data/waymo/waymo_format/ and put the data split .txt files into data/waymo/kitti_format/ImageSets. Download the ground truth .bin file for the validation set HERE and put it into data/waymo/waymo_format/. A tip is that you can use gsutil to download the large-scale dataset with commands. You can take this tool as an example for more details. Subsequently, prepare Waymo data by running:
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
Note that:
If your local disk does not have enough space for saving converted data, you can change the --out-dir to anywhere else. Just remember to create folders and prepare data there in advance and link them back to data/waymo/kitti_format after the data conversion.
If you want faster evaluation on Waymo, you can download the preprocessed metainfo containing contextname and timestamp to the directory data/waymo/waymo_format/. Then, the dataset config is modified like the following:

val_evaluator = dict(
    type='WaymoMetric',
    ann_file='./data/waymo/kitti_format/waymo_infos_val.pkl',
    waymo_bin_file='./data/waymo/waymo_format/gt.bin',
    data_root='./data/waymo/waymo_format',
    backend_args=backend_args,
    convert_kitti_format=True,
    idx2metainfo='data/waymo/waymo_format/idx2metainfo.pkl'
)
Now, this trick is only used for LiDAR-based detection methods.
NuScenes¶
Download nuScenes V1.0 full dataset data HERE. Prepare nuscenes data by running:
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
Lyft¶
Download Lyft 3D detection data HERE. Prepare Lyft data by running:
python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
python tools/dataset_converters/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
Note that we follow the original folder names for clear organization. Please rename the raw folders as shown above. Also note that the second command serves the purpose of fixing a corrupted lidar data file. Please refer to the discussion for more details.
S3DIS, ScanNet and SUN RGB-D¶
To prepare S3DIS data, please see its README.
To prepare ScanNet data, please see its README.
To prepare SUN RGB-D data, please see its README.
Customized Datasets¶
For using custom datasets, please refer to Customize Datasets.
Update data infos¶
If you have used v1.0.0rc1-v1.0.0rc4 mmdetection3d to create data infos before, and now you want to use the newest v1.1.0 mmdetection3d, you need to update the data infos file.
python tools/dataset_converters/update_infos_to_v2.py --dataset ${DATA_SET} --pkl-path ${PKL_PATH} --out-dir ${OUT_DIR}
--dataset: Name of the dataset.
--pkl-path: Specify the data infos pkl file path.
--out-dir: Output directory of the data infos pkl file.
Example:
python tools/dataset_converters/update_infos_to_v2.py --dataset kitti --pkl-path ./data/kitti/kitti_infos_trainval.pkl --out-dir ./data/kitti
Inference¶
Introduction¶
We provide scripts for multi-modality/single-modality (LiDAR-based/vision-based), indoor/outdoor 3D detection and 3D semantic segmentation demos. The pre-trained models can be downloaded from model zoo. We provide pre-processed sample data from KITTI, SUN RGB-D, nuScenes and ScanNet dataset. You can use any other data following our pre-processing steps.
Testing¶
3D Detection¶
Single-modality demo¶
To test a 3D detector on point cloud data, simply run:
python demo/pcd_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
The visualization results including a point cloud and predicted 3D bounding boxes will be saved in ${OUT_DIR}/PCD_NAME, which you can open using MeshLab. Note that if you set the flag --show, the prediction results will be displayed online using Open3D.
Example on KITTI data using SECOND model:
python demo/pcd_demo.py demo/data/kitti/000008.bin configs/second/second_hv-secfpn_8xb6-80e_kitti-3d-car.py checkpoints/second_hv-secfpn_8xb6-80e_kitti-3d-car_20200620_230238-393f000c.pth
Example on SUN RGB-D data using VoteNet model:
python demo/pcd_demo.py demo/data/sunrgbd/sunrgbd_000017.bin configs/votenet/votenet_8xb16_sunrgbd-3d.py checkpoints/votenet_8xb16_sunrgbd-3d_20200620_230238-4483c0c0.pth
Remember to convert the VoteNet checkpoint if you are using mmdetection3d version >= 0.6.0. See its README for detailed instructions on how to convert the checkpoint.
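If you prefer calling the model from Python instead of the demo script, a minimal sketch using the high-level APIs is shown below (the config/checkpoint paths reuse the KITTI example above; the exact structure of the returned predictions may differ slightly between versions):

from mmdet3d.apis import init_model, inference_detector

config_file = 'configs/second/second_hv-secfpn_8xb6-80e_kitti-3d-car.py'
checkpoint_file = 'checkpoints/second_hv-secfpn_8xb6-80e_kitti-3d-car_20200620_230238-393f000c.pth'

# build the model from the config and load the checkpoint
model = init_model(config_file, checkpoint_file, device='cuda:0')
# run inference on a single point cloud file
outputs = inference_detector(model, 'demo/data/kitti/000008.bin')
# the returned object(s) contain the predicted 3D boxes, scores and labels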
Multi-modality demo¶
To test a 3D detector on multi-modality data (typically point cloud and image), simply run:
python demo/multi_modality_demo.py ${PCD_FILE} ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
where the ANNOTATION_FILE should provide the 3D to 2D projection matrix. The visualization results including a point cloud, an image, predicted 3D bounding boxes and their projection on the image will be saved in ${OUT_DIR}/PCD_NAME.
Example on KITTI data using MVX-Net model:
python demo/multi_modality_demo.py demo/data/kitti/000008.bin demo/data/kitti/000008.png demo/data/kitti/000008.pkl configs/mvxnet/mvx_fpn-dv-second-secfpn_8xb2-80e_kitti-3d-3class.py checkpoints/mvx_fpn-dv-second-secfpn_8xb2-80e_kitti-3d-3class_20200621_003904-10140f2d.pth
Example on SUN RGB-D data using ImVoteNet model:
python demo/multi_modality_demo.py demo/data/sunrgbd/sunrgbd_000017.bin demo/data/sunrgbd/sunrgbd_000017.jpg demo/data/sunrgbd/sunrgbd_000017_infos.pkl configs/imvotenet/imvotenet_stage2_8xb16_sunrgbd.py checkpoints/imvotenet_stage2_8xb16_sunrgbd_20210323_184021-d44dcb66.pth
Monocular 3D Detection¶
To test a monocular 3D detector on image data, simply run:
python demo/mono_det_demo.py ${IMAGE_FILE} ${ANNOTATION_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--cam-type ${CAM_TYPE}] [--score-thr ${SCORE_THR}] [--out-dir ${OUT_DIR}] [--show]
where the ANNOTATION_FILE should provide the 3D to 2D projection matrix (camera intrinsic matrix), and CAM_TYPE should be specified according to the dataset. For example, if you want to run inference on the front camera image, CAM_TYPE should be set as CAM_2 for KITTI, and CAM_FRONT for nuScenes. By specifying CAM_TYPE, you can even infer on any camera image for datasets with multi-view cameras, such as nuScenes and Waymo. SCORE_THR is the score threshold for 3D boxes during visualization. The visualization results including an image and its predicted 3D bounding boxes projected on the image will be saved in ${OUT_DIR}/IMG_NAME.
Example on nuScenes data using FCOS3D model:
python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.pkl configs/fcos3d/fcos3d_r101-caffe-dcn-fpn-head-gn_8xb2-1x_nus-mono3d_finetune.py checkpoints/fcos3d_r101-caffe-dcn-fpn-head-gn_8xb2-1x_nus-mono3d_finetune_20210717_095645-8d806dc2.pth
Note that when visualizing results of monocular 3D detection for flipped images, the camera intrinsic matrix should also be modified accordingly. See more details and examples in PR #744.
3D Segmentation¶
To test a 3D segmentor on point cloud data, simply run:
python demo/pcd_seg_demo.py ${PCD_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--out-dir ${OUT_DIR}] [--show]
The visualization results including a point cloud and its predicted 3D segmentation mask will be saved in ${OUT_DIR}/PCD_NAME.
Example on ScanNet data using PointNet++ (SSG) model:
python demo/pcd_seg_demo.py demo/data/scannet/scene0000_00.bin configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py checkpoints/pointnet2_ssg_2xb16-cosine-200e_scannet-seg_20210514_143644-ee73704a.pth
Model Deployment¶
MMDet3D 1.1 fully relies on MMDeploy to deploy models. Please stay tuned; this document will be updated soon.
Inference and train with existing models and standard datasets¶
Inference with existing models¶
Here we provide testing scripts to evaluate a whole dataset (SUNRGBD, ScanNet, KITTI, etc.).
For high-level APIs that are easier to integrate into other projects, and for basic demos, please refer to Verification/Demo under Get Started.
Test existing models on standard datasets¶
single GPU
CPU
single node multiple GPU
multiple node
You can use the following commands to test a dataset.
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# CPU: disable GPUs and run single-gpu testing script (experimental)
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--cfg-options test_evaluator.pklfile_prefix=${RESULT_FILE}] [--show] [--show-dir ${SHOW_DIR}]
Note:
For now, CPU testing is only supported for SMOKE.
Optional arguments:
--show: If specified, detection results will be plotted in silent mode. It is only applicable to single GPU testing and used for debugging and visualization. This should be used with --show-dir.
--show-dir: If specified, detection results will be plotted on the ***_points.obj and ***_pred.obj files in the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
All evaluation-related arguments are set in the test_evaluator of the corresponding dataset configuration, for example:

test_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'kitti_infos_val.pkl',
    pklfile_prefix=None,
    submission_prefix=None)
The arguments:
type: The name of the corresponding metric, usually associated with the dataset.
ann_file: The path of the annotation file.
pklfile_prefix: An optional argument. The filename of the output results in pickle format. If not specified, the results will not be saved to a file.
submission_prefix: An optional argument. The results will be saved to a file which can then be uploaded for official evaluation.
Examples:
Assume that you have already downloaded the checkpoints to the directory checkpoints/.
Test VoteNet on ScanNet and save the points and prediction visualization results.
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
    checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
    --show --show-dir ./data/scannet/show_results
Test VoteNet on ScanNet, save the points, prediction, groundtruth visualization results, and evaluate the mAP.
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
    checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth \
    --show --show-dir ./data/scannet/show_results
Test VoteNet on ScanNet (without saving the test results) and evaluate the mAP.
python tools/test.py configs/votenet/votenet_8xb8_scannet-3d.py \
    checkpoints/votenet_8x8_scannet-3d-18class_20200620_230238-2cea9c3a.pth
Test SECOND on KITTI with 8 GPUs, and evaluate the mAP.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
    checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth
Test PointPillars on nuScenes with 8 GPUs, and generate the json file to be submitted to the official evaluation server.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_8xb4-2x_nus-3d.py \
    checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth \
    --cfg-options 'test_evaluator.jsonfile_prefix=./pointpillars_nuscenes_results'
The generated results will be under the ./pointpillars_nuscenes_results directory.
Test SECOND on KITTI with 8 GPUs, and generate the pkl files and submission data to be submitted to the official evaluation server.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/second/second_hv_secfpn_8xb6-80e_kitti-3d-3class.py \
    checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-3class_20200620_230238-9208083a.pth \
    --cfg-options 'test_evaluator.pklfile_prefix=./second_kitti_results' 'submission_prefix=./second_kitti_results'
The generated results will be under the ./second_kitti_results directory.
Test PointPillars on Lyft with 8 GPUs, generate the pkl files and make a submission to the leaderboard.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d.py \
    checkpoints/hv_pointpillars_fpn_sbn-2x8_2x_lyft-3d_latest.pth \
    --cfg-options 'test_evaluator.jsonfile_prefix=results/pp_lyft/results_challenge' \
    'test_evaluator.csv_savepath=results/pp_lyft/results_challenge.csv' \
    'test_evaluator.pklfile_prefix=results/pp_lyft/results_challenge.pkl'
Notice: To generate submissions on Lyft, csv_savepath must be given in the --cfg-options. After generating the csv file, you can make a submission with the kaggle commands given on the website.
Note that in the config of the Lyft dataset, the value of the ann_file keyword in test is 'lyft_infos_test.pkl', which is the official test set of Lyft without annotation. To test on the validation set, please change this to 'lyft_infos_val.pkl'.
Test PointPillars on Waymo with 8 GPUs, and evaluate the mAP with Waymo metrics.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
    checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
    --cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
    'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
Notice: For evaluation on Waymo, please follow the instruction to build the binary file compute_detection_metrics_main for metrics computation and put it into mmdet3d/core/evaluation/waymo_utils/. (Sometimes when using bazel to build compute_detection_metrics_main, an error 'round' is not a member of 'std' may appear. We just need to remove the std:: before round in that file.) pklfile_prefix should be given in the --eval-options for the bin file generation. For metrics, waymo is the recommended official evaluation prototype. Currently, evaluating with the choice kitti is adapted from KITTI and the results for each difficulty are not exactly the same as the definition of KITTI. Instead, most of the objects are currently marked with difficulty 0, which will be fixed in the future. The reasons for its instability include the large computation cost of evaluation, the lack of occlusion and truncation in the converted data, a different definition of difficulty and different methods of computing Average Precision.
Test PointPillars on Waymo with 8 GPUs, generate the bin files and make a submission to the leaderboard.
./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py \
    checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth \
    --cfg-options 'test_evaluator.pklfile_prefix=results/waymo-car/kitti_results' \
    'test_evaluator.submission_prefix=results/waymo-car/kitti_results'
Notice: After generating the bin file, you can simply build the binary file create_submission and use it to create a submission file by following the instruction. For evaluation on the validation set with the eval server, you can also use the same way to generate a submission.
Train predefined models on standard datasets¶
MMDetection3D implements distributed training and non-distributed training, which use MMDistributedDataParallel and MMDataParallel respectively.
All outputs (log files and checkpoints) will be saved to the working directory, which is specified by work_dir in the config file.
By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the val_interval argument in the training config:
train_cfg = dict(type='EpochBasedTrainLoop', val_interval=1)  # This evaluates the model after every epoch.
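For example, a sketch of evaluating only every 12 epochs (the other fields of train_cfg in your base config stay unchanged) would be:

train_cfg = dict(type='EpochBasedTrainLoop', val_interval=12)  # evaluate every 12 epochs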
Important: The default learning rate in config files is for 8 GPUs and the exact batch size is marked by the config’s file name, e.g. ‘8xb6’ means 6 samples per GPU using 8 GPUs. According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use a different number of GPUs or samples per GPU, e.g., lr=0.01 for 4 GPUs * 2 samples/gpu and lr=0.08 for 16 GPUs * 4 samples/gpu. However, since most of the models in this repo use Adam rather than SGD for optimization, the rule may not hold and users need to tune the learning rate by themselves.
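As a quick sanity check of the linear scaling rule with the numbers above (a sketch only; for Adam-based configs you should still tune the value yourself):

base_lr = 0.01        # reference setting: 4 GPUs * 2 samples/GPU = total batch size 8
base_batch = 4 * 2
new_batch = 16 * 4    # target setting: 16 GPUs * 4 samples/GPU = total batch size 64

scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)      # 0.08, matching the example above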
Train with a single GPU¶
python tools/train.py ${CONFIG_FILE} [optional arguments]
If you want to specify the working directory in the command, you can add an argument --work-dir ${YOUR_WORK_DIR}.
Training with CPU (experimental)¶
The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.
export CUDA_VISIBLE_DEVICES=-1
And then run the script of train with a single GPU.
Note:
For now, most of the point cloud related algorithms rely on 3D CUDA ops and cannot be trained on CPU. Some monocular 3D object detection algorithms, like FCOS3D and SMOKE, can be trained on CPU. We do not recommend using CPU for training because it is too slow. We support this feature to allow users to debug certain models on machines without GPU for convenience.
Train with multiple GPUs¶
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Optional arguments are:
--cfg-options 'Key=value': Override some settings in the used config.
Train with multiple machines¶
If you run MMDetection3D on a cluster managed with slurm, you can use the script slurm_train.sh. (This script also supports single machine training.)
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
Here is an example of using 16 GPUs to train PointPillars on the dev partition.
GPUS=16 ./tools/slurm_train.sh dev pp_kitti_3class configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py /nfs/xxxx/pp_kitti_3class
You can check slurm_train.sh for full arguments and environment variables.
If you launch with multiple machines simply connected with ethernet, you can run the following commands:
On the first machine:
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
On the second machine:
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR ./tools/dist_train.sh $CONFIG $GPUS
Usually it is slow if you do not have high speed networking like InfiniBand.
Launch multiple jobs on a single machine¶
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use dist_train.sh
to launch training jobs, you can set the port in commands.
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
If you launch training jobs with Slurm, there are two ways to specify the ports.
Set the port through --cfg-options. This is more recommended since it does not change the original configs.

CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29500'
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'env_cfg.dist_cfg.port=29501'
Modify the config files (usually the 6th line from the bottom in config files) to set different communication ports.
In config1.py:

env_cfg = dict(
    dist_cfg=dict(backend='nccl', port=29500)
)
In config2.py:

env_cfg = dict(
    dist_cfg=dict(backend='nccl', port=29501)
)
Then you can launch two jobs with config1.py and config2.py.

CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
Useful Tools¶
We provide lots of useful tools under the tools/ directory.
Log Analysis¶
You can plot loss/mAP curves given a training log file. Run pip install seaborn first to install the dependency.
python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}] [--mode ${MODE}] [--interval ${INTERVAL}]
Notice: If the metric you want to plot is calculated in the eval stage, you need to add the flag --mode eval. If you perform evaluation with an interval of ${INTERVAL}, you need to add the args --interval ${INTERVAL}.
Examples:
Plot the classification loss of some run.
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
Plot the classification and regression loss of some run, and save the figure to a pdf.
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
Compare the bbox mAP of two runs in the same figure.
# evaluate PartA2 and second on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/PartA2.log.json tools/logs/second.log.json --keys KITTI/Car_3D_moderate_strict --legend PartA2 second --mode eval --interval 1

# evaluate PointPillars for car and 3 classes on KITTI according to Car_3D_moderate_strict
python tools/analysis_tools/analyze_logs.py plot_curve tools/logs/pp-3class.log.json tools/logs/pp.log.json --keys KITTI/Car_3D_moderate_strict --legend pp-3class pp --mode eval --interval 2
You can also compute the average training speed.
python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]
The output is expected to be like the following.
-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
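If you want to do this kind of aggregation yourself, each line of the JSON log is a standalone dict. A minimal sketch (it assumes the standard per-iteration time key written by the training logs; adjust the key name if your log differs):

import json

iter_times = []
with open('log.json') as f:                 # the same log file as in the example above
    for line in f:
        record = json.loads(line)
        if 'time' in record:                # training iterations record a per-iter 'time'
            iter_times.append(record['time'])

print(sum(iter_times) / max(len(iter_times), 1))   # rough average s/iter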
Model Serving¶
Note: This tool is still experimental; only SECOND is supported to be served with TorchServe. We'll support more models in the future.
In order to serve an MMDetection3D model with TorchServe, you can follow the steps:
1. Convert the model from MMDetection3D to TorchServe¶
python tools/deployment/mmdet3d2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
Note: ${MODEL_STORE} needs to be an absolute path to a folder.
2. Build mmdet3d-serve docker image¶
docker build -t mmdet3d-serve:latest docker/serve/
3. Run mmdet3d-serve¶
Check the official docs for running TorchServe with docker.
In order to run it on the GPU, you need to install nvidia-docker. You can omit the --gpus argument in order to run on the CPU.
Example:
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmdet3d-serve:latest
Read the docs about the Inference (8080), Management (8081) and Metrics (8082) APIs.
4. Test deployment¶
You can use test_torchserver.py to compare the results of TorchServe and PyTorch.
python tools/deployment/test_torchserver.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${MODEL_NAME}
[--inference-addr ${INFERENCE_ADDR}] [--device ${DEVICE}] [--score-thr ${SCORE_THR}]
Example:
python tools/deployment/test_torchserver.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth second
Model Complexity¶
You can use tools/analysis_tools/get_flops.py in MMDetection3D, a script adapted from flops-counter.pytorch, to compute the FLOPs and params of a given model.
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
You will get the results like this.
==============================
Input shape: (40000, 4)
Flops: 5.78 GFLOPs
Params: 953.83 k
==============================
Note: This tool is still experimental and we do not guarantee that the number is absolutely correct. You may well use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.
FLOPs are related to the input shape while parameters are not. The default input shape is (1, 40000, 4).
Some operators are not counted into FLOPs, like GN and custom operators. Refer to mmcv.cnn.get_model_complexity_info() for details.
We currently only support FLOPs calculation of single-stage models with single-modality input (point cloud or image). We will support two-stage and multi-modality models in the future.
Model Conversion¶
RegNet model to MMDetection¶
tools/model_converters/regnet2mmdet.py converts keys in pycls pretrained RegNet models to MMDetection style.
python tools/model_converters/regnet2mmdet.py ${SRC} ${DST} [-h]
Detectron ResNet to Pytorch¶
tools/detectron2pytorch.py in MMDetection converts keys in the original detectron pretrained ResNet models to PyTorch style.
python tools/detectron2pytorch.py ${SRC} ${DST} ${DEPTH} [-h]
Prepare a model for publishing¶
tools/model_converters/publish_model.py helps users to prepare their model for publishing.
Before you upload a model to AWS, you may want to:
1. convert model weights to CPU tensors,
2. delete the optimizer states, and
3. compute the hash of the checkpoint file and append the hash id to the filename.
python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
E.g.,
python tools/model_converters/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth
The final output filename will be faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth.
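Under the hood, these steps amount to roughly the following (a hedged sketch for illustration, not the actual tool code; the input/output filenames reuse the example above):

import hashlib
import os
import torch

ckpt = torch.load('work_dirs/faster_rcnn/latest.pth', map_location='cpu')  # weights as CPU tensors
ckpt.pop('optimizer', None)                                                # drop optimizer states
torch.save(ckpt, 'published.pth')

with open('published.pth', 'rb') as f:
    hash_id = hashlib.sha256(f.read()).hexdigest()[:8]                     # short hash of the file
os.rename('published.pth', f'faster_rcnn_r50_fpn_1x_20190801-{hash_id}.pth')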
Dataset Conversion¶
tools/dataset_converters/ contains tools for converting datasets to other formats. Most of them convert datasets to pickle-based info files, like KITTI, nuScenes and Lyft. The Waymo converter is used to reorganize the Waymo raw data into KITTI style. Users could refer to them for our approach to converting data formats. It is also convenient to modify them for use as scripts, like the nuImages converter.
To convert the nuImages dataset into COCO format, please use the command below:
python -u tools/dataset_converters/nuimage_converter.py --data-root ${DATA_ROOT} --version ${VERSIONS} \
--out-dir ${OUT_DIR} --nproc ${NUM_WORKERS} --extra-tag ${TAG}
--data-root: the root of the dataset, defaults to ./data/nuimages.
--version: the version of the dataset, defaults to v1.0-mini. To get the full dataset, please use --version v1.0-train v1.0-val v1.0-mini.
--out-dir: the output directory of annotations and semantic masks, defaults to ./data/nuimages/annotations/.
--nproc: number of workers for data preparation, defaults to 4. A larger number could reduce the preparation time as images are processed in parallel.
--extra-tag: extra tag of the annotations, defaults to nuimages. This can be used to separate different annotations processed at different times for study.
For more details, please refer to the doc for dataset preparation and the README for the nuImages dataset.
Visualization¶
MMDetection3D provides a Det3DLocalVisualizer to visualize and store the state of the model during training and testing, as well as results, with the following features.
Support the basic drawing interface for multi-modality data and multi-task.
Support multiple backends such as local and TensorBoard, to write training status such as loss, lr, or performance evaluation metrics to a specified single backend or multiple backends.
Support ground truth visualization on multimodal data, and cross-modal visualization of 3D detection results.
Basic Drawing Interface¶
Inherited from DetLocalVisualizer, Det3DLocalVisualizer provides an interface for drawing common objects on 2D images, such as detection boxes, points, text, lines, circles, polygons, and binary masks. More details about 2D drawing can be found in the visualization documentation of MMDetection. Here we introduce the 3D drawing interface:
Drawing point cloud on the image¶
We support drawing point clouds on the image by using draw_points_on_image.
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
info_file = load('demo/data/kitti/000008.pkl')
points = np.fromfile('demo/data/kitti/000008.bin', dtype=np.float32)
points = points.reshape(-1, 4)[:, :3]
lidar2img = np.array(info_file['data_list'][0]['images']['CAM2']['lidar2img'], dtype=np.float32)
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
visualizer.draw_points_on_image(points, lidar2img)
visualizer.show()
Drawing 3D Boxes on Point Cloud¶
We support drawing 3D boxes on point clouds by using draw_bboxes_3d.
import numpy as np
import torch

from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import LiDARInstance3DBoxes

points = np.fromfile('tests/data/kitti/training/velodyne/000000.bin', dtype=np.float32)
points = points.reshape(-1, 4)
visualizer = Det3DLocalVisualizer()
# set point cloud in visualizer
visualizer.set_points(points)
bboxes_3d = LiDARInstance3DBoxes(
    torch.tensor([[8.7314, -1.8559, -1.5997, 1.2000, 0.4800, 1.8900, -1.5808]]))
# Draw 3D bboxes
visualizer.draw_bboxes_3d(bboxes_3d)
visualizer.show()
Drawing Projected 3D Boxes on Image¶
We support drawing projected 3D boxes on the image by using draw_proj_bboxes_3d.
import mmcv
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
cam2img = np.array(info_file['data_list'][0]['images']['CAM2']['cam2img'], dtype=np.float32)
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
input_meta = {'cam2img': cam2img}
visualizer = Det3DLocalVisualizer()
img = mmcv.imread('demo/data/kitti/000008.png')
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.set_image(img)
# project 3D bboxes to image
visualizer.draw_proj_bboxes_3d(gt_bboxes_3d, input_meta)
visualizer.show()
Drawing BEV Boxes¶
We support drawing BEV boxes by using draw_bev_bboxes.
import numpy as np
from mmengine import load
from mmdet3d.visualization import Det3DLocalVisualizer
from mmdet3d.structures import CameraInstance3DBoxes
info_file = load('demo/data/kitti/000008.pkl')
bboxes_3d = []
for instance in info_file['data_list'][0]['instances']:
bboxes_3d.append(instance['bbox_3d'])
gt_bboxes_3d = np.array(bboxes_3d, dtype=np.float32)
gt_bboxes_3d = CameraInstance3DBoxes(gt_bboxes_3d)
visualizer = Det3DLocalVisualizer()
# set bev image in visualizer
visualizer.set_bev_image()
# draw bev bboxes
visualizer.draw_bev_bboxes(gt_bboxes_3d, edge_colors='orange')
visualizer.show()

Drawing 3D Semantic Mask¶
We support drawing segmentation masks via per-point colorization by using draw_seg_mask.
import numpy as np

from mmdet3d.visualization import Det3DLocalVisualizer

points = np.fromfile('tests/data/s3dis/points/Area_1_office_2.bin', dtype=np.float32)
points = points.reshape(-1, 3)
visualizer = Det3DLocalVisualizer()
mask = np.random.rand(points.shape[0], 3)
points_with_mask = np.concatenate((points, mask), axis=-1)
# Draw 3D points with mask
visualizer.draw_seg_mask(points_with_mask)
visualizer.show()
Results¶
To see the prediction results of trained models, you can run the following command:
python tools/test.py ${CONFIG_FILE} ${CKPT_PATH} --show --show-dir ${SHOW_DIR}
After running this command, you will obtain the input data, the output of networks and the ground-truth labels visualized on the input (e.g. ***_gt.png and ***_pred.png in the multi-modality detection task and the vision-based detection task), saved in ${SHOW_DIR}. When --show is enabled, Open3D will be used to visualize the results online. If you are running the test on a remote server without GUI, online visualization is not supported. You can download the results.pkl from the remote server and visualize the prediction results offline on your local machine.
To visualize the results with the Open3D backend offline, you can run the following command:
python tools/misc/visualize_results.py ${CONFIG_FILE} --result ${RESULTS_PATH} --show-dir ${SHOW_DIR}
This allows the inference and results generation to be done on a remote server, and users can open the results on their host machine with a GUI.
Dataset¶
We also provide scripts to visualize the dataset without inference. You can use tools/misc/browse_dataset.py to show loaded data and ground-truth online and save them on the disk. Currently we support single-modality 3D detection and 3D segmentation on all the datasets, multi-modality 3D detection on KITTI and SUN RGB-D, as well as monocular 3D detection on nuScenes. To browse the KITTI dataset, you can run the following command:
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task det --output-dir ${OUTPUT_DIR}
Notice: Once specifying --output-dir, the images of the views specified by users will be saved when pressing _ESC_ in the Open3D window.
To verify the data consistency and the effect of data augmentation, you can also add the --aug flag to visualize the data after data augmentation using the command below:
python tools/misc/browse_dataset.py configs/_base_/datasets/kitti-3d-3class.py --task lidar_det --aug --output-dir ${OUTPUT_DIR}
If you also want to show 2D images with 3D bounding boxes projected onto them, you need to find a config that supports multi-modality data loading, and then change the --task args to multi-modality_det. An example is shown below:
python tools/misc/browse_dataset.py configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py --task multi-modality_det --output-dir ${OUTPUT_DIR}
You can simply browse different datasets using different configs, e.g. visualizing the ScanNet dataset in 3D semantic segmentation task:
python tools/misc/browse_dataset.py configs/_base_/datasets/scannet_seg-3d-20class.py --task lidar_seg --output-dir ${OUTPUT_DIR} --online
And browsing the nuScenes dataset in monocular 3D detection task:
python tools/misc/browse_dataset.py configs/_base_/datasets/nus-mono3d.py --task mono_det --output-dir ${OUTPUT_DIR} --online
Datasets¶
KITTI Dataset for 3D Object Detection¶
This page provides specific tutorials about the usage of MMDetection3D for KITTI dataset.
Prepare dataset¶
You can download KITTI 3D detection data HERE and unzip all zip files. Besides, the road planes could be downloaded from HERE, which are optional for data augmentation during training for better performance. The road planes are generated by AVOD, you can see more details HERE.
Like the general way to prepare dataset, it is recommended to symlink the dataset root to $MMDETECTION3D/data.
The folder structure should be organized as follows before our processing.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── planes (optional)
Create KITTI dataset¶
To create KITTI point cloud data, we load the raw point cloud data and generate the relevant annotations including object labels and bounding boxes. We also generate the point clouds of all single training objects in the KITTI dataset and save them as .bin files in data/kitti/kitti_gt_database. Meanwhile, .pkl info files are also generated for training or validation. Subsequently, create KITTI data by running:
mkdir ./data/kitti/ && mkdir ./data/kitti/ImageSets
# Download data split
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti --with-plane
Note that if your local disk does not have enough space for saving converted data, you can change the --out-dir to anywhere else, and you need to remove the --with-plane flag if planes are not prepared.
The folder structure after processing should be as below
kitti
├── ImageSets
│ ├── test.txt
│ ├── train.txt
│ ├── trainval.txt
│ ├── val.txt
├── testing
│ ├── calib
│ ├── image_2
│ ├── velodyne
│ ├── velodyne_reduced
├── training
│ ├── calib
│ ├── image_2
│ ├── label_2
│ ├── velodyne
│ ├── velodyne_reduced
│ ├── planes (optional)
├── kitti_gt_database
│ ├── xxxxx.bin
├── kitti_infos_train.pkl
├── kitti_infos_val.pkl
├── kitti_dbinfos_train.pkl
├── kitti_infos_test.pkl
├── kitti_infos_trainval.pkl
kitti_gt_database/xxxxx.bin: point cloud data included in each 3D bounding box of the training dataset.
kitti_infos_train.pkl: training dataset, a dict containing two keys: metainfo and data_list. metainfo contains the basic information of the dataset itself, such as categories, dataset and info_version, while data_list is a list of dicts, and each dict (hereinafter referred to as info) contains all the detailed information of a single sample as follows:
info[‘sample_idx’]: The index of this sample in the whole dataset.
info[‘images’]: Information of images captured by multiple cameras. A dict containing five keys: CAM0, CAM1, CAM2, CAM3, R0_rect.
info[‘images’][‘R0_rect’]: Rectifying rotation matrix with shape (4, 4).
info[‘images’][‘CAM2’]: Information about the CAM2 camera sensor.
info[‘images’][‘CAM2’][‘img_path’]: The filename of the image.
info[‘images’][‘CAM2’][‘height’]: The height of the image.
info[‘images’][‘CAM2’][‘width’]: The width of the image.
info[‘images’][‘CAM2’][‘cam2img’]: Transformation matrix from camera to image with shape (4, 4).
info[‘images’][‘CAM2’][‘lidar2cam’]: Transformation matrix from lidar to camera with shape (4, 4).
info[‘images’][‘CAM2’][‘lidar2img’]: Transformation matrix from lidar to image with shape (4, 4).
info[‘lidar_points’]: A dict containing all the information related to the lidar points.
info[‘lidar_points’][‘lidar_path’]: The filename of the lidar point cloud data.
info[‘lidar_points’][‘num_pts_feats’]: The feature dimension of point.
info[‘lidar_points’][‘Tr_velo_to_cam’]: Transformation from Velodyne coordinate to camera coordinate with shape (4, 4).
info[‘lidar_points’][‘Tr_imu_to_velo’]: Transformation from IMU coordinate to Velodyne coordinate with shape (4, 4).
info[‘instances’]: It is a list of dict. Each dict contains all annotation information of single instance. For the i-th instance:
info[‘instances’][i][‘bbox’]: List of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order.
info[‘instances’][i][‘bbox_3d’]: List of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, h, w, yaw) order.
info[‘instances’][i][‘bbox_label’]: An int indicating the 2D label of the instance; -1 indicates ignore.
info[‘instances’][i][‘bbox_label_3d’]: An int indicating the 3D label of the instance; -1 indicates ignore.
info[‘instances’][i][‘depth’]: Projected center depth of the 3D bounding box with respect to the image plane.
info[‘instances’][i][‘num_lidar_pts’]: The number of LiDAR points in the 3D bounding box.
info[‘instances’][i][‘center_2d’]: Projected 2D center of the 3D bounding box.
info[‘instances’][i][‘difficulty’]: KITTI difficulty: ‘Easy’, ‘Moderate’, ‘Hard’.
info[‘instances’][i][‘truncated’]: Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries.
info[‘instances’][i][‘occluded’]: Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown.
info[‘instances’][i][‘group_ids’]: Used for multi-part object.
info[‘plane’] (optional): Road level information.
Please refer to kitti_converter.py and update_infos_to_v2.py for more details.
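To get a feel for this structure, you can load an info file and inspect one sample from Python (a minimal sketch; the path assumes the directory layout above and the key names documented in the list):

from mmengine import load

infos = load('data/kitti/kitti_infos_train.pkl')
print(infos['metainfo']['categories'])          # dataset-level metainfo

sample = infos['data_list'][0]                  # one `info` dict
print(sample['sample_idx'])
print(sample['lidar_points']['lidar_path'])
print(len(sample['instances']))                 # number of annotated instances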
Train pipeline¶
A typical train pipeline of 3D detection on KITTI is as below:
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # x, y, z, intensity
use_dim=4),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(type='ObjectSample', db_sampler=db_sampler),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[1.0, 1.0, 0.5],
global_rot_range=[0.0, 0.0],
rot_range=[-0.78539816, 0.78539816]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
Data augmentation:
ObjectNoise: apply noise to each GT object in the scene.
RandomFlip3D: randomly flip the input point cloud horizontally or vertically.
GlobalRotScaleTrans: rotate, scale and translate the input point cloud.
Evaluation¶
An example to evaluate PointPillars with 8 GPUs with kitti metrics is as follows:
bash tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py work_dirs/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class/latest.pth 8
Metrics¶
KITTI evaluates 3D object detection performance using mean Average Precision (mAP) and Average Orientation Similarity (AOS). Please refer to its official website and the original paper for more details.
We also adopt this approach for evaluation on KITTI. An example of printed evaluation results is as follows:
Car AP@0.70, 0.70, 0.70:
bbox AP:97.9252, 89.6183, 88.1564
bev AP:90.4196, 87.9491, 85.1700
3d AP:88.3891, 77.1624, 74.4654
aos AP:97.70, 89.11, 87.38
Car AP@0.70, 0.50, 0.50:
bbox AP:97.9252, 89.6183, 88.1564
bev AP:98.3509, 90.2042, 89.6102
3d AP:98.2800, 90.1480, 89.4736
aos AP:97.70, 89.11, 87.38
Testing and make a submission¶
An example to test PointPillars on KITTI with 8 GPUs and generate a submission to the leaderboard is as follows:
First, you need to modify the test_dataloader and test_evaluator dicts in your config file, like this:

data_root = 'data/kitti/'
test_dataloader = dict(
    dataset=dict(
        ann_file='kitti_infos_test.pkl',
        load_eval_anns=False,
        data_prefix=dict(pts='testing/velodyne_reduced')))
test_evaluator = dict(
    ann_file=data_root + 'kitti_infos_test.pkl',
    format_only=True,
    pklfile_prefix='results/kitti-3class/kitti_results',
    submission_prefix='results/kitti-3class/kitti_results')
And then, you can run the test script.
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py work_dirs/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class/latest.pth 8
After generating the results/kitti-3class/kitti_results/xxxxx.txt files, you can submit these files to the KITTI benchmark. Please refer to the KITTI official website for more details.
NuScenes Dataset for 3D Object Detection¶
This page provides specific tutorials about the usage of MMDetection3D for nuScenes dataset.
Before Preparation¶
You can download nuScenes 3D detection data HERE and unzip all zip files.
Like the general way to prepare dataset, it is recommended to symlink the dataset root to $MMDETECTION3D/data.
The folder structure should be organized as follows before our processing.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│ │ ├── v1.0-trainval
Dataset Preparation¶
We typically need to organize the useful data information with a .pkl file in a specific style.
To prepare these files for nuScenes, run the following command:
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
The folder structure after processing should be as below.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│ │ ├── v1.0-trainval
│ │ ├── nuscenes_database
│ │ ├── nuscenes_infos_train.pkl
│ │ ├── nuscenes_infos_val.pkl
│ │ ├── nuscenes_infos_test.pkl
│ │ ├── nuscenes_dbinfos_train.pkl
nuscenes_database/xxxxx.bin: point cloud data included in each 3D bounding box of the training dataset.
nuscenes_infos_train.pkl: training dataset, a dict containing two keys: metainfo and data_list. metainfo contains the basic information of the dataset itself, such as categories, dataset and info_version, while data_list is a list of dicts, and each dict (hereinafter referred to as info) contains all the detailed information of a single sample as follows:
info[‘sample_idx’]: The index of this sample in the whole dataset.
info[‘token’]: Sample data token.
info[‘timestamp’]: Timestamp of the sample data.
info[‘lidar_points’]: A dict containing all the information related to the lidar points.
info[‘lidar_points’][‘lidar_path’]: The filename of the lidar point cloud data.
info[‘lidar_points’][‘num_pts_feats’]: The feature dimension of point.
info[‘lidar_points’][‘lidar2ego’]: The transformation matrix from this lidar sensor to ego vehicle. (4x4 list)
info[‘lidar_points’][‘ego2global’]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
info[‘lidar_sweeps’]: A list containing sweeps information (the intermediate lidar frames without annotations).
info[‘lidar_sweeps’][i][‘lidar_points’][‘data_path’]: The lidar data path of i-th sweep.
info[‘lidar_sweeps’][i][‘lidar_points’][‘lidar2ego’]: The transformation matrix from this lidar sensor to ego vehicle. (4x4 list)
info[‘lidar_sweeps’][i][‘lidar_points’][‘ego2global’]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
info[‘lidar_sweeps’][i][‘lidar2sensor’]: The transformation matrix from the main lidar sensor to the current sensor (for collecting the sweep data). (4x4 list)
info[‘lidar_sweeps’][i][‘timestamp’]: Timestamp of the sweep data.
info[‘lidar_sweeps’][i][‘sample_data_token’]: The sweep sample data token.
info[‘images’]: A dict containing six keys corresponding to each camera: 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'. Each dict contains all data information related to the corresponding camera.
info[‘images’][‘CAM_XXX’][‘img_path’]: The filename of the image.
info[‘images’][‘CAM_XXX’][‘cam2img’]: The transformation matrix recording the intrinsic parameters when projecting 3D points to each image plane. (3x3 list)
info[‘images’][‘CAM_XXX’][‘sample_data_token’]: Sample data token of image.
info[‘images’][‘CAM_XXX’][‘timestamp’]: Timestamp of the image.
info[‘images’][‘CAM_XXX’][‘cam2ego’]: The transformation matrix from this camera sensor to ego vehicle. (4x4 list)
info[‘images’][‘CAM_XXX’][‘lidar2cam’]: The transformation matrix from lidar sensor to this camera. (4x4 list)
info[‘instances’]: It is a list of dict. Each dict contains all annotation information of single instance. For the i-th instance:
info[‘instances’][i][‘bbox_3d’]: List of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, w, h, yaw) order.
info[‘instances’][i][‘bbox_label_3d’]: An int indicating the label of the instance; -1 indicates ignore.
info[‘instances’][i][‘velocity’]: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), a list of shape (2,).
info[‘instances’][i][‘num_lidar_pts’]: Number of lidar points included in each 3D bounding box.
info[‘instances’][i][‘num_radar_pts’]: Number of radar points included in each 3D bounding box.
info[‘instances’][i][‘bbox_3d_isvalid’]: Whether each bounding box is valid. In general, we only take the 3D boxes that include at least one lidar or radar point as valid boxes.
info[‘cam_instances’]: A dict containing keys 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'. For the vision-based 3D object detection task, we split the 3D annotations of the whole scene according to the camera they belong to. For the i-th instance:
info[‘cam_instances’][‘CAM_XXX’][i][‘bbox_label’]: Label of instance.
info[‘cam_instances’][‘CAM_XXX’][i][‘bbox_label_3d’]: Label of instance.
info[‘cam_instances’][‘CAM_XXX’][i][‘bbox’]: 2D bounding box annotation (exterior rectangle of the projected 3D box), a list arrange as [x1, y1, x2, y2].
info[‘cam_instances’][‘CAM_XXX’][i][‘center_2d’]: Projected center location on the image, a list of shape (2,).
info[‘cam_instances’][‘CAM_XXX’][i][‘depth’]: The depth of projected center.
info[‘cam_instances’][‘CAM_XXX’][i][‘velocity’]: Velocities of 3D bounding boxes (no vertical measurements due to inaccuracy), a list has shape (2,).
info[‘cam_instances’][‘CAM_XXX’][i][‘attr_label’]: The attr label of instance. We maintain a default attribute collection and mapping for attribute classification.
info[‘cam_instances’][‘CAM_XXX’][i][‘bbox_3d’]: List of 7 numbers representing the 3D bounding box of the instance, in (x, y, z, l, h, w, yaw) order.
Note:
The difference between bbox_3d in instances and that in cam_instances: both have been converted to the MMDet3D coordinate system, but bbox_3d in instances is in LiDAR coordinate format, while bbox_3d in cam_instances is in camera coordinate format. Mind the difference between them in the 3D box representation (‘l, w, h’ vs. ‘l, h, w’).
Here we only explain the data recorded in the training info files. The same applies to the validation and testing sets (the .pkl file of the test set does not contain instances and cam_instances).
The core function to get nuscenes_infos_xxx.pkl is _fill_trainval_infos.
Please refer to nuscenes_converter.py for more details.
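Since lidar2ego and ego2global are stored as 4x4 homogeneous matrices, they can be composed directly. The sketch below (paths and key names follow the description above) maps a lidar-frame point into global coordinates:

import numpy as np
from mmengine import load

infos = load('data/nuscenes/nuscenes_infos_train.pkl')
info = infos['data_list'][0]

lidar2ego = np.array(info['lidar_points']['lidar2ego'])      # 4x4
ego2global = np.array(info['lidar_points']['ego2global'])    # 4x4
lidar2global = ego2global @ lidar2ego                        # compose homogeneous transforms

point_lidar = np.array([10.0, 0.0, -1.5, 1.0])               # homogeneous point in the lidar frame
point_global = lidar2global @ point_lidar
print(point_global[:3])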
Training pipeline¶
LiDAR-Based Methods¶
A typical training pipeline of LiDAR-based 3D detection (including multi-modality methods) on nuScenes is as below.
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=5,
use_dim=5),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
Compared to general cases, nuScenes has a specific 'LoadPointsFromMultiSweeps' pipeline to load point clouds from consecutive frames. This is a common practice used in this setting.
Please refer to the nuScenes original paper for more details.
The default use_dim in 'LoadPointsFromMultiSweeps' is [0, 1, 2, 4], where the first 3 dimensions refer to the point coordinates and the last refers to timestamp differences.
Intensity is not used by default because of the noise it introduces when concatenating points from different frames.
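If you do want to experiment with intensity, a sketch of the corresponding pipeline entry is shown below (whether this helps is dataset- and model-dependent, so treat it as an illustration rather than a recommendation):

dict(
    type='LoadPointsFromMultiSweeps',
    sweeps_num=10,
    use_dim=[0, 1, 2, 3, 4])  # keep x, y, z, intensity and the timestamp difference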
Vision-Based Methods¶
A typical training pipeline of image-based 3D detection on nuScenes is as below.
train_pipeline = [
dict(type='LoadImageFromFileMono3D'),
dict(
type='LoadAnnotations3D',
with_bbox=True,
with_label=True,
with_attr_label=True,
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
dict(type='mmdet.Resize', scale=(1600, 900), keep_ratio=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='Pack3DDetInputs',
keys=[
'img', 'gt_bboxes', 'gt_bboxes_labels', 'attr_labels', 'gt_bboxes_3d',
'gt_labels_3d', 'centers_2d', 'depths'
]),
]
It follows the general pipeline of 2D detection while differing in some details:
It uses monocular pipelines to load images, which include additional required information like camera intrinsics.
It needs to load 3D annotations.
Some data augmentation techniques need to be adjusted, such as RandomFlip3D. Currently we do not support more augmentation methods, because how to transfer and apply other techniques has not been fully explored.
Evaluation¶
An example to evaluate PointPillars with 8 GPUs with nuScenes metrics is as follows.
bash ./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py checkpoints/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_20200620_230405-2fa62f3d.pth 8
Metrics¶
NuScenes proposes a comprehensive metric, namely nuScenes detection score (NDS), to evaluate different methods and set up the benchmark. It consists of mean Average Precision (mAP), Average Translation Error (ATE), Average Scale Error (ASE), Average Orientation Error (AOE), Average Velocity Error (AVE) and Average Attribute Error (AAE). Please refer to its official website for more details.
We also adopt this approach for evaluation on nuScenes. An example of printed evaluation results is as follows:
mAP: 0.3197
mATE: 0.7595
mASE: 0.2700
mAOE: 0.4918
mAVE: 1.3307
mAAE: 0.1724
NDS: 0.3905
Eval time: 170.8s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.503 0.577 0.152 0.111 2.096 0.136
truck 0.223 0.857 0.224 0.220 1.389 0.179
bus 0.294 0.855 0.204 0.190 2.689 0.283
trailer 0.081 1.094 0.243 0.553 0.742 0.167
construction_vehicle 0.058 1.017 0.450 1.019 0.137 0.341
pedestrian 0.392 0.687 0.284 0.694 0.876 0.158
motorcycle 0.317 0.737 0.265 0.580 2.033 0.104
bicycle 0.308 0.704 0.299 0.892 0.683 0.010
traffic_cone 0.555 0.486 0.309 nan nan nan
barrier 0.466 0.581 0.269 0.169 nan nan
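As a quick check of how the summary value is obtained, NDS combines mAP with the five true-positive error metrics as NDS = (1/10) * [5 * mAP + sum(1 - min(1, err))]. Plugging in the numbers printed above reproduces the reported score (a worked example, not part of the official tooling):

mAP = 0.3197
tp_errors = [0.7595, 0.2700, 0.4918, 1.3307, 0.1724]   # mATE, mASE, mAOE, mAVE, mAAE

nds = (5 * mAP + sum(1 - min(1.0, err) for err in tp_errors)) / 10
print(round(nds, 4))                                    # 0.3905, matching the NDS line above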
Testing and make a submission¶
An example to test PointPillars on nuScenes with 8 GPUs and generate a submission to the leaderboard is as follows.
You should modify the jsonfile_prefix in the test_evaluator of the corresponding configuration. For example, add test_evaluator = dict(type='NuScenesMetric', jsonfile_prefix='work_dirs/pp-nus/results_eval.json') to the config, or append --cfg-options 'test_evaluator.jsonfile_prefix=work_dirs/pp-nus/results_eval.json' after the test command.
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb4-2x_nus-3d.py work_dirs/pp-nus/latest.pth 8 --cfg-options 'test_evaluator.jsonfile_prefix=work_dirs/pp-nus/results_eval'
Note that the testing info should be changed to that for testing set instead of validation set here.
After generating the work_dirs/pp-nus/results_eval.json, you can compress it and submit it to the nuScenes benchmark. Please refer to the nuScenes official website for more information.
We can also visualize the prediction results with our developed visualization tools. Please refer to the visualization doc for more details.
Notes¶
Transformation between NuScenesBox and our CameraInstanceBoxes¶
The main difference between NuScenesBox and our CameraInstanceBoxes is reflected in the yaw definition. NuScenesBox defines the rotation with a quaternion or three Euler angles, while ours only defines one yaw angle due to the practical scenario. This requires us to add some additional rotations manually in the pre-processing and post-processing, such as here.
In addition, please note that the definition of corners and locations are detached in the NuScenesBox
. For example, in monocular 3D detection, the definition of the box location is in its camera coordinate (see its official illustration for car setup), which is consistent with ours. In contrast, its corners are defined with the convention “x points forward, y to the left, z up”. It results in different philosophy of dimension and rotation definitions from our CameraInstanceBoxes
. An example to remove similar hacks is PR #744. The same problem also exists in the LiDAR system. To deal with them, we typically add some transformation in the pre-processing and post-processing to guarantee the box will be in our coordinate system during the entire training and inference procedure.
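As a rough illustration of what such a conversion involves (a minimal numpy sketch, not the exact helper used in the codebase), the full 3D rotation of a NuScenesBox can be collapsed to the single yaw angle used by our boxes by rotating the unit x-axis and reading its bearing in the x-y plane:
import numpy as np

def quaternion_to_yaw(q):
    """Collapse a (w, x, y, z) quaternion to a single yaw angle around z."""
    w, x, y, z = q
    # first column of the rotation matrix = rotated unit x-axis
    vx = 1 - 2 * (y * y + z * z)
    vy = 2 * (x * y + w * z)
    return np.arctan2(vy, vx)

print(quaternion_to_yaw([np.cos(np.pi / 8), 0, 0, np.sin(np.pi / 8)]))  # pi / 4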
Lyft Dataset for 3D Object Detection¶
This page provides specific tutorials about the usage of MMDetection3D for Lyft dataset.
Before Preparation¶
You can download Lyft 3D detection data HERE and unzip all zip files.
Like the general way to prepare a dataset, it is recommended to symlink the dataset root to $MMDETECTION3D/data
.
The folder structure should be organized as follows before our processing.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
Here v1.01-train
and v1.01-test
contain the metafiles which are similar to those of nuScenes. .txt
files contain the data split information.
Lyft does not have an official split for training and validation set, so we provide a split considering the number of objects from different categories in different scenes.
sample_submission.csv
is the base file for submission on the Kaggle evaluation server.
Note that the names in parentheses are the original raw folder names; please rename the raw folders as shown above for clear organization.
Dataset Preparation¶
The way to organize Lyft dataset is similar to nuScenes. We also generate the .pkl
files which share almost the same structure.
Next, we will mainly focus on the difference between these two datasets. For a more detailed explanation of the info structure, please refer to nuScenes tutorial.
To prepare info files for Lyft, run the following commands:
python tools/create_data.py lyft --root-path ./data/lyft --out-dir ./data/lyft --extra-tag lyft --version v1.01
python tools/dataset_converters/lyft_data_fixer.py --version v1.01 --root-folder ./data/lyft
Note that the second command serves the purpose of fixing a corrupted lidar data file. Please refer to the discussion here for more details.
The folder structure after processing should be as below.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── lyft
│ │ ├── v1.01-train
│ │ │ ├── v1.01-train (train_data)
│ │ │ ├── lidar (train_lidar)
│ │ │ ├── images (train_images)
│ │ │ ├── maps (train_maps)
│ │ ├── v1.01-test
│ │ │ ├── v1.01-test (test_data)
│ │ │ ├── lidar (test_lidar)
│ │ │ ├── images (test_images)
│ │ │ ├── maps (test_maps)
│ │ ├── train.txt
│ │ ├── val.txt
│ │ ├── test.txt
│ │ ├── sample_submission.csv
│ │ ├── lyft_infos_train.pkl
│ │ ├── lyft_infos_val.pkl
│ │ ├── lyft_infos_test.pkl
lyft_infos_train.pkl: training dataset, a dict containing two keys: metainfo and data_list. metainfo contains the basic information of the dataset itself, such as categories, dataset and info_version, while data_list is a list of dicts, each dict (hereinafter referred to as info) containing all the detailed information of a single sample as follows:
info[‘sample_idx’]: The index of this sample in the whole dataset.
info[‘token’]: Sample data token.
info[‘timestamp’]: Timestamp of the sample data.
info[‘lidar_points’]: A dict containing all the information related to the lidar points.
info[‘lidar_points’][‘lidar_path’]: The filename of the lidar point cloud data.
info[‘lidar_points’][‘num_pts_feats’]: The feature dimension of point.
info[‘lidar_points’][‘lidar2ego’]: The transformation matrix from this lidar sensor to ego vehicle. (4x4 list)
info[‘lidar_points’][‘ego2global’]: The transformation matrix from the ego vehicle to global coordinates. (4x4 list)
info[‘lidar_sweeps’]: A list containing sweep information (the intermediate lidar frames without annotations).
info[‘lidar_sweeps’][i][‘lidar_points’][‘data_path’]: The lidar data path of i-th sweep.
info[‘lidar_sweeps’][i][‘lidar_points’][‘lidar2ego’]: The transformation matrix from this lidar sensor to ego vehicle in i-th sweep timestamp
info[‘lidar_sweeps’][i][‘lidar_points’][‘ego2global’]: The transformation matrix from the ego vehicle in i-th sweep timestamp to global coordinates. (4x4 list)
info[‘lidar_sweeps’][i][‘lidar2sensor’]: The transformation matrix from the keyframe lidar to the i-th frame lidar. (4x4 list)
info[‘lidar_sweeps’][i][‘timestamp’]: Timestamp of the sweep data.
info[‘lidar_sweeps’][i][‘sample_data_token’]: The sweep sample data token.
info[‘images’]: A dict containing six keys corresponding to each camera: 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'. Each dict contains all the data information related to the corresponding camera.
info[‘images’][‘CAM_XXX’][‘img_path’]: The filename of the image.
info[‘images’][‘CAM_XXX’][‘cam2img’]: The transformation matrix recording the intrinsic parameters when projecting 3D points to each image plane. (3x3 list)
info[‘images’][‘CAM_XXX’][‘sample_data_token’]: Sample data token of image.
info[‘images’][‘CAM_XXX’][‘timestamp’]: Timestamp of the image.
info[‘images’][‘CAM_XXX’][‘cam2ego’]: The transformation matrix from this camera sensor to ego vehicle. (4x4 list)
info[‘images’][‘CAM_XXX’][‘lidar2cam’]: The transformation matrix from lidar sensor to this camera. (4x4 list)
info[‘instances’]: It is a list of dicts. Each dict contains all annotation information of a single instance. For the i-th instance:
info[‘instances’][i][‘bbox_3d’]: List of 7 numbers representing the 3D bounding box in lidar coordinate system of the instance, in (x, y, z, l, w, h, yaw) order.
info[‘instances’][i][‘bbox_label_3d’]: An int starting from 0 that indicates the label of the instance, while -1 indicates the ignore class.
info[‘instances’][i][‘bbox_3d_isvalid’]: Whether each bounding box is valid. In general, we only take the 3D boxes that include at least one lidar or radar point as valid boxes.
Next, we will elaborate on the difference compared to nuScenes in terms of the details recorded in these info files.
Without lyft_database/xxxxx.bin: This folder and the .bin files are not extracted on the Lyft dataset due to the negligible effect of ground-truth sampling in the experiments.
lyft_infos_train.pkl:
Without info[‘instances’][i][‘velocity’]: There is no velocity measurement on Lyft.
Without info[‘instances’][i][‘num_lidar_pts’] and info[‘instances’][i][‘num_radar_pts’]: These two fields are not recorded on Lyft.
Here we only explain the data recorded in the training info files. The same applies to the validation set and test set (without instances).
Please refer to lyft_converter.py for more details about the structure of lyft_infos_xxx.pkl
.
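If you want to inspect the generated info files yourself, they can be loaded directly with pickle; the snippet below is a small sketch based on the structure described above (paths are illustrative):
import pickle

with open('data/lyft/lyft_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

print(infos['metainfo'].keys())              # e.g. categories, dataset, info_version
sample = infos['data_list'][0]
print(sample['lidar_points']['lidar_path'])  # lidar file of the first sample
print(len(sample['instances']))              # number of annotated instances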
Training pipeline¶
LiDAR-Based Methods¶
A typical training pipeline of LiDAR-based 3D detection (including multi-modality methods) on Lyft is almost the same as for nuScenes, as shown below.
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=5,
use_dim=5),
dict(
type='LoadPointsFromMultiSweeps',
sweeps_num=10),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
Similar to nuScenes, models on Lyft also need the 'LoadPointsFromMultiSweeps'
pipeline to load point clouds from consecutive frames.
In addition, since the intensity of the LiDAR points collected by Lyft is invalid, we also set the use_dim in 'LoadPointsFromMultiSweeps' to [0, 1, 2, 4] by default, where the first 3 dimensions refer to the point coordinates and the last one refers to the timestamp difference.
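For reference, the corresponding sweep-loading entry would look roughly like the following (a sketch; other arguments follow your base config):
dict(
    type='LoadPointsFromMultiSweeps',
    sweeps_num=10,
    # x, y, z and timestamp difference; dim 3 (intensity) is skipped
    # because it is invalid on Lyft
    use_dim=[0, 1, 2, 4])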
Evaluation¶
An example of evaluating PointPillars with 8 GPUs using Lyft metrics is as follows:
bash ./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb2-2x_lyft-3d.py checkpoints/hv_pointpillars_fpn_sbn-all_2x8_2x_lyft-3d_20210517_202818-fc6904c3.pth 8
Metrics¶
Lyft proposes a stricter metric for evaluating the predicted 3D bounding boxes. The basic criterion to judge whether a predicted box is positive or not is the same as KITTI, i.e. the 3D Intersection over Union (IoU). However, it adopts a way similar to COCO to compute the mean average precision (mAP): the average precision is computed under different 3D IoU thresholds from 0.5 to 0.95 and then averaged. In fact, requiring more than 0.7 3D IoU is quite a strict criterion for 3D detection methods, so the overall performance appears relatively low. The imbalance of annotations across categories is another important reason for the lower results compared to other datasets. Please refer to its official website for more details about the definition of this metric.
We employ this official method for evaluation on Lyft. An example of printed evaluation results is as follows:
+mAPs@0.5:0.95------+--------------+
| class | mAP@0.5:0.95 |
+-------------------+--------------+
| animal | 0.0 |
| bicycle | 0.099 |
| bus | 0.177 |
| car | 0.422 |
| emergency_vehicle | 0.0 |
| motorcycle | 0.049 |
| other_vehicle | 0.359 |
| pedestrian | 0.066 |
| truck | 0.176 |
| Overall | 0.15 |
+-------------------+--------------+
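As a quick check on the table, the Overall row here is consistent with the unweighted mean of the per-class values:
per_class_map = [0.0, 0.099, 0.177, 0.422, 0.0, 0.049, 0.359, 0.066, 0.176]
print(round(sum(per_class_map) / len(per_class_map), 2))  # 0.15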
Testing and Making a Submission¶
An example of testing PointPillars on Lyft with 8 GPUs and generating a submission to the leaderboard is as follows.
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_fpn_sbn-all_8xb2-2x_lyft-3d.py work_dirs/pp-lyft/latest.pth 8 --cfg-options test_evaluator.jsonfile_prefix=work_dirs/pp-lyft/results_challenge test_evaluator.csv_savepath=results/pp-lyft/results_challenge.csv
After generating the work_dirs/pp-lyft/results_challenge.csv
, you can submit it to the Kaggle evaluation server. Please refer to the official website for more information.
We can also visualize the prediction results with our developed visualization tools. Please refer to the visualization doc for more details.
Waymo Dataset¶
This page provides specific tutorials about the usage of MMDetection3D for Waymo dataset.
Prepare dataset¶
Before preparing the Waymo dataset, if you only installed the requirements in requirements/build.txt
and requirements/runtime.txt
before, please first install the official package for this dataset by running
# tf 2.1.0.
pip install waymo-open-dataset-tf-2-1-0==1.2.0
# tf 2.0.0
# pip install waymo-open-dataset-tf-2-0-0==1.2.0
# tf 1.15.0
# pip install waymo-open-dataset-tf-1-15-0==1.2.0
or
pip install -r requirements/optional.txt
Like the general way to prepare datasets, it is recommended to symlink the dataset root to $MMDETECTION3D/data
.
Since the original Waymo data format is based on tfrecord
, we need to preprocess the raw data for convenient usage in the training and evaluation procedures. Our approach is to convert them into KITTI format.
The folder structure should be organized as follows before our processing.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
You can download Waymo open dataset V1.2 HERE and its data split HERE. Then put tfrecord
files into corresponding folders in data/waymo/waymo_format/
and put the data split txt files into data/waymo/kitti_format/ImageSets
. Download the ground truth bin file for the validation set HERE and put it into data/waymo/waymo_format/
. A tip: you can use gsutil
to download the large-scale dataset from the command line. You can take this tool as an example for more details. Subsequently, prepare Waymo data by running
python tools/create_data.py waymo --root-path ./data/waymo/ --out-dir ./data/waymo/ --workers 128 --extra-tag waymo
Note that if your local disk does not have enough space for saving converted data, you can change the --out-dir
to anywhere else. Just remember to create folders and prepare data there in advance and link them back to data/waymo/kitti_format
after the data conversion.
After the data conversion, the folder structure and info files should be organized as below.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── waymo
│ │ ├── waymo_format
│ │ │ ├── training
│ │ │ ├── validation
│ │ │ ├── testing
│ │ │ ├── gt.bin
│ │ ├── kitti_format
│ │ │ ├── ImageSets
│ │ │ ├── training
│ │ │ │ ├── calib
│ │ │ │ ├── image_0
│ │ │ │ ├── image_1
│ │ │ │ ├── image_2
│ │ │ │ ├── image_3
│ │ │ │ ├── image_4
│ │ │ │ ├── label_0
│ │ │ │ ├── label_1
│ │ │ │ ├── label_2
│ │ │ │ ├── label_3
│ │ │ │ ├── label_4
│ │ │ │ ├── label_all
│ │ │ │ ├── pose
│ │ │ │ ├── velodyne
│ │ │ ├── testing
│ │ │ │ ├── (the same as training)
│ │ │ ├── waymo_gt_database
│ │ │ ├── waymo_infos_trainval.pkl
│ │ │ ├── waymo_infos_train.pkl
│ │ │ ├── waymo_infos_val.pkl
│ │ │ ├── waymo_infos_test.pkl
│ │ │ ├── waymo_dbinfos_train.pkl
Since there are several cameras, we store the images and the labels that can be projected to each camera separately, and save the pose for further usage of consecutive frames of point clouds. We use a naming convention {a}{bbb}{ccc}
for the data of each frame, where a
is the prefix for different splits (0
for training, 1
for validation and 2
for testing), bbb
is the segment index and ccc
is the frame index. You can easily locate the required frame according to this naming rule. We gather the data for training and validation together as in KITTI and store the indices for the different splits in the ImageSets
files.
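A tiny helper like the following (purely illustrative, not part of the codebase) decodes this naming rule:
def decode_frame_name(name):
    """Decode the {a}{bbb}{ccc} naming rule described above."""
    split = {'0': 'training', '1': 'validation', '2': 'testing'}[name[0]]
    segment_idx = int(name[1:4])
    frame_idx = int(name[4:7])
    return split, segment_idx, frame_idx

print(decode_frame_name('1023045'))  # ('validation', 23, 45)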
Training¶
Since there are many similar frames in the original dataset, we can basically use a subset to train our model. In our preliminary baselines, we load one frame every five frames, and thanks to our hyperparameter settings and data augmentation, we obtain better results than those reported in the original dataset paper. For more details about the configuration and performance, please refer to the README.md in configs/pointpillars/
. A more complete benchmark based on other settings and methods is coming soon.
Evaluation¶
For evaluation on Waymo, please follow the instructions to build the binary file compute_detection_metrics_main
for metrics computation and put it into mmdet3d/evaluation/functional/waymo_utils/
. Basically, you can follow the commands below to install bazel
and build the file.
# download the code and enter the base directory
git clone https://github.com/waymo-research/waymo-open-dataset.git waymo-od
# git clone https://github.com/Abyssaledge/waymo-open-dataset-master waymo-od # if you want to use faster multi-thread version.
cd waymo-od
git checkout remotes/origin/master
# use the Bazel build system
sudo apt-get install --assume-yes pkg-config zip g++ zlib1g-dev unzip python3 python3-pip
BAZEL_VERSION=3.1.0
wget https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo bash bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
sudo apt install build-essential
# configure .bazelrc
./configure.sh
# delete previous bazel outputs and reset internal caches
bazel clean
bazel build waymo_open_dataset/metrics/tools/compute_detection_metrics_main
cp bazel-bin/waymo_open_dataset/metrics/tools/compute_detection_metrics_main ../mmdetection3d/mmdet3d/evaluation/functional/waymo_utils/
Then you can evaluate your models on Waymo. An example of evaluating PointPillars on Waymo with 8 GPUs using Waymo metrics is as follows.
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_sbn-all_16xb2-2x_waymo-3d-car.py checkpoints/hv_pointpillars_secfpn_sbn-2x16_2x_waymo-3d-car_latest.pth 8
pklfile_prefix
should be set in the test_evaluator
of the configuration if the bin file needs to be generated, so you can add --cfg-options "test_evaluator.pklfile_prefix=xxxx"
at the end of the command if you want to do so.
Notice:
Sometimes when using bazel to build compute_detection_metrics_main, an error 'round' is not a member of 'std' may appear. We just need to remove the std:: before round in that file.
Considering it takes quite a long time to evaluate once, we recommend evaluating only once at the end of model training.
To use TensorFlow with CUDA 9, it is recommended to compile it from source. Apart from official tutorials, you can refer to this link for possibly suitable precompiled packages and useful information about compiling it from source.
Testing and Making a Submission¶
An example of testing PointPillars on Waymo with 8 GPUs, generating the bin files and making a submission to the leaderboard is as follows.
submission_prefix
should be set in the test_evaluator
of the configuration before you run the test command if you want to generate the bin files and make a submission to the leaderboard.
After generating the bin file, you can simply build the binary file create_submission
and use it to create a submission file by following the instructions. Basically, here are some example commands.
cd ../waymo-od/
bazel build waymo_open_dataset/metrics/tools/create_submission
cp bazel-bin/waymo_open_dataset/metrics/tools/create_submission ../mmdetection3d/mmdet3d/evaluation/functional/waymo_utils/
vim waymo_open_dataset/metrics/tools/submission.txtpb # set the metadata information
cp waymo_open_dataset/metrics/tools/submission.txtpb ../mmdetection3d/mmdet3d/evaluation/functional/waymo_utils/
cd ../mmdetection3d
# suppose the result bin is in `results/waymo-car/submission`
mmdet3d/evaluation/functional/waymo_utils/create_submission --input_filenames='results/waymo-car/kitti_results_test.bin' --output_filename='results/waymo-car/submission/model' --submission_filename='mmdet3d/evaluation/functional/waymo_utils/submission.txtpb'
tar cvf results/waymo-car/submission/my_model.tar results/waymo-car/submission/my_model/
gzip results/waymo-car/submission/my_model.tar
For evaluation on the validation set with the eval server, you can also use the same way to generate a submission. Make sure you change the fields in submission.txtpb
before running the command above.
SUN RGB-D for 3D Object Detection¶
Dataset preparation¶
For the overall process, please refer to the README page for SUN RGB-D.
Download SUN RGB-D data and toolbox¶
Download SUNRGBD data HERE. Then, move SUNRGBD.zip
, SUNRGBDMeta2DBB_v2.mat
, SUNRGBDMeta3DBB_v2.mat
and SUNRGBDtoolbox.zip
to the OFFICIAL_SUNRGBD
folder and unzip the zip files.
The directory structure before data preparation should be as below:
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
Extract data and annotations for 3D detection from raw data¶
Extract SUN RGB-D annotation data from raw annotation data by running (this requires MATLAB installed on your machine):
matlab -nosplash -nodesktop -r 'extract_split;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v2;quit;'
matlab -nosplash -nodesktop -r 'extract_rgbd_data_v1;quit;'
The main steps include:
Extract train and val split.
Extract data for 3D detection from raw data.
Extract and format detection annotation from raw data.
The main component of extract_rgbd_data_v2.m
which extracts point cloud data from depth map is as follows:
data = SUNRGBDMeta(imageId);
data.depthpath(1:16) = '';
data.depthpath = strcat('../OFFICIAL_SUNRGBD', data.depthpath);
data.rgbpath(1:16) = '';
data.rgbpath = strcat('../OFFICIAL_SUNRGBD', data.rgbpath);
% extract point cloud from depth map
[rgb,points3d,depthInpaint,imsize]=read3dPoints(data);
rgb(isnan(points3d(:,1)),:) = [];
points3d(isnan(points3d(:,1)),:) = [];
points3d_rgb = [points3d, rgb];
% MAT files are 3x smaller than TXT files. In Python we can use
% scipy.io.loadmat('xxx.mat')['points3d_rgb'] to load the data.
mat_filename = strcat(num2str(imageId,'%06d'), '.mat');
txt_filename = strcat(num2str(imageId,'%06d'), '.txt');
% save point cloud data
parsave(strcat(depth_folder, mat_filename), points3d_rgb);
The main component of extract_rgbd_data_v1.m
which extracts annotation is as follows:
% Write 2D and 3D box label
data2d = data;
fid = fopen(strcat(det_label_folder, txt_filename), 'w');
for j = 1:length(data.groundtruth3DBB)
centroid = data.groundtruth3DBB(j).centroid; % 3D bbox center
classname = data.groundtruth3DBB(j).classname; % class name
orientation = data.groundtruth3DBB(j).orientation; % 3D bbox orientation
coeffs = abs(data.groundtruth3DBB(j).coeffs); % 3D bbox size
box2d = data2d.groundtruth2DBB(j).gtBb2D; % 2D bbox
fprintf(fid, '%s %d %d %d %d %f %f %f %f %f %f %f %f\n', classname, box2d(1), box2d(2), box2d(3), box2d(4), centroid(1), centroid(2), centroid(3), coeffs(1), coeffs(2), coeffs(3), orientation(1), orientation(2));
end
fclose(fid);
The above two scripts call functions such as read3dPoints
from the toolbox provided by SUN RGB-D.
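As the MATLAB comment above hints, the extracted .mat point clouds can be read back in Python, e.g. (the path below is illustrative):
from scipy.io import loadmat

# N x 6 array: xyz coordinates followed by rgb values
points3d_rgb = loadmat('sunrgbd_trainval/depth/000001.mat')['points3d_rgb']
print(points3d_rgb.shape)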
The directory structure after extraction should be as follows.
sunrgbd
├── README.md
├── matlab
│ ├── extract_rgbd_data_v1.m
│ ├── extract_rgbd_data_v2.m
│ ├── extract_split.m
├── OFFICIAL_SUNRGBD
│ ├── SUNRGBD
│ ├── SUNRGBDMeta2DBB_v2.mat
│ ├── SUNRGBDMeta3DBB_v2.mat
│ ├── SUNRGBDtoolbox
├── sunrgbd_trainval
│ ├── calib
│ ├── depth
│ ├── image
│ ├── label
│ ├── label_v1
│ ├── seg_label
│ ├── train_data_idx.txt
│ ├── val_data_idx.txt
Under each of the following folders there are overall 5285 train files and 5050 val files:
calib: Camera calibration information in .txt
depth: Point cloud saved in .mat (xyz+rgb)
image: Image data in .jpg
label: Detection annotation data in .txt (version 2)
label_v1: Detection annotation data in .txt (version 1)
seg_label: Segmentation annotation data in .txt
Currently, we use v1 data for training and testing, so the version 2 labels are unused.
Create dataset¶
Please run the command below to create the dataset.
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
--out-dir ./data/sunrgbd --extra-tag sunrgbd
or (if in a slurm environment)
bash tools/create_data.sh <job_name> sunrgbd
The above point cloud data are further saved in .bin
format. Meanwhile .pkl
info files are also generated for saving annotation and metadata.
The directory structure after processing should be as follows.
sunrgbd
├── README.md
├── matlab
│ ├── ...
├── OFFICIAL_SUNRGBD
│ ├── ...
├── sunrgbd_trainval
│ ├── ...
├── points
├── sunrgbd_infos_train.pkl
├── sunrgbd_infos_val.pkl
points/xxxxxx.bin: The point cloud data after downsampling.
sunrgbd_infos_train.pkl: The train data infos, the detailed info of each scene is as follows:
info[‘lidar_points’]: A dict containing all information related to the lidar points.
info[‘lidar_points’][‘num_pts_feats’]: The feature dimension of point.
info[‘lidar_points’][‘lidar_path’]: The filename of the lidar point cloud data.
info[‘images’]: A dict containing all information related to the image data.
info[‘images’][‘CAM0’][‘img_path’]: The filename of the image.
info[‘images’][‘CAM0’][‘depth2img’]: Transformation matrix from depth to image with shape (4, 4).
info[‘images’][‘CAM0’][‘height’]: The height of image.
info[‘images’][‘CAM0’][‘width’]: The width of image.
info[‘instances’]: A list of dicts containing all the annotations of this frame. Each dict corresponds to the annotations of a single instance. For the i-th instance:
info[‘instances’][i][‘bbox_3d’]: List of 7 numbers representing the 3D bounding box in depth coordinate system.
info[‘instances’][i][‘bbox’]: List of 4 numbers representing the 2D bounding box of the instance, in (x1, y1, x2, y2) order.
info[‘instances’][i][‘bbox_label_3d’]: An int indicating the 3D label of the instance; -1 indicates the ignore class.
info[‘instances’][i][‘bbox_label’]: An int indicating the 2D label of the instance; -1 indicates the ignore class.
sunrgbd_infos_val.pkl
: The val data infos, which shares the same format assunrgbd_infos_train.pkl
.
Train pipeline¶
A typical train pipeline of SUN RGB-D for point cloud only 3D detection is as follows.
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadAnnotations3D'),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(type='PointSample', num_points=20000),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
Data augmentation for point clouds:
RandomFlip3D: randomly flip the input point cloud horizontally or vertically.
GlobalRotScaleTrans: rotate the input point cloud, usually in the range of [-30, 30] degrees for SUN RGB-D (see the radian conversion after this list); then scale the input point cloud, usually in the range of [0.85, 1.15] for SUN RGB-D; finally translate the input point cloud, usually by 0 for SUN RGB-D (which means no translation).
PointSample: downsample the input point cloud.
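The rotation range in the config above is specified in radians; a quick conversion confirms that ±0.523599 rad is the ±30 degrees mentioned for GlobalRotScaleTrans:
import math

print(math.radians(30))        # 0.5235987755982988 ~= 0.523599
print(math.degrees(0.523599))  # ~30.0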
A typical train pipeline of SUN RGB-D for multi-modality (point cloud and image) 3D detection is as follows.
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations3D'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', scale=(1333, 600), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.0),
dict(type='Pad', size_divisor=32),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.523599, 0.523599],
scale_ratio_range=[0.85, 1.15],
shift_height=True),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d','img', 'gt_bboxes', 'gt_bboxes_labels'])
]
Data augmentation for images:
Resize: resize the input image; keep_ratio=True means the aspect ratio of the image is kept unchanged.
RandomFlip: randomly flip the input image.
The image augmentation functions are implemented in MMDetection.
Metrics¶
Same as ScanNet, typically mean Average Precision (mAP) is used for evaluation on SUN RGB-D, e.g. mAP@0.25
and mAP@0.5
. In detail, a generic function to compute precision and recall for 3D object detection for multiple classes is called. Please refer to indoor_eval for more details.
Since SUN RGB-D also contains image data, detection on images is feasible as well. For instance, in ImVoteNet, we first train an image detector, and we also use mAP for evaluation, e.g. mAP@0.5
. We use the eval_map
function from MMDetection to calculate mAP.
ScanNet for 3D Object Detection¶
Dataset preparation¶
For the overall process, please refer to the README page for ScanNet.
Export ScanNet point cloud data¶
By exporting ScanNet data, we load the raw point cloud data and generate the relevant annotations including semantic labels, instance labels and ground truth bounding boxes.
python batch_load_scannet_data.py
The directory structure before data preparation should be as below
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ │ ├── scenexxxx_xx
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
Under folder scans
there are overall 1201 train and 312 validation folders in which raw point cloud data and relevant annotations are saved. For instance, under folder scene0001_01
the files are as below:
scene0001_01_vh_clean_2.ply: Mesh file storing coordinates and colors of each vertex. The mesh’s vertices are taken as raw point cloud data.
scene0001_01.aggregation.json: Aggregation file including object IDs, segment IDs and labels.
scene0001_01_vh_clean_2.0.010000.segs.json: Segmentation file including segment IDs and vertices.
scene0001_01.txt: Meta file including the axis-aligned matrix, etc.
scene0001_01_vh_clean_2.labels.ply: Annotation file containing the category of each vertex.
Export ScanNet data by running python batch_load_scannet_data.py
. The main steps include:
Export original files to point cloud, instance label, semantic label and bounding box file.
Downsample raw point cloud and filter invalid classes.
Save point cloud data and relevant annotation files.
And the core function export
in load_scannet_data.py
is as follows:
def export(mesh_file,
agg_file,
seg_file,
meta_file,
label_map_file,
output_file=None,
test_mode=False):
# label map file: ./data/scannet/meta_data/scannetv2-labels.combined.tsv
# the various label standards in the label map file, e.g. 'nyu40id'
label_map = scannet_utils.read_label_mapping(
label_map_file, label_from='raw_category', label_to='nyu40id')
# load raw point cloud data, 6-dims feature: XYZRGB
mesh_vertices = scannet_utils.read_mesh_vertices_rgb(mesh_file)
# Load scene axis alignment matrix: a 4x4 transformation matrix
# transform raw points in sensor coordinate system to a coordinate system
# which is axis-aligned with the length/width of the room
lines = open(meta_file).readlines()
# test set data doesn't have align_matrix
axis_align_matrix = np.eye(4)
for line in lines:
if 'axisAlignment' in line:
axis_align_matrix = [
float(x)
for x in line.rstrip().strip('axisAlignment = ').split(' ')
]
break
axis_align_matrix = np.array(axis_align_matrix).reshape((4, 4))
# perform global alignment of mesh vertices
pts = np.ones((mesh_vertices.shape[0], 4))
# raw point cloud in homogeneous coordinates, each row: [x, y, z, 1]
pts[:, 0:3] = mesh_vertices[:, 0:3]
# transform raw mesh vertices to aligned mesh vertices
pts = np.dot(pts, axis_align_matrix.transpose()) # Nx4
aligned_mesh_vertices = np.concatenate([pts[:, 0:3], mesh_vertices[:, 3:]],
axis=1)
# Load semantic and instance labels
if not test_mode:
# each object has one semantic label and consists of several segments
object_id_to_segs, label_to_segs = read_aggregation(agg_file)
# many points may belong to the same segment
seg_to_verts, num_verts = read_segmentation(seg_file)
label_ids = np.zeros(shape=(num_verts), dtype=np.uint32)
object_id_to_label_id = {}
for label, segs in label_to_segs.items():
label_id = label_map[label]
for seg in segs:
verts = seg_to_verts[seg]
# each point has one semantic label
label_ids[verts] = label_id
instance_ids = np.zeros(
shape=(num_verts), dtype=np.uint32) # 0: unannotated
for object_id, segs in object_id_to_segs.items():
for seg in segs:
verts = seg_to_verts[seg]
# object_id is 1-indexed, i.e. 1,2,3,.,,,.NUM_INSTANCES
# each point belongs to one object
instance_ids[verts] = object_id
if object_id not in object_id_to_label_id:
object_id_to_label_id[object_id] = label_ids[verts][0]
# bbox format is [x, y, z, x_size, y_size, z_size, label_id]
# [x, y, z] is gravity center of bbox, [x_size, y_size, z_size] is axis-aligned
# [label_id] is semantic label id in 'nyu40id' standard
# Note: since 3D bbox is axis-aligned, the yaw is 0.
unaligned_bboxes = extract_bbox(mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
aligned_bboxes = extract_bbox(aligned_mesh_vertices, object_id_to_segs,
object_id_to_label_id, instance_ids)
...
return mesh_vertices, label_ids, instance_ids, unaligned_bboxes, \
aligned_bboxes, object_id_to_label_id, axis_align_matrix
After exporting each scan, the raw point cloud could be downsampled, e.g. to 50000, if the number of points is too large (the raw point cloud won’t be downsampled if it’s also used in 3D semantic segmentation task). In addition, invalid semantic labels outside of nyu40id
standard or optional DONOT CARE
classes should be filtered. Finally, the point cloud data, semantic labels, instance labels and ground truth bounding boxes should be saved in .npy
files.
Export ScanNet RGB data (optional)¶
By exporting ScanNet RGB data, for each scene we load a set of RGB images with corresponding 4x4 pose matrices and a single 4x4 camera intrinsic matrix. Note that this step is optional and can be skipped if you do not plan to use multi-view detection.
python extract_posed_images.py
Each of 1201 train, 312 validation and 100 test scenes contains a single .sens
file. For instance, for scene 0001_01
we have data/scannet/scans/scene0001_01/0001_01.sens
. For this scene all images and poses are extracted to data/scannet/posed_images/scene0001_01
. Specifically, there will be 300 image files xxxxx.jpg, 300 camera pose files xxxxx.txt and a single intrinsic.txt
file. Typically, a single scene contains several thousand images, and by default we extract only 300 of them, resulting in a space occupation of less than 100 GB. To extract more images, use the --max-images-per-scene
parameter.
Create dataset¶
python tools/create_data.py scannet --root-path ./data/scannet \
--out-dir ./data/scannet --extra-tag scannet
The above exported point cloud file, semantic label file and instance label file are further saved in .bin
format. Meanwhile .pkl
info files are also generated for train or validation. The core function process_single_scene
of getting data infos is as follows.
def process_single_scene(sample_idx):
# save point cloud, instance label and semantic label in .bin file respectively, get info['pts_path'], info['pts_instance_mask_path'] and info['pts_semantic_mask_path']
...
# get annotations
if has_label:
annotations = {}
# box is of shape [k, 6 + class]
aligned_box_label = self.get_aligned_box_label(sample_idx)
unaligned_box_label = self.get_unaligned_box_label(sample_idx)
annotations['gt_num'] = aligned_box_label.shape[0]
if annotations['gt_num'] != 0:
aligned_box = aligned_box_label[:, :-1] # k, 6
unaligned_box = unaligned_box_label[:, :-1]
classes = aligned_box_label[:, -1] # k
annotations['name'] = np.array([
self.label2cat[self.cat_ids2class[classes[i]]]
for i in range(annotations['gt_num'])
])
# default names are given to aligned bbox for compatibility
# we also save unaligned bbox info with marked names
annotations['location'] = aligned_box[:, :3]
annotations['dimensions'] = aligned_box[:, 3:6]
annotations['gt_boxes_upright_depth'] = aligned_box
annotations['unaligned_location'] = unaligned_box[:, :3]
annotations['unaligned_dimensions'] = unaligned_box[:, 3:6]
annotations[
'unaligned_gt_boxes_upright_depth'] = unaligned_box
annotations['index'] = np.arange(
annotations['gt_num'], dtype=np.int32)
annotations['class'] = np.array([
self.cat_ids2class[classes[i]]
for i in range(annotations['gt_num'])
])
axis_align_matrix = self.get_axis_align_matrix(sample_idx)
annotations['axis_align_matrix'] = axis_align_matrix # 4x4
info['annos'] = annotations
return info
The directory structure after processing should be as below:
scannet
├── meta_data
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
├── README.md
├── scans
├── scans_test
├── scannet_instance_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── train_label_weight.npy
│ ├── train_resampled_scene_idxs.npy
│ ├── val_label_weight.npy
│ ├── val_resampled_scene_idxs.npy
├── posed_images
│ ├── scenexxxx_xx
│ │ ├── xxxxxx.txt
│ │ ├── xxxxxx.jpg
│ │ ├── intrinsic.txt
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl
points/xxxxx.bin: The axis-unaligned point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input while the ScanNet 3D semantic segmentation task takes unaligned points, we choose to store the unaligned points and their axis-align transformation matrix. Note: the points will be axis-aligned by the GlobalAlignment transform in the pre-processing pipeline of the 3D detection task.
instance_mask/xxxxx.bin: The instance label for each point, value range: [0, NUM_INSTANCES], 0: unannotated.
semantic_mask/xxxxx.bin: The semantic label for each point, value range: [1, 40], i.e. the nyu40id standard. Note: the nyu40id IDs will be mapped to train IDs by PointSegClassMapping in the train pipeline.
posed_images/scenexxxx_xx: The set of .jpg images with .txt 4x4 poses and the single .txt file with the camera intrinsic matrix.
scannet_infos_train.pkl: The train data infos, the detailed info of each scan is as follows:
info[‘lidar_points’]: A dict containing all information related to the lidar points.
info[‘lidar_points’][‘lidar_path’]: The filename of the lidar point cloud data.
info[‘lidar_points’][‘num_pts_feats’]: The feature dimension of point.
info[‘lidar_points’][‘axis_align_matrix’]: The transformation matrix to align the axis.
info[‘pts_semantic_mask_path’]: The filename of the semantic mask annotation.
info[‘pts_instance_mask_path’]: The filename of the instance mask annotation.
info[‘instances’]: A list of dicts containing all annotations; each dict contains all annotation information of a single instance. For the i-th instance:
info[‘instances’][i][‘bbox_3d’]: List of 6 numbers representing the axis-aligned 3D bounding box of the instance in depth coordinate system, in (x, y, z, l, w, h) order.
info[‘instances’][i][‘bbox_label_3d’]: The label of the 3D bounding box.
scannet_infos_val.pkl: The val data infos, which shares the same format as scannet_infos_train.pkl.
scannet_infos_test.pkl: The test data infos, which almost shares the same format as scannet_infos_train.pkl except for the lack of annotations.
Training pipeline¶
A typical training pipeline of ScanNet for 3D detection is as follows.
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=True,
load_dim=6,
use_dim=[0, 1, 2]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True,
with_mask_3d=True,
with_seg_3d=True),
dict(type='GlobalAlignment', rotation_axis=2),
dict(type='PointSegClassMapping'),
dict(type='PointSample', num_points=40000),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
flip_ratio_bev_vertical=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.087266, 0.087266],
scale_ratio_range=[1.0, 1.0],
shift_height=True),
dict(
type='Pack3DDetInputs',
keys=[
'points', 'gt_bboxes_3d', 'gt_labels_3d', 'pts_semantic_mask',
'pts_instance_mask'
])
]
GlobalAlignment: The point cloud will be axis-aligned using the axis-aligned matrix.
PointSegClassMapping: Only the valid category IDs will be mapped to class label IDs like [0, 18) during training.
Data augmentation:
PointSample: downsample the input point cloud.
RandomFlip3D: randomly flip the input point cloud horizontally or vertically.
GlobalRotScaleTrans: rotate the input point cloud, usually in the range of [-5, 5] degrees for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet (which means no scaling); finally translate the input point cloud, usually by 0 for ScanNet (which means no translation).
Metrics¶
Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. mAP@0.25
and mAP@0.5
. In detail, a generic function to compute precision and recall for 3D object detection for multiple classes is called. Please refer to indoor_eval for more details.
As introduced in the section Export ScanNet data, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. So the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D Non-Maximum Suppression (NMS), which ignores rotation, is adopted during post-processing.
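For intuition, axis-aligned 3D IoU (the quantity such rotation-free NMS operates on) can be sketched as follows; this is only an illustration, not the mmdet3d implementation:
import numpy as np

def aligned_iou_3d(box_a, box_b):
    """IoU of two axis-aligned boxes given as (cx, cy, cz, dx, dy, dz) with zero yaw."""
    a, b = np.asarray(box_a, dtype=float), np.asarray(box_b, dtype=float)
    a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
    b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
    inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None).prod()
    union = a[3:].prod() + b[3:].prod() - inter
    return inter / union

print(aligned_iou_3d([0, 0, 0, 2, 2, 2], [1, 0, 0, 2, 2, 2]))  # 4 / 12 ~= 0.333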
ScanNet for 3D Semantic Segmentation¶
Dataset preparation¶
The overall process is similar to ScanNet 3D detection task. Please refer to this section. Only a few differences and additional information about the 3D semantic segmentation data will be listed below.
Export ScanNet data¶
Since ScanNet provides an online benchmark for 3D semantic segmentation evaluation on the test set, we also need to download the test scans and put them under the scannet
folder.
The directory structure before data preparation should be as below:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── meta_data
│ │ ├── scans
│ │ │ ├── scenexxxx_xx
│ │ ├── scans_test
│ │ │ ├── scenexxxx_xx
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── scannet_utils.py
│ │ ├── README.md
Under folder scans_test
there are 100 test folders in which only raw point cloud data and its meta file are saved. For instance, under folder scene0707_00
the files are as below:
scene0707_00_vh_clean_2.ply: Mesh file storing coordinates and colors of each vertex. The mesh’s vertices are taken as raw point cloud data.
scene0707_00.txt: Meta file including sensor parameters, etc. Note: different from the data under scans, the axis-aligned matrix is not provided for test scans.
Export ScanNet data by running python batch_load_scannet_data.py
. Note: only point cloud data will be saved for test set scans because no annotations are provided.
Create dataset¶
Similar to the 3D detection task, we create dataset by running python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
.
The directory structure after processing should be as below:
scannet
├── batch_load_scannet_data.py
├── load_scannet_data.py
├── scannet_utils.py
├── README.md
├── scans
├── scans_test
├── scannet_instance_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── train_label_weight.npy
│ ├── train_resampled_scene_idxs.npy
│ ├── val_label_weight.npy
│ ├── val_resampled_scene_idxs.npy
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl
seg_info: The generated infos to support semantic segmentation model training.
train_label_weight.npy: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it’s a common practice to use label re-weighting to get a better performance.
train_resampled_scene_idxs.npy: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data.
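For intuition, a common form of label re-weighting looks like the sketch below (illustrative only; it is not necessarily the exact formula used to generate train_label_weight.npy):
import numpy as np

# hypothetical point counts for four classes: rarer classes get larger weights
num_points_per_class = np.array([1e6, 5e5, 1e4, 2e3])
freq = num_points_per_class / num_points_per_class.sum()
weights = 1.0 / (freq + 1e-6)
weights /= weights.sum()
print(weights)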
Training pipeline¶
A typical training pipeline of ScanNet for 3D semantic segmentation is as below:
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=False,
use_color=True,
load_dim=6,
use_dim=[0, 1, 2, 3, 4, 5]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_mask_3d=False,
with_seg_3d=True),
dict(
type='PointSegClassMapping'),
dict(
type='IndoorPatchPointSample',
num_points=num_points,
block_size=1.5,
ignore_index=len(class_names),
use_normalized_coord=False,
enlarge_size=0.2,
min_unique_num=None),
dict(type='NormalizePointsColor', color_mean=None),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
PointSegClassMapping: Only the valid category ids will be mapped to class label ids like [0, 20) during training. Other class ids will be converted to ignore_index, which equals 20.
IndoorPatchPointSample: Crop a patch containing a fixed number of points from the input point cloud. block_size indicates the size of the cropped block, typically 1.5 for ScanNet.
NormalizePointsColor: Normalize the RGB color values of the input point cloud by dividing by 255.
Metrics¶
Typically mean Intersection over Union (mIoU) is used for evaluation on ScanNet. In detail, we first compute IoU for multiple classes and then average them to get mIoU, please refer to seg_eval.
Testing and Making a Submission¶
By default, our codebase evaluates semantic segmentation results on the validation set.
If you would like to test the model performance on the online benchmark, add --format-only
flag in the evaluation script and change ann_file=data_root + 'scannet_infos_val.pkl'
to ann_file=data_root + 'scannet_infos_test.pkl'
in the ScanNet dataset’s config. Remember to specify the txt_prefix
as the directory to save the testing results.
Taking PointNet++ (SSG) on ScanNet for example, the following command can be used to do inference on test set:
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py \
work_dirs/pointnet2_ssg/latest.pth --format-only \
--eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission
After generating the results, you can basically compress the folder and upload it to the ScanNet evaluation server.
S3DIS for 3D Semantic Segmentation¶
Dataset preparation¶
For the overall process, please refer to the README page for S3DIS.
Export S3DIS data¶
By exporting S3DIS data, we load the raw point cloud data and generate the relevant annotations including semantic labels and instance labels.
The directory structure before exporting should be as below:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── s3dis
│ │ ├── meta_data
│ │ ├── Stanford3dDataset_v1.2_Aligned_Version
│ │ │ ├── Area_1
│ │ │ │ ├── conferenceRoom_1
│ │ │ │ ├── office_1
│ │ │ │ ├── ...
│ │ │ ├── Area_2
│ │ │ ├── Area_3
│ │ │ ├── Area_4
│ │ │ ├── Area_5
│ │ │ ├── Area_6
│ │ ├── indoor3d_util.py
│ │ ├── collect_indoor3d_data.py
│ │ ├── README.md
Under folder Stanford3dDataset_v1.2_Aligned_Version
, the rooms are split into 6 areas. We use 5 areas for training and 1 for evaluation (typically Area_5
). Under the directory of each area, there are folders in which raw point cloud data and relevant annotations are saved. For instance, under folder Area_1/office_1
the files are as below:
office_1.txt: A txt file storing coordinates and colors of each point in the raw point cloud data.
Annotations/: This folder contains txt files for different object instances. Each txt file represents one instance, e.g.
chair_1.txt: A txt file storing raw point cloud data of one chair in this room.
If we concatenate all the txt files under Annotations/, we will get the same point cloud as denoted by office_1.txt.
Export S3DIS data by running python collect_indoor3d_data.py
. The main steps include:
Export original txt files to point cloud, instance label and semantic label.
Save point cloud data and relevant annotation files.
And the core function export
in indoor3d_util.py
is as follows:
def export(anno_path, out_filename):
"""Convert original dataset files to points, instance mask and semantic
mask files. We aggregated all the points from each instance in the room.
Args:
anno_path (str): path to annotations. e.g. Area_1/office_2/Annotations/
out_filename (str): path to save collected points and labels.
file_format (str): txt or numpy, determines what file format to save.
Note:
the points are shifted before save, the most negative point is now
at origin.
"""
points_list = []
ins_idx = 1 # instance ids should be indexed from 1, so 0 is unannotated
# an example of `anno_path`: Area_1/office_1/Annotations
# which contains all object instances in this room as txt files
for f in glob.glob(osp.join(anno_path, '*.txt')):
# get class name of this instance
one_class = osp.basename(f).split('_')[0]
if one_class not in class_names: # some rooms have 'staris' class
one_class = 'clutter'
points = np.loadtxt(f)
labels = np.ones((points.shape[0], 1)) * class2label[one_class]
ins_labels = np.ones((points.shape[0], 1)) * ins_idx
ins_idx += 1
points_list.append(np.concatenate([points, labels, ins_labels], 1))
data_label = np.concatenate(points_list, 0) # [N, 8], (pts, rgb, sem, ins)
# align point cloud to the origin
xyz_min = np.amin(data_label, axis=0)[0:3]
data_label[:, 0:3] -= xyz_min
np.save(f'{out_filename}_point.npy', data_label[:, :6].astype(np.float32))
np.save(f'{out_filename}_sem_label.npy', data_label[:, 6].astype(np.int64))
np.save(f'{out_filename}_ins_label.npy', data_label[:, 7].astype(np.int64))
where we load and concatenate all the point cloud instances under Annotations/
to form raw point cloud and generate semantic/instance labels. After exporting each room, the point cloud data, semantic labels and instance labels should be saved in .npy
files.
Create dataset¶
python tools/create_data.py s3dis --root-path ./data/s3dis \
--out-dir ./data/s3dis --extra-tag s3dis
The above exported point cloud files, semantic label files and instance label files are further saved in .bin
format. Meanwhile .pkl
info files are also generated for each area.
The directory structure after processing should be as below:
s3dis
├── meta_data
├── indoor3d_util.py
├── collect_indoor3d_data.py
├── README.md
├── Stanford3dDataset_v1.2_Aligned_Version
├── s3dis_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── Area_1_label_weight.npy
│ ├── Area_1_resampled_scene_idxs.npy
│ ├── Area_2_label_weight.npy
│ ├── Area_2_resampled_scene_idxs.npy
│ ├── Area_3_label_weight.npy
│ ├── Area_3_resampled_scene_idxs.npy
│ ├── Area_4_label_weight.npy
│ ├── Area_4_resampled_scene_idxs.npy
│ ├── Area_5_label_weight.npy
│ ├── Area_5_resampled_scene_idxs.npy
│ ├── Area_6_label_weight.npy
│ ├── Area_6_resampled_scene_idxs.npy
├── s3dis_infos_Area_1.pkl
├── s3dis_infos_Area_2.pkl
├── s3dis_infos_Area_3.pkl
├── s3dis_infos_Area_4.pkl
├── s3dis_infos_Area_5.pkl
├── s3dis_infos_Area_6.pkl
points/xxxxx.bin: The exported point cloud data.
instance_mask/xxxxx.bin: The instance label for each point, value range: [0, ${NUM_INSTANCES}], 0: unannotated.
semantic_mask/xxxxx.bin: The semantic label for each point, value range: [0, 12].
s3dis_infos_Area_1.pkl: Area 1 data infos, the detailed info of each room is as follows:
info[‘point_cloud’]: {‘num_features’: 6, ‘lidar_idx’: sample_idx}.
info[‘pts_path’]: The path of points/xxxxx.bin.
info[‘pts_instance_mask_path’]: The path of instance_mask/xxxxx.bin.
info[‘pts_semantic_mask_path’]: The path of semantic_mask/xxxxx.bin.
seg_info: The generated infos to support semantic segmentation model training.
Area_1_label_weight.npy: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it’s a common practice to use label re-weighting to get a better performance.
Area_1_resampled_scene_idxs.npy: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data.
Training pipeline¶
A typical training pipeline of S3DIS for 3D semantic segmentation is as below.
class_names = ('ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter')
num_points = 4096
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='DEPTH',
shift_height=False,
use_color=True,
load_dim=6,
use_dim=[0, 1, 2, 3, 4, 5]),
dict(
type='LoadAnnotations3D',
with_bbox_3d=False,
with_label_3d=False,
with_mask_3d=False,
with_seg_3d=True),
dict(
type='PointSegClassMapping'),
dict(
type='IndoorPatchPointSample',
num_points=num_points,
block_size=1.0,
ignore_index=None,
use_normalized_coord=True,
enlarge_size=None,
min_unique_num=num_points // 4,
eps=0.0),
dict(type='NormalizePointsColor', color_mean=None),
dict(
type='GlobalRotScaleTrans',
rot_range=[-3.141592653589793, 3.141592653589793], # [-pi, pi]
scale_ratio_range=[0.8, 1.2],
translation_std=[0, 0, 0]),
dict(
type='RandomJitterPoints',
jitter_std=[0.01, 0.01, 0.01],
clip_range=[-0.05, 0.05]),
dict(type='RandomDropPointsColor', drop_ratio=0.2),
dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
]
PointSegClassMapping: Only the valid category ids will be mapped to class label ids like [0, 13) during training. Other class ids will be converted to ignore_index, which equals 13.
IndoorPatchPointSample: Crop a patch containing a fixed number of points from the input point cloud. block_size indicates the size of the cropped block, typically 1.0 for S3DIS.
NormalizePointsColor: Normalize the RGB color values of the input point cloud by dividing by 255.
Data augmentation:
GlobalRotScaleTrans: randomly rotate and scale the input point cloud.
RandomJitterPoints: randomly jitter the point cloud by adding a different noise vector to each point.
RandomDropPointsColor: set the colors of the point cloud to all zeros with a probability drop_ratio.
Metrics¶
Typically mean intersection over union (mIoU) is used for evaluation on S3DIS. In detail, we first compute IoU for multiple classes and then average them to get mIoU, please refer to seg_eval.py.
As introduced in the section Export S3DIS data, S3DIS uses 5 areas for training and the remaining 1 area for evaluation. But there are also other area-split schemes in different papers.
To enable flexible combinations of train-val splits, we use a sub-dataset to represent one area and concatenate them to form a larger training set. An example of training on areas 1, 2, 3, 4, 6 and evaluating on area 5 is shown below:
dataset_type = 'S3DISSegDataset'
data_root = './data/s3dis/'
class_names = ('ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter')
train_area = [1, 2, 3, 4, 6]
test_area = 5
train_dataloader = dict(
batch_size=8,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_files=[f's3dis_infos_Area_{i}.pkl' for i in train_area],
metainfo=metainfo,
data_prefix=data_prefix,
pipeline=train_pipeline,
modality=input_modality,
ignore_index=len(class_names),
scene_idxs=[
f'seg_info/Area_{i}_resampled_scene_idxs.npy' for i in train_area
],
test_mode=False))
test_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_files=f's3dis_infos_Area_{test_area}.pkl',
metainfo=metainfo,
data_prefix=data_prefix,
pipeline=test_pipeline,
modality=input_modality,
ignore_index=len(class_names),
scene_idxs=f'seg_info/Area_{test_area}_resampled_scene_idxs.npy',
test_mode=True))
val_dataloader = test_dataloader
where we specify the areas used for training/validation by setting ann_files
and scene_idxs
with lists that include corresponding paths. The train-val split can be simply modified via changing the train_area
and test_area
variables.
Supported Tasks¶
LiDAR-Based 3D Detection¶
LiDAR-based 3D detection is one of the most basic tasks supported in MMDetection3D. It expects the given model to take any number of points with features collected by LiDAR as input, and predict the 3D bounding boxes and category labels for each object of interest. Next, taking PointPillars on the KITTI dataset as an example, we will show how to prepare data, train and test a model on a standard 3D detection benchmark, and how to visualize and validate the results.
Data Preparation¶
To begin with, we need to download the raw data and reorganize the data in a standard way presented in the doc for data preparation.
Note that for KITTI, we need extra .txt
files for data splits.
Due to different ways of organizing the raw data in different datasets, we typically need to collect the useful data information with a .pkl
file.
So after getting all the raw data ready, we need to run the scripts provided in the create_data.py
for different datasets to generate data infos.
For example, for KITTI we need to run:
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
Afterwards, the related folder structure should be as follows:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── kitti
│ │ ├── ImageSets
│ │ ├── testing
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── training
│ │ │ ├── calib
│ │ │ ├── image_2
│ │ │ ├── label_2
│ │ │ ├── velodyne
│ │ │ ├── velodyne_reduced
│ │ ├── kitti_gt_database
│ │ ├── kitti_infos_train.pkl
│ │ ├── kitti_infos_trainval.pkl
│ │ ├── kitti_infos_val.pkl
│ │ ├── kitti_infos_test.pkl
│ │ ├── kitti_dbinfos_train.pkl
Training¶
Then let us train a model with provided configs for PointPillars. You can basically follow the examples provided in this tutorial when training with different GPU settings. Suppose we use 8 GPUs on a single machine with distributed training:
./tools/dist_train.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py 8
Note that 8xb6
in the config name means the training is conducted with 8 GPUs and 6 samples on each GPU.
If your customized setting is different from this, sometimes you need to adjust the learning rate accordingly.
A basic rule is described here. We also support --auto-scale-lr
to
enable automatic LR scaling.
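As a rough illustration of that rule (the numbers here are hypothetical; always check the linked guideline and your base config), the learning rate is usually scaled linearly with the total batch size:
base_lr = 0.001         # hypothetical LR tuned for 8 GPUs x 6 samples
base_total_batch = 8 * 6
my_total_batch = 4 * 6  # e.g. training with 4 GPUs and 6 samples per GPU
scaled_lr = base_lr * my_total_batch / base_total_batch
print(scaled_lr)        # 0.0005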
Quantitative Evaluation¶
During training, the model checkpoints will be evaluated regularly according to the setting of train_cfg = dict(val_interval=xxx)
in the config.
We support official evaluation protocols for different datasets.
For KITTI, the model will be evaluated with mean average precision (mAP) at Intersection over Union (IoU) thresholds of 0.5/0.7 for the 3 categories.
The evaluation results will be printed to the command line like:
Car AP@0.70, 0.70, 0.70:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:89.6905, 87.4570, 85.4865
3d AP:87.4561, 76.7569, 74.1302
aos AP:97.70, 88.73, 87.34
Car AP@0.70, 0.50, 0.50:
bbox AP:98.1839, 89.7606, 88.7837
bev AP:98.4400, 90.1218, 89.6270
3d AP:98.3329, 90.0209, 89.4035
aos AP:97.70, 88.73, 87.34
In addition, you can also evaluate a specific model checkpoint after training is finished. Simply run scripts like the following:
./tools/dist_test.sh configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py work_dirs/pointpillars/latest.pth 8
Testing and Making a Submission¶
If you would like to only conduct inference or test the model performance on the online benchmark,
you need to specify the submission_prefix for the corresponding evaluator,
e.g., add test_evaluator = dict(type='KittiMetric', ann_file=data_root + 'kitti_infos_test.pkl', format_only=True, pklfile_prefix='results/kitti-3class/kitti_results', submission_prefix='results/kitti-3class/kitti_results')
to the configuration, and then you can get the results files.
Please make sure the data_prefix and ann_file used for testing in the config correspond to the test set instead of the validation set.
After generating the results, you can compress the folder and upload it to the KITTI evaluation server.
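Putting the pieces above together, a hedged sketch of the test-set part of such a config could look like the following; the data_prefix value and the output paths are assumptions to be adapted to your own layout:
data_root = 'data/kitti/'
test_dataloader = dict(
    dataset=dict(
        ann_file='kitti_infos_test.pkl',  # test-set info instead of val
        data_prefix=dict(pts='testing/velodyne_reduced')))
test_evaluator = dict(
    type='KittiMetric',
    ann_file=data_root + 'kitti_infos_test.pkl',
    format_only=True,  # only format the results, do not compute metrics
    pklfile_prefix='results/kitti-3class/kitti_results',
    submission_prefix='results/kitti-3class/kitti_results')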
Qualitative Validation¶
MMDetection3D also provides versatile visualization tools so that we can get an intuitive feeling of the detection results predicted by our trained models.
You can either set the --show option to visualize the detection results online during evaluation,
or use tools/misc/visualize_results.py for offline visualization.
Besides, we also provide the script tools/misc/browse_dataset.py to visualize the dataset without inference.
Please refer to the visualization doc for more details.
Vision-Based 3D Detection¶
Vision-based 3D detection refers to 3D detection solutions based on vision-only input, such as monocular, binocular, and multi-view image based 3D detection. Currently, we only support monocular and multi-view 3D detection methods. Other approaches should also be compatible with our framework and will be supported in the future.
It expects the given model to take any number of images as input, and predict the 3D bounding boxes and category labels for each object of interest. Taking FCOS3D on the nuScenes dataset as an example, we will show how to prepare data, train and test a model on a standard 3D detection benchmark, and how to visualize and validate the results.
Data Preparation¶
To begin with, we need to download the raw data and reorganize the data in a standard way presented in the doc for data preparation.
Because different datasets organize their raw data differently, we typically need to collect the useful data information into a .pkl or .json file.
So after getting all the raw data ready, we need to run create_data.py with the dataset-specific entry to generate data infos.
For example, for nuScenes we need to run:
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
Afterwards, the related folder structure should be as follows:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── nuscenes
│ │ ├── maps
│ │ ├── samples
│ │ ├── sweeps
│ │ ├── v1.0-test
│ │ ├── v1.0-trainval
│ │ ├── nuscenes_database
│ │ ├── nuscenes_infos_train.pkl
│ │ ├── nuscenes_infos_trainval.pkl
│ │ ├── nuscenes_infos_val.pkl
│ │ ├── nuscenes_infos_test.pkl
│ │ ├── nuscenes_dbinfos_train.pkl
Training¶
Then let us train a model with provided configs for FCOS3D. The basic script is the same as other models. You can basically follow the examples provided in this tutorial when training with different GPU settings. Suppose we use 8 GPUs on a single machine with distributed training:
./tools/dist_train.sh configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d.py 8
Note that 8xb2 in the config name means the training is performed with 8 GPUs and 2 data samples on each GPU.
If your customized setting differs from this, you can add --auto-scale-lr to enable automatic learning rate scaling. A basic rule can be found here.
We can also achieve better performance with finetuned FCOS3D by running:
./tools/dist_train.sh configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d_finetune.py 8
After training a baseline model with the previous script, please remember to modify the path here correspondingly.
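In practice this path is the load_from field of the finetune config, which should point at your own baseline checkpoint; a hedged sketch (the checkpoint path below is only a placeholder):
# assumed placeholder path; point it at the baseline checkpoint you trained
load_from = 'work_dirs/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d/latest.pth'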
Quantitative Evaluation¶
During training, the model checkpoints will be evaluated regularly according to the setting of train_cfg = dict(val_interval=xxx)
in the config.
We support official evaluation protocols for different datasets. Since the output format is the same as for 3D detection based on other modalities, the evaluation methods are also the same.
For nuScenes, the model will be evaluated with distance-based mean AP (mAP) and the nuScenes Detection Score (NDS) over 10 categories. The evaluation results will be printed to the command line like:
mAP: 0.3197
mATE: 0.7595
mASE: 0.2700
mAOE: 0.4918
mAVE: 1.3307
mAAE: 0.1724
NDS: 0.3905
Eval time: 170.8s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.503 0.577 0.152 0.111 2.096 0.136
truck 0.223 0.857 0.224 0.220 1.389 0.179
bus 0.294 0.855 0.204 0.190 2.689 0.283
trailer 0.081 1.094 0.243 0.553 0.742 0.167
construction_vehicle 0.058 1.017 0.450 1.019 0.137 0.341
pedestrian 0.392 0.687 0.284 0.694 0.876 0.158
motorcycle 0.317 0.737 0.265 0.580 2.033 0.104
bicycle 0.308 0.704 0.299 0.892 0.683 0.010
traffic_cone 0.555 0.486 0.309 nan nan nan
barrier 0.466 0.581 0.269 0.169 nan nan
In addition, you can also evaluate a specific model checkpoint after training is finished. Simply run scripts like the following:
./tools/dist_test.sh configs/fcos3d/fcos3d_r101-caffe-dcn_fpn_head-gn_8xb2-1x_nus-mono3d.py work_dirs/fcos3d/latest.pth 8
Testing and Making a Submission¶
If you would like to only conduct inference or test the model performance on the online benchmark,
you just need to specify the jsonfile_prefix for the corresponding evaluator,
e.g., add test_evaluator = dict(type='NuScenesMetric', jsonfile_prefix='work_dirs/fcos3d/test_submission')
to the configuration, and then you can get the results file.
Please make sure the data_prefix and ann_file used for testing in the config correspond to the test set instead of the validation set.
After generating the results, you can compress the folder and upload it to the EvalAI evaluation server for the nuScenes 3D detection challenge.
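For completeness, a hedged sketch of such a test-time evaluator is shown below; the paths and the format_only flag are assumptions to be adapted to your own setup:
data_root = 'data/nuscenes/'
test_evaluator = dict(
    type='NuScenesMetric',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_test.pkl',
    format_only=True,  # only produce the submission json, skip metrics
    jsonfile_prefix='work_dirs/fcos3d/test_submission')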
Qualitative Validation¶
MMDetection3D also provides versatile visualization tools so that we can get an intuitive feeling of the detection results predicted by our trained models.
You can either set the --eval-options 'show=True' 'out_dir=${SHOW_DIR}' option to visualize the detection results online during evaluation,
or use tools/misc/visualize_results.py for offline visualization.
Besides, we also provide the script tools/misc/browse_dataset.py to visualize the dataset without inference.
Please refer to the visualization doc for more details.
Note that currently we only support visualization on images for vision-only methods. Visualization in the perspective view and bird's-eye view (BEV) will be integrated in the future.
LiDAR-Based 3D Semantic Segmentation¶
LiDAR-based 3D semantic segmentation is one of the most basic tasks supported in MMDetection3D. It expects the given model to take any number of points with features collected by LiDAR as input, and predict the semantic labels for each input point. Next, taking PointNet++ (SSG) on the ScanNet dataset as an example, we will show how to prepare data, train and test a model on a standard 3D semantic segmentation benchmark, and how to visualize and validate the results.
Data Preparation¶
To begin with, we need to download the raw data from ScanNet’s official website.
Because different datasets organize their raw data differently, we typically need to collect the useful data information into a .pkl or .json file.
So after getting all the raw data ready, we can follow the instructions presented in the ScanNet README doc to generate data infos.
Afterwards, the related folder structure should be as follows:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── scannet
│ │ ├── scannet_utils.py
│ │ ├── batch_load_scannet_data.py
│ │ ├── load_scannet_data.py
│ │ ├── README.md
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── scannet_instance_data
│ │ ├── points
│ │ ├── instance_mask
│ │ ├── semantic_mask
│ │ ├── seg_info
│ │ │ ├── train_label_weight.npy
│ │ │ ├── train_resampled_scene_idxs.npy
│ │ │ ├── val_label_weight.npy
│ │ │ ├── val_resampled_scene_idxs.npy
│ │ ├── scannet_infos_train.pkl
│ │ ├── scannet_infos_val.pkl
│ │ ├── scannet_infos_test.pkl
Training¶
Then let us train a model with provided configs for PointNet++ (SSG). You can basically follow this tutorial for sample scripts when training with different GPU settings. Suppose we use 2 GPUs on a single machine with distributed training:
./tools/dist_train.sh configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py 2
Note that 2xb16 in the config name means the training is performed with 2 GPUs and 16 samples on each GPU.
If your customized setting differs from this, you sometimes need to adjust the learning rate accordingly.
A basic rule can be found here.
Quantitative Evaluation¶
During training, the model checkpoints will be evaluated regularly according to the setting of train_cfg = dict(val_interval=xxx)
in the config.
We support official evaluation protocols for different datasets.
For ScanNet, the model will be evaluated with mean Intersection over Union (mIoU) over all 20 categories.
The evaluation results will be printed to the command line like:
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
| classes | wall | floor | cabinet | bed | chair | sofa | table | door | window | bookshelf | picture | counter | desk | curtain | refrigerator | showercurtrain | toilet | sink | bathtub | otherfurniture | miou | acc | acc_cls |
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
| results | 0.7257 | 0.9373 | 0.4625 | 0.6613 | 0.7707 | 0.5562 | 0.5864 | 0.4010 | 0.4558 | 0.7011 | 0.2500 | 0.4645 | 0.4540 | 0.5399 | 0.2802 | 0.3488 | 0.7359 | 0.4971 | 0.6922 | 0.3681 | 0.5444 | 0.8118 | 0.6695 |
+---------+--------+--------+---------+--------+--------+--------+--------+--------+--------+-----------+---------+---------+--------+---------+--------------+----------------+--------+--------+---------+----------------+--------+--------+---------+
In addition, you can also evaluate a specific model checkpoint after training is finished. Simply run scripts like the following:
./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_2xb16-cosine-200e_scannet-seg.py work_dirs/pointnet2_ssg/latest.pth 8
Testing and Making a Submission¶
If you would like to only conduct inference or test the model performance on the online benchmark,
you should change ann_file='scannet_infos_val.pkl' to ann_file='scannet_infos_test.pkl' in the ScanNet dataset's config.
Remember to specify the submission_prefix in the test_evaluator,
e.g., by adding test_evaluator = dict(type='SegMetric', submission_prefix='work_dirs/pointnet2_ssg/test_submission') to the config,
or just append --cfg-options test_evaluator.submission_prefix=work_dirs/pointnet2_ssg/test_submission to the end of the test command.
After generating the results, you can compress the folder and upload it to the ScanNet evaluation server.
Qualitative Validation¶
MMDetection3D also provides versatile visualization tools: you can use tools/misc/visualize_results.py with the results pkl file for offline visualization, or append --show to the end of the test command for online visualization.
Besides, we also provide the script tools/misc/browse_dataset.py to visualize the dataset without inference.
Please refer to the visualization doc for more details.
Customization¶
Customize Datasets¶
In this note, you will learn how to train and test predefined models with customized datasets.
The basic steps are as below:
Prepare data
Prepare a config
Train, test and inference models on the customized dataset
Data Preparation¶
Ideally, we could reorganize the customized raw data and convert the annotation format into KITTI style. However, considering that some calibration files and 3D annotations in KITTI format are difficult to obtain for customized datasets, we introduce a basic data format in this doc.
Basic Data Format¶
Point cloud Format¶
Currently, we only support point clouds in .bin format for training and inference. Before training on your own datasets, you need to convert point cloud files in other formats to .bin files. Common point cloud data formats include .pcd and .las; we list some open-source conversion tools for reference.
Convert .pcd to .bin: https://github.com/DanielPollithy/pypcd
You can install pypcd with the following command:
pip install git+https://github.com/DanielPollithy/pypcd.git
You can use the following script to read the .pcd file and convert it to .bin format for saving:
import numpy as np
from pypcd import pypcd

pcd_data = pypcd.PointCloud.from_path('point_cloud_data.pcd')
points = np.zeros([pcd_data.width, 4], dtype=np.float32)
points[:, 0] = pcd_data.pc_data['x'].copy()
points[:, 1] = pcd_data.pc_data['y'].copy()
points[:, 2] = pcd_data.pc_data['z'].copy()
points[:, 3] = pcd_data.pc_data['intensity'].copy().astype(np.float32)
with open('point_cloud_data.bin', 'wb') as f:
    f.write(points.tobytes())
Convert .las to .bin: the common conversion path is .las -> .pcd -> .bin, and the .las -> .pcd step can be achieved through this tool.
Label Format¶
The most basic information, the 3D bounding boxes and category labels of each scene, needs to be contained in a .txt annotation file. Each line represents a 3D box in a certain scene as follows:
# format: [x, y, z, dx, dy, dz, yaw, category_name]
1.23 1.42 0.23 3.96 1.65 1.55 1.56 Car
3.51 2.15 0.42 1.05 0.87 1.86 1.23 Pedestrian
...
Note: Currently we only support KITTI Metric evaluation for customized datasets.
The 3D boxes should be stored in the unified 3D coordinate system.
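As a concrete illustration, here is a minimal parsing sketch. This helper is not part of MMDetection3D; it only shows how the lines above map to arrays:
import numpy as np

def load_custom_labels(txt_path):
    """Parse one label file in the format above into boxes and class names."""
    boxes, names = [], []
    with open(txt_path, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) != 8:
                continue  # skip empty or malformed lines
            boxes.append([float(v) for v in parts[:7]])  # x, y, z, dx, dy, dz, yaw
            names.append(parts[7])                       # category_name
    return np.array(boxes, dtype=np.float32).reshape(-1, 7), names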
Calibration Format¶
The point cloud data collected by each LiDAR are usually fused and converted to a certain LiDAR coordinate system. So the .txt calibration file should typically contain the intrinsic matrix of each camera and the extrinsic transformation matrix from the LiDAR to each camera, where Px represents the intrinsic matrix of camera_x and lidar2camx represents the extrinsic transformation matrix from the lidar to camera_x.
P0
P1
P2
P3
P4
...
lidar2cam0
lidar2cam1
lidar2cam2
lidar2cam3
lidar2cam4
...
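As a rough illustration of how these entries are used together, the sketch below projects LiDAR points into the image of camera_x. The matrix shapes (a 3x4 projection Px and a 4x4 lidar2camx) are assumptions and may differ from your own calibration files:
import numpy as np

def project_lidar_to_camx(points_lidar, Px, lidar2camx):
    """points_lidar: (N, 3) in the LiDAR frame; returns (N, 2) pixel coords."""
    pts_h = np.concatenate(
        [points_lidar, np.ones((points_lidar.shape[0], 1))], axis=1)  # (N, 4)
    pts_cam = lidar2camx @ pts_h.T      # (4, N), points in the camera_x frame
    pts_img = Px @ pts_cam              # (3, N), homogeneous image coordinates
    return (pts_img[:2] / pts_img[2:3]).T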
Raw Data Structure¶
LiDAR-Based 3D Detection¶
The raw data for LiDAR-based 3D object detection are typically organized as follows, where ImageSets contains split files indicating which files belong to the training/validation set, points includes point cloud data that is supposed to be stored in .bin format, and labels includes the label files for 3D detection.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
Vision-Based 3D Detection¶
The raw data for vision-based 3D object detection are typically organized as follows, where ImageSets contains split files indicating which files belong to the training/validation set, images contains the images from different cameras (for example, images from camera_x need to be placed in images/images_x), calibs contains calibration information files which store the camera intrinsic matrix of each camera, and labels includes the label files for 3D detection.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
Multi-Modality 3D Detection¶
The raw data for multi-modality 3D object detection are typically organized as follows. Different from vision-based 3D object detection, the calibration information files in calibs store the camera intrinsic matrix of each camera as well as the extrinsic matrix.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── calibs
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── images
│ │ │ ├── images_0
│ │ │ │ ├── 000000.png
│ │ │ │ ├── 000001.png
│ │ │ │ ├── ...
│ │ │ ├── images_1
│ │ │ ├── images_2
│ │ │ ├── ...
│ │ ├── labels
│ │ │ ├── 000000.txt
│ │ │ ├── 000001.txt
│ │ │ ├── ...
LiDAR-Based 3D Semantic Segmentation¶
The raw data for LiDAR-based 3D semantic segmentation are typically organized as follows, where ImageSets contains split files indicating which files belong to the training/validation set, points includes the point cloud data, and semantic_mask includes the point-level labels.
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── custom
│ │ ├── ImageSets
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ ├── points
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
│ │ ├── semantic_mask
│ │ │ ├── 000000.bin
│ │ │ ├── 000001.bin
│ │ │ ├── ...
Data Converter¶
Once you have prepared the raw data following our instructions, you can directly use the following command to generate training/validation information files.
python tools/create_data.py custom --root-path ./data/custom --out-dir ./data/custom --extra-tag custom
An example of customized dataset¶
Once we finish data preparation, we can create a new dataset in mmdet3d/datasets/my_dataset.py
to load the data.
import mmengine
import numpy as np

from mmdet3d.registry import DATASETS
from mmdet3d.structures import LiDARInstance3DBoxes
from .det3d_dataset import Det3DDataset
@DATASETS.register_module()
class MyDataset(Det3DDataset):
# replace with all the classes in customized pkl info file
METAINFO = {
'classes': ('Pedestrian', 'Cyclist', 'Car')
}
def parse_ann_info(self, info):
"""Process the `instances` in data info to `ann_info`.
Args:
info (dict): Data information of single data sample.
Returns:
dict: Annotation information consists of the following keys:
- gt_bboxes_3d (:obj:`LiDARInstance3DBoxes`):
3D ground truth bboxes.
- gt_labels_3d (np.ndarray): Labels of ground truths.
"""
ann_info = super().parse_ann_info(info)
if ann_info is None:
ann_info = dict()
# empty instance
ann_info['gt_bboxes_3d'] = np.zeros((0, 7), dtype=np.float32)
ann_info['gt_labels_3d'] = np.zeros(0, dtype=np.int64)
# filter the gt classes not used in training
ann_info = self._remove_dontcare(ann_info)
gt_bboxes_3d = LiDARInstance3DBoxes(ann_info['gt_bboxes_3d'])
ann_info['gt_bboxes_3d'] = gt_bboxes_3d
return ann_info
After the data pre-processing, there are two steps for users to train the customized new dataset:
Modify the config file for using the customized dataset.
Check the annotations of the customized dataset.
Here we take training PointPillars on the customized dataset as an example:
Prepare a config¶
Here we demonstrate a config sample for pure point cloud training.
Prepare dataset config¶
In configs/_base_/datasets/custom.py
:
# dataset settings
dataset_type = 'MyDataset'
data_root = 'data/custom/'
class_names = ['Pedestrian', 'Cyclist', 'Car'] # replace with your dataset class
point_cloud_range = [0, -40, -3, 70.4, 40, 1] # adjust according to your dataset
input_modality = dict(use_lidar=True, use_camera=False)
metainfo = dict(classes=class_names)
train_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # replace with your point cloud data dimension
use_dim=4), # replace with the actual dimension used in training and inference
dict(
type='LoadAnnotations3D',
with_bbox_3d=True,
with_label_3d=True),
dict(
type='ObjectNoise',
num_try=100,
translation_std=[1.0, 1.0, 0.5],
global_rot_range=[0.0, 0.0],
rot_range=[-0.78539816, 0.78539816]),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.78539816, 0.78539816],
scale_ratio_range=[0.95, 1.05]),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='PointShuffle'),
dict(
type='Pack3DDetInputs',
keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=4, # replace with your point cloud data dimension
use_dim=4),
dict(type='Pack3DDetInputs', keys=['points'])
]
# construct a pipeline for data and gt loading in show function
eval_pipeline = [
dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
dict(type='Pack3DDetInputs', keys=['points']),
]
train_dataloader = dict(
batch_size=6,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='RepeatDataset',
times=2,
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='custom_infos_train.pkl', # specify your training pkl info
data_prefix=dict(pts='points'),
pipeline=train_pipeline,
modality=input_modality,
test_mode=False,
metainfo=metainfo,
box_type_3d='LiDAR')))
val_dataloader = dict(
batch_size=1,
num_workers=1,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_prefix=dict(pts='points'),
ann_file='custom_infos_val.pkl', # specify your validation pkl info
pipeline=test_pipeline,
modality=input_modality,
test_mode=True,
metainfo=metainfo,
box_type_3d='LiDAR'))
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # specify your validation pkl info
metric='bbox')
Prepare model config¶
For voxel-based detectors such as SECOND, PointPillars and CenterPoint, the point cloud range and voxel size should be adjusted according to your dataset.
Theoretically, voxel_size is linked to the setting of point_cloud_range. Setting a smaller voxel_size will increase the number of voxels and the corresponding memory consumption. In addition, the following issues need to be noted:
If the point_cloud_range and voxel_size are set to [0, -40, -3, 70.4, 40, 1] and [0.05, 0.05, 0.1] respectively, then the shape of the intermediate feature map should be [(1-(-3))/0.1+1, (40-(-40))/0.05, (70.4-0)/0.05] = [41, 1600, 1408]. When changing point_cloud_range, remember to change the shape of the intermediate feature map in middle_encoder according to the voxel_size (see the sketch after this list).
Regarding the setting of anchor_range, it is generally adjusted according to the dataset. Note that the z value needs to be adjusted according to the position of the point cloud; please refer to this issue.
Regarding the setting of anchor_size, it is usually necessary to count the average length, width and height of objects in the entire training dataset and use them as anchor_size to obtain the best results.
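The sketch below is a small helper (an assumption, not part of MMDetection3D) that reproduces the arithmetic above, so you can recompute the grid shape whenever you change point_cloud_range or voxel_size:
def grid_shape(point_cloud_range, voxel_size):
    """Return the (z, y, x) voxel-grid shape implied by the two settings."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    dx, dy, dz = voxel_size
    nx = round((x_max - x_min) / dx)
    ny = round((y_max - y_min) / dy)
    nz = round((z_max - z_min) / dz)
    return nz, ny, nx

# grid_shape([0, -40, -3, 70.4, 40, 1], [0.05, 0.05, 0.1]) -> (40, 1600, 1408);
# the formula above adds one extra cell along z, giving [41, 1600, 1408].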
In configs/_base_/models/pointpillars_hv_secfpn_custom.py
:
voxel_size = [0.16, 0.16, 4] # adjust according to your dataset
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1] # adjust according to your dataset
model = dict(
type='VoxelNet',
data_preprocessor=dict(
type='Det3DDataPreprocessor',
voxel=True,
voxel_layer=dict(
max_num_points=32,
point_cloud_range=point_cloud_range,
voxel_size=voxel_size,
max_voxels=(16000, 40000))),
voxel_encoder=dict(
type='PillarFeatureNet',
in_channels=4,
feat_channels=[64],
with_distance=False,
voxel_size=voxel_size,
point_cloud_range=point_cloud_range),
# the `output_shape` should be adjusted according to `point_cloud_range`
# and `voxel_size`
middle_encoder=dict(
type='PointPillarsScatter', in_channels=64, output_shape=[496, 432]),
backbone=dict(
type='SECOND',
in_channels=64,
layer_nums=[3, 5, 5],
layer_strides=[2, 2, 2],
out_channels=[64, 128, 256]),
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
bbox_head=dict(
type='Anchor3DHead',
num_classes=3,
in_channels=384,
feat_channels=384,
use_direction_classifier=True,
assign_per_class=True,
# adjust the `ranges` and `sizes` according to your dataset
anchor_generator=dict(
type='AlignedAnchor3DRangeGenerator',
ranges=[
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -0.6, 69.12, 39.68, -0.6],
[0, -39.68, -1.78, 69.12, 39.68, -1.78],
],
sizes=[[0.8, 0.6, 1.73], [1.76, 0.6, 1.73], [3.9, 1.6, 1.56]],
rotations=[0, 1.57],
reshape_out=False),
diff_rad_by_sin=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(
type='mmdet.SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_dir=dict(
type='mmdet.CrossEntropyLoss', use_sigmoid=False,
loss_weight=0.2)),
# model training and testing settings
train_cfg=dict(
assigner=[
dict( # for Pedestrian
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Cyclist
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.5,
neg_iou_thr=0.35,
min_pos_iou=0.35,
ignore_iof_thr=-1),
dict( # for Car
type='Max3DIoUAssigner',
iou_calculator=dict(type='BboxOverlapsNearest3D'),
pos_iou_thr=0.6,
neg_iou_thr=0.45,
min_pos_iou=0.45,
ignore_iof_thr=-1),
],
allowed_border=0,
pos_weight=-1,
debug=False),
test_cfg=dict(
use_rotate_nms=True,
nms_across_levels=False,
nms_thr=0.01,
score_thr=0.1,
min_bbox_size=0,
nms_pre=100,
max_num=50))
Prepare overall config¶
We combine all the configs above in configs/pointpillars/pointpillars_hv_secfpn_8xb6_custom.py
:
_base_ = [
'../_base_/models/pointpillars_hv_secfpn_custom.py',
'../_base_/datasets/custom.py',
'../_base_/schedules/cyclic-40e.py', '../_base_/default_runtime.py'
]
Visualize your dataset (optional)¶
To validate whether your prepared data and config are correct, it’s highly recommended to use tools/misc/browse_dataset.py
script
to visualize your dataset and annotations before training and validation. Please refer to visualization doc for more details.
Evaluation¶
Once the data and config have been prepared, you can directly run the training/testing script following our doc.
Note: We only provide an implementation for KITTI style evaluation for the customized dataset. It should be included in the dataset config:
val_evaluator = dict(
type='KittiMetric',
ann_file=data_root + 'custom_infos_val.pkl', # specify your validation pkl info
metric='bbox')
Customize Models¶
We basically categorize model components into 6 types:
encoder: Including the voxel encoder and middle encoder used in voxel-based methods before the backbone, e.g., HardVFE and PointPillarsScatter.
backbone: Usually an FCN network to extract feature maps, e.g., ResNet, SECOND.
neck: The component between backbones and heads, e.g., FPN, SECONDFPN.
head: The component for specific tasks, e.g., bbox prediction and mask prediction.
RoI extractor: The part for extracting RoI features from feature maps, e.g., H3DRoIHead and PartAggregationROIHead.
loss: The component in heads for calculating losses, e.g., FocalLoss, L1Loss, and GHMLoss.
Develop new components¶
Add a new encoder¶
Here we show how to develop new components with an example of HardVFE.
1. Define a new voxel encoder (e.g. HardVFE: Voxel feature encoder used in HV-SECOND)¶
Create a new file mmdet3d/models/voxel_encoders/voxel_encoder.py
.
import torch.nn as nn
from mmdet3d.registry import MODELS
@MODELS.register_module()
class HardVFE(nn.Module):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # should return a tuple
pass
2. Import the module¶
You can either add the following line to mmdet3d/models/voxel_encoders/__init__.py
:
from .voxel_encoder import HardVFE
or alternatively add
custom_imports = dict(
imports=['mmdet3d.models.voxel_encoders.voxel_encoder'],
allow_failed_imports=False)
to the config file to avoid modifying the original code.
3. Use the voxel encoder in your config file¶
model = dict(
...
voxel_encoder=dict(
type='HardVFE',
arg1=xxx,
arg2=yyy),
...
)
Add a new backbone¶
Here we show how to develop new components with an example of SECOND (Sparsely Embedded Convolutional Detection).
1. Define a new backbone (e.g. SECOND)¶
Create a new file mmdet3d/models/backbones/second.py
.
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECOND(BaseModule):
def __init__(self, arg1, arg2):
pass
def forward(self, x): # should return a tuple
pass
2. Import the module¶
You can either add the following line to mmdet3d/models/backbones/__init__.py
:
from .second import SECOND
or alternatively add
custom_imports = dict(
imports=['mmdet3d.models.backbones.second'],
allow_failed_imports=False)
to the config file to avoid modifying the original code.
3. Use the backbone in your config file¶
model = dict(
...
backbone=dict(
type='SECOND',
arg1=xxx,
arg2=yyy),
...
)
Add a new neck¶
1. Define a new neck (e.g. SECONDFPN)¶
Create a new file mmdet3d/models/necks/second_fpn.py
.
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class SECONDFPN(BaseModule):
def __init__(self,
in_channels=[128, 128, 256],
out_channels=[256, 256, 256],
upsample_strides=[1, 2, 4],
norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
upsample_cfg=dict(type='deconv', bias=False),
conv_cfg=dict(type='Conv2d', bias=False),
use_conv_for_no_stride=False,
init_cfg=None):
pass
def forward(self, x):
# implementation is ignored
pass
2. Import the module¶
You can either add the following line to mmdet3d/models/necks/__init__.py
:
from .second_fpn import SECONDFPN
or alternatively add
custom_imports = dict(
imports=['mmdet3d.models.necks.second_fpn'],
allow_failed_imports=False)
to the config file to avoid modifying the original code.
3. Use the neck in your config file¶
model = dict(
...
neck=dict(
type='SECONDFPN',
in_channels=[64, 128, 256],
upsample_strides=[1, 2, 4],
out_channels=[128, 128, 128]),
...
)
Add a new head¶
Here we show how to develop a new head with the example of PartA2 Head as the following.
Note: Here the example of PartA2 RoI Head is used in the second stage. For one-stage heads, please refer to the examples in mmdet3d/models/dense_heads/. They are more commonly used in 3D detection for autonomous driving due to their simplicity and efficiency.
First, add a new bbox head in mmdet3d/models/roi_heads/bbox_heads/parta2_bbox_head.py
.
PartA2 RoI Head implements a new bbox head for object detection.
To implement a bbox head, we basically need to implement two functions of the new module as follows. Sometimes other related functions like loss and get_targets are also required.
from mmengine.model import BaseModule
from mmdet3d.registry import MODELS
@MODELS.register_module()
class PartA2BboxHead(BaseModule):
"""PartA2 RoI head."""
def __init__(self,
num_classes,
seg_in_channels,
part_in_channels,
seg_conv_channels=None,
part_conv_channels=None,
merge_conv_channels=None,
down_conv_channels=None,
shared_fc_channels=None,
cls_channels=None,
reg_channels=None,
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
conv_cfg=dict(type='Conv1d'),
norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
loss_bbox=dict(
type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=True,
reduction='none',
loss_weight=1.0),
init_cfg=None):
super(PartA2BboxHead, self).__init__(init_cfg=init_cfg)
def forward(self, seg_feats, part_feats):
pass
Second, implement a new RoI head if necessary. We plan to inherit the new PartAggregationROIHead from Base3DRoIHead. We can find that Base3DRoIHead already implements the following functions.
from mmdet.models.roi_heads import BaseRoIHead
from mmdet3d.registry import MODELS, TASK_UTILS
class Base3DRoIHead(BaseRoIHead):
"""Base class for 3d RoIHeads."""
def __init__(self,
bbox_head=None,
bbox_roi_extractor=None,
mask_head=None,
mask_roi_extractor=None,
train_cfg=None,
test_cfg=None,
init_cfg=None):
super(Base3DRoIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
mask_head=mask_head,
mask_roi_extractor=mask_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
def init_bbox_head(self, bbox_roi_extractor: dict,
bbox_head: dict) -> None:
"""Initialize box head and box roi extractor.
Args:
bbox_roi_extractor (dict or ConfigDict): Config of box
roi extractor.
bbox_head (dict or ConfigDict): Config of box in box head.
"""
self.bbox_roi_extractor = MODELS.build(bbox_roi_extractor)
self.bbox_head = MODELS.build(bbox_head)
def init_assigner_sampler(self):
"""Initialize assigner and sampler."""
self.bbox_assigner = None
self.bbox_sampler = None
if self.train_cfg:
if isinstance(self.train_cfg.assigner, dict):
self.bbox_assigner = TASK_UTILS.build(self.train_cfg.assigner)
elif isinstance(self.train_cfg.assigner, list):
self.bbox_assigner = [
TASK_UTILS.build(res) for res in self.train_cfg.assigner
]
self.bbox_sampler = TASK_UTILS.build(self.train_cfg.sampler)
def init_mask_head(self):
"""Initialize mask head, skip since ``PartAggregationROIHead`` does not
have one."""
pass
PartAggregationROIHead's modification is mainly in the bbox forward logic, and it inherits the other logic from Base3DRoIHead.
In the mmdet3d/models/roi_heads/part_aggregation_roi_head.py
, we implement the new RoI Head as the following:
from typing import Dict, List, Tuple
from mmdet.models.task_modules import AssignResult, SamplingResult
from mmengine import ConfigDict
from torch import Tensor
from torch.nn import functional as F
from mmdet3d.registry import MODELS
from mmdet3d.structures import bbox3d2roi
from mmdet3d.utils import InstanceList
from ...structures.det3d_data_sample import SampleList
from .base_3droi_head import Base3DRoIHead
@MODELS.register_module()
class PartAggregationROIHead(Base3DRoIHead):
"""Part aggregation roi head for PartA2.
Args:
semantic_head (ConfigDict): Config of semantic head.
num_classes (int): The number of classes.
seg_roi_extractor (ConfigDict): Config of seg_roi_extractor.
bbox_roi_extractor (ConfigDict): Config of part_roi_extractor.
bbox_head (ConfigDict): Config of bbox_head.
train_cfg (ConfigDict): Training config.
test_cfg (ConfigDict): Testing config.
"""
def __init__(self,
semantic_head: dict,
num_classes: int = 3,
seg_roi_extractor: dict = None,
bbox_head: dict = None,
bbox_roi_extractor: dict = None,
train_cfg: dict = None,
test_cfg: dict = None,
init_cfg: dict = None) -> None:
super(PartAggregationROIHead, self).__init__(
bbox_head=bbox_head,
bbox_roi_extractor=bbox_roi_extractor,
train_cfg=train_cfg,
test_cfg=test_cfg,
init_cfg=init_cfg)
self.num_classes = num_classes
assert semantic_head is not None
self.init_seg_head(seg_roi_extractor, semantic_head)
def init_seg_head(self, seg_roi_extractor: dict,
semantic_head: dict) -> None:
"""Initialize semantic head and seg roi extractor.
Args:
seg_roi_extractor (dict): Config of seg
roi extractor.
semantic_head (dict): Config of semantic head.
"""
self.semantic_head = MODELS.build(semantic_head)
self.seg_roi_extractor = MODELS.build(seg_roi_extractor)
@property
def with_semantic(self):
"""bool: whether the head has semantic branch"""
return hasattr(self,
'semantic_head') and self.semantic_head is not None
def predict(self,
feats_dict: Dict,
rpn_results_list: InstanceList,
batch_data_samples: SampleList,
rescale: bool = False,
**kwargs) -> InstanceList:
"""Perform forward propagation of the roi head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
rescale (bool): If True, return boxes in original image space.
Defaults to False.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
assert self.with_bbox, 'Bbox head must be implemented in PartA2.'
assert self.with_semantic, 'Semantic head must be implemented' \
' in PartA2.'
batch_input_metas = [
data_samples.metainfo for data_samples in batch_data_samples
]
voxels_dict = feats_dict.pop('voxels_dict')
# TODO: Split predict semantic and bbox
results_list = self.predict_bbox(feats_dict, voxels_dict,
batch_input_metas, rpn_results_list,
self.test_cfg)
return results_list
def predict_bbox(self, feats_dict: Dict, voxel_dict: Dict,
batch_input_metas: List[dict],
rpn_results_list: InstanceList,
test_cfg: ConfigDict) -> InstanceList:
"""Perform forward propagation of the bbox head and predict detection
results on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
voxel_dict (dict): Contains information of voxels.
batch_input_metas (list[dict], Optional): Batch image meta info.
Defaults to None.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
test_cfg (Config): Test config.
Returns:
list[:obj:`InstanceData`]: Detection results of each sample
after the post process.
Each item usually contains following keys.
- scores_3d (Tensor): Classification scores, has a shape
(num_instances, )
- labels_3d (Tensor): Labels of bboxes, has a shape
(num_instances, ).
- bboxes_3d (BaseInstance3DBoxes): Prediction of bboxes,
contains a tensor with shape (num_instances, C), where
C >= 7.
"""
...
def loss(self, feats_dict: Dict, rpn_results_list: InstanceList,
batch_data_samples: SampleList, **kwargs) -> dict:
"""Perform forward propagation and loss calculation of the detection
roi on the features of the upstream network.
Args:
feats_dict (dict): Contains features from the first stage.
rpn_results_list (List[:obj:`InstanceData`]): Detection results
of rpn head.
batch_data_samples (List[:obj:`Det3DDataSample`]): The Data
samples. It usually includes information such as
`gt_instance_3d`, `gt_panoptic_seg_3d` and `gt_sem_seg_3d`.
Returns:
dict[str, Tensor]: A dictionary of loss components
"""
assert len(rpn_results_list) == len(batch_data_samples)
losses = dict()
batch_gt_instances_3d = []
batch_gt_instances_ignore = []
voxels_dict = feats_dict.pop('voxels_dict')
for data_sample in batch_data_samples:
batch_gt_instances_3d.append(data_sample.gt_instances_3d)
if 'ignored_instances' in data_sample:
batch_gt_instances_ignore.append(data_sample.ignored_instances)
else:
batch_gt_instances_ignore.append(None)
if self.with_semantic:
semantic_results = self._semantic_forward_train(
feats_dict, voxels_dict, batch_gt_instances_3d)
losses.update(semantic_results.pop('loss_semantic'))
sample_results = self._assign_and_sample(rpn_results_list,
batch_gt_instances_3d)
if self.with_bbox:
feats_dict.update(semantic_results)
bbox_results = self._bbox_forward_train(feats_dict, voxels_dict,
sample_results)
losses.update(bbox_results['loss_bbox'])
return losses
Here we omit more details related to other functions. Please see the code for more details.
Last, the users need to add the module to mmdet3d/models/roi_heads/bbox_heads/__init__.py and mmdet3d/models/roi_heads/__init__.py so that the corresponding registry can find and load them.
Alternatively, the users can add
custom_imports=dict(
imports=['mmdet3d.models.roi_heads.part_aggregation_roi_head', 'mmdet3d.models.roi_heads.bbox_heads.parta2_bbox_head'],
allow_failed_imports=False)
to the config file and achieve the same goal.
The config of PartAggregationROIHead is as follows:
model = dict(
...
roi_head=dict(
type='PartAggregationROIHead',
num_classes=3,
semantic_head=dict(
type='PointwiseSemanticHead',
in_channels=16,
extra_width=0.2,
seg_score_thr=0.3,
num_classes=3,
loss_seg=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
reduction='sum',
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_part=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
loss_weight=1.0)),
seg_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='max')),
bbox_roi_extractor=dict(
type='Single3DRoIAwareExtractor',
roi_layer=dict(
type='RoIAwarePool3d',
out_size=14,
max_pts_per_voxel=128,
mode='avg')),
bbox_head=dict(
type='PartA2BboxHead',
num_classes=3,
seg_in_channels=16,
part_in_channels=4,
seg_conv_channels=[64, 64],
part_conv_channels=[64, 64],
merge_conv_channels=[128, 128],
down_conv_channels=[128, 256],
bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
shared_fc_channels=[256, 512, 512, 512],
cls_channels=[256, 256],
reg_channels=[256, 256],
dropout_ratio=0.1,
roi_feat_size=14,
with_corner_loss=True,
loss_bbox=dict(
type='mmdet.SmoothL1Loss',
beta=1.0 / 9.0,
reduction='sum',
loss_weight=1.0),
loss_cls=dict(
type='mmdet.CrossEntropyLoss',
use_sigmoid=True,
reduction='sum',
loss_weight=1.0))),
...
)
Since MMDetection 2.0, the config system has supported inheriting configs so that users can focus on the modification.
The second stage of PartA2 Head mainly uses a new PartAggregationROIHead and a new PartA2BboxHead, whose arguments are set according to the __init__ function of each module.
Add a new loss¶
Assume you want to add a new loss called MyLoss for bounding box regression.
To add a new loss function, the users need to implement it in mmdet3d/models/losses/my_loss.py
.
The decorator weighted_loss
enables the loss to be weighted for each element.
import torch
import torch.nn as nn
from mmdet.models.losses.utils import weighted_loss
from mmdet3d.registry import MODELS
@weighted_loss
def my_loss(pred, target):
assert pred.size() == target.size() and target.numel() > 0
loss = torch.abs(pred - target)
return loss
@MODELS.register_module()
class MyLoss(nn.Module):
def __init__(self, reduction='mean', loss_weight=1.0):
super(MyLoss, self).__init__()
self.reduction = reduction
self.loss_weight = loss_weight
def forward(self,
pred,
target,
weight=None,
avg_factor=None,
reduction_override=None):
assert reduction_override in (None, 'none', 'mean', 'sum')
reduction = (
reduction_override if reduction_override else self.reduction)
loss_bbox = self.loss_weight * my_loss(
pred, target, weight, reduction=reduction, avg_factor=avg_factor)
return loss_bbox
Then the users need to add it in the mmdet3d/models/losses/__init__.py
.
from .my_loss import MyLoss, my_loss
Alternatively, you can add
custom_imports=dict(
imports=['mmdet3d.models.losses.my_loss'],
allow_failed_imports=False)
to the config file and achieve the same goal.
To use it, users should modify the loss_xxx
field.
Since MyLoss
is for regression, you need to modify the loss_bbox
field in the head.
loss_bbox=dict(type='MyLoss', loss_weight=1.0)
Customize Runtime Settings¶
Customize optimization settings¶
Optimization related configuration is now all managed by optim_wrapper
which usually has three fields: optimizer
, paramwise_cfg
, clip_grad
. Please refer to OptimWrapper for more details. See the example below, where AdamW
is used as an optimizer
, the learning rate of the backbone is reduced by a factor of 10, and gradient clipping is added.
optim_wrapper = dict(
type='OptimWrapper',
# optimizer
optimizer=dict(
type='AdamW',
lr=0.0001,
weight_decay=0.05,
eps=1e-8,
betas=(0.9, 0.999)),
# Parameter-level learning rate and weight decay settings
paramwise_cfg=dict(
custom_keys={
'backbone': dict(lr_mult=0.1, decay_mult=1.0),
},
norm_decay_mult=0.0),
# gradient clipping
clip_grad=dict(max_norm=0.01, norm_type=2))
Customize optimizer supported by PyTorch¶
We already support all the optimizers implemented by PyTorch, and the only modification needed is to change the optimizer field inside the optim_wrapper field of the config files. For example, if you want to use Adam (note that the performance could drop a lot), the modification could be as follows:
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='Adam', lr=0.0003, weight_decay=0.0001))
To modify the learning rate of the model, users only need to modify the lr in the optimizer field. Other arguments can be set directly following the API doc of PyTorch.
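For instance, assuming your config inherits a base schedule, a partial override like the hedged sketch below changes only the learning rate (the value is a placeholder), since config inheritance merges dictionaries key by key:
# only the listed keys are overridden; the rest is merged from the base config
optim_wrapper = dict(optimizer=dict(lr=0.001))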
Customize self-implemented optimizer¶
1. Define a new optimizer¶
A customized optimizer could be defined as follows.
Assume you want to add an optimizer named MyOptimizer, which has arguments a, b, and c.
You need to create a new directory named mmdet3d/engine/optimizers
, and then implement the new optimizer in a file, e.g., in mmdet3d/engine/optimizers/my_optimizer.py
:
from torch.optim import Optimizer
from mmdet3d.registry import OPTIMIZERS
@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):
def __init__(self, a, b, c):
pass
2. Add the optimizer to registry¶
To find the module defined above, it should first be imported into the main namespace. There are two options to achieve this.
Modify mmdet3d/engine/optimizers/__init__.py to import it.
The newly defined module should be imported in mmdet3d/engine/optimizers/__init__.py so that the registry will find the new module and add it:
from .my_optimizer import MyOptimizer
Use custom_imports in the config to manually import it:
custom_imports = dict(imports=['mmdet3d.engine.optimizers.my_optimizer'], allow_failed_imports=False)
The module mmdet3d.engine.optimizers.my_optimizer will be imported at the beginning of the program and the class MyOptimizer is then automatically registered. Note that only the package containing the class MyOptimizer should be imported; mmdet3d.engine.optimizers.my_optimizer.MyOptimizer cannot be imported directly.
Actually, users can use a totally different file directory structure with this importing method, as long as the module root is located in PYTHONPATH.
3. Specify the optimizer in the config file¶
Then you can use MyOptimizer in the optimizer field inside the optim_wrapper field of the config files. In the configs, the optimizers are defined by the field optimizer like the following:
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))
To use your own optimizer, the field can be changed to:
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value))
Customize optimizer wrapper constructor¶
Some models may have parameter-specific optimization settings, e.g., weight decay for BatchNorm layers. Users can do such fine-grained parameter tuning by customizing the optimizer wrapper constructor.
from typing import Optional

import torch.nn as nn

from mmengine.optim import DefaultOptimWrapperConstructor, OptimWrapper
from mmdet3d.registry import OPTIM_WRAPPER_CONSTRUCTORS
from .my_optimizer import MyOptimizer


@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class MyOptimizerWrapperConstructor(DefaultOptimWrapperConstructor):

    def __init__(self,
                 optim_wrapper_cfg: dict,
                 paramwise_cfg: Optional[dict] = None):
        pass

    def __call__(self, model: nn.Module) -> OptimWrapper:
        # build and return an OptimWrapper here with your parameter-specific
        # rules; `optim_wrapper` below is a placeholder for that result
        return optim_wrapper
The default optimizer wrapper constructor is implemented here, which could also serve as a template for the new optimizer wrapper constructor.
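If you register such a constructor, it is selected through the constructor field of optim_wrapper; a hedged sketch using the hypothetical class above (the optimizer settings are placeholders):
optim_wrapper = dict(
    type='OptimWrapper',
    constructor='MyOptimizerWrapperConstructor',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001),
    paramwise_cfg=dict(norm_decay_mult=0.0))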
Additional settings¶
Tricks not implemented by the optimizer should be implemented through the optimizer wrapper constructor (e.g., setting parameter-wise learning rates) or hooks. We list some common settings that could stabilize or accelerate the training. Feel free to create a PR or an issue for more settings.
Use gradient clip to stabilize training: Some models need gradient clipping to stabilize the training process. An example is as below:
optim_wrapper = dict( _delete_=True, clip_grad=dict(max_norm=35, norm_type=2))
If your config inherits a base config which already sets the optim_wrapper, you might need _delete_=True to override the unnecessary settings. See the config documentation for more details.
Use momentum schedule to accelerate model convergence: We support a momentum scheduler to modify the model's momentum according to the learning rate, which could make the model converge faster. The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence. For more details, please refer to the implementation of CosineAnnealingLR and CosineAnnealingMomentum.
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from 0 to lr * 10
    # during the next 12 epochs, learning rate decreases from lr * 10 to lr * 1e-4
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 10,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr * 1e-4,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
Customize training schedules¶
By default we use a step learning rate with the 1x schedule; this calls MultiStepLR in MMEngine.
We support many other learning rate schedules here, such as the CosineAnnealingLR and PolyLR schedules. Here are some examples:
Poly schedule:
param_scheduler = [
    dict(
        type='PolyLR',
        power=0.9,
        eta_min=1e-4,
        begin=0,
        end=8,
        by_epoch=True)
]
CosineAnnealing schedule:
param_scheduler = [
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 1e-5,
        begin=0,
        end=8,
        by_epoch=True)
]
Customize train loop¶
By default, EpochBasedTrainLoop
is used in train_cfg
and validation is done after every train epoch, as follows:
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_begin=1, val_interval=1)
Actually, both IterBasedTrainLoop
and EpochBasedTrainLoop
support dynamic interval, see the following example:
# Before the 365001st iteration, we do evaluation every 5000 iterations.
# After the 365000th iteration, we do evaluation every 368750 iterations,
# which means we only do evaluation at the end of training.
interval = 5000
max_iters = 368750
dynamic_intervals = [(max_iters // interval * interval + 1, max_iters)]
train_cfg = dict(
type='IterBasedTrainLoop',
max_iters=max_iters,
val_interval=interval,
dynamic_intervals=dynamic_intervals)
Customize hooks¶
Customize self-implemented hooks¶
1. Implement a new hook¶
MMEngine provides many useful hooks, but there are some occasions when the users might need to implement a new hook. MMDetection3D supports customized hooks in training based on MMEngine after v1.1.0rc0. Thus the users could implement a hook directly in mmdet3d or their mmdet3d-based codebases and use the hook by only modifying the config in training. Here we give an example of creating a new hook in mmdet3d and using it in training.
from typing import Optional, Union

from mmengine.hooks import Hook
from mmdet3d.registry import HOOKS

# MMEngine's hooks use this type alias for the data batch; it is defined here
# so that the annotations below resolve in a standalone snippet
DATA_BATCH = Optional[Union[dict, tuple, list]]


@HOOKS.register_module()
class MyHook(Hook):
    def __init__(self, a, b):
        pass
    def before_run(self, runner) -> None:
        pass
    def after_run(self, runner) -> None:
        pass
    def before_train(self, runner) -> None:
        pass
    def after_train(self, runner) -> None:
        pass
    def before_train_epoch(self, runner) -> None:
        pass
    def after_train_epoch(self, runner) -> None:
        pass
    def before_train_iter(self,
                          runner,
                          batch_idx: int,
                          data_batch: DATA_BATCH = None) -> None:
        pass
    def after_train_iter(self,
                         runner,
                         batch_idx: int,
                         data_batch: DATA_BATCH = None,
                         outputs: Optional[dict] = None) -> None:
        pass
Depending on the functionality of the hook, users need to specify what the hook will do at each stage of the training in before_run
, after_run
, before_train
, after_train
, before_train_epoch
, after_train_epoch
, before_train_iter
, and after_train_iter
. There are more points where hooks can be inserted; please refer to the base hook class for more details.
2. Register the new hook¶
Then we need to make sure MyHook is imported. Assuming the file is mmdet3d/engine/hooks/my_hook.py, there are two ways to do that:
Modify mmdet3d/engine/hooks/__init__.py to import it.
The newly defined module should be imported in mmdet3d/engine/hooks/__init__.py so that the registry will find the new module and add it:
from .my_hook import MyHook
Use custom_imports in the config to manually import it:
custom_imports = dict(imports=['mmdet3d.engine.hooks.my_hook'], allow_failed_imports=False)
3. Modify the config¶
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value)
]
You can also set the priority of the hook by adding the key priority with value 'NORMAL' or 'HIGHEST' as below:
custom_hooks = [
dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
By default the hook’s priority is set as NORMAL
during registration.
Use hooks implemented in MMDetection3D¶
If the hook is already implemented in MMDetection3D, you can directly modify the config to use the hook as below.
Example: DisableObjectSampleHook
¶
We implement a customized hook named DisableObjectSampleHook to disable the ObjectSample augmentation during training after a specified epoch.
We can set it in the config file if needed:
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]
Modify default runtime hooks¶
There are some common hooks that are registered through default_hooks, namely:
IterTimerHook: A hook that logs 'data_time' for loading data and 'time' for a model training step.
LoggerHook: A hook that collects logs from different components of the Runner and writes them to the terminal, json file, tensorboard, wandb, etc.
ParamSchedulerHook: A hook that updates some hyper-parameters in the optimizer, e.g., learning rate and momentum.
CheckpointHook: A hook that saves checkpoints periodically.
DistSamplerSeedHook: A hook that sets the seed for the sampler and batch_sampler.
Det3DVisualizationHook: A hook used to visualize prediction results during validation and testing.
IterTimerHook, ParamSchedulerHook and DistSamplerSeedHook are simple and usually do not need to be modified, so here we describe what we can do with LoggerHook, CheckpointHook and Det3DVisualizationHook.
CheckpointHook¶
Besides saving checkpoints periodically, CheckpointHook provides other options such as max_keep_ckpts, save_optimizer, etc. Users could set max_keep_ckpts to keep only a small number of checkpoints, or decide whether to store the state dict of the optimizer via save_optimizer. More details of the arguments are here.
default_hooks = dict(
checkpoint=dict(
type='CheckpointHook',
interval=1,
max_keep_ckpts=3,
save_optimizer=True))
LoggerHook¶
The LoggerHook
enables setting intervals. Detailed instructions can be found in the docstring.
default_hooks = dict(logger=dict(type='LoggerHook', interval=50))
Det3DVisualizationHook¶
Det3DVisualizationHook uses Det3DLocalVisualizer to visualize prediction results, and Det3DLocalVisualizer currently supports different backends, e.g., TensorboardVisBackend and WandbVisBackend (see the docstring for more details). Users could add multiple backends to do visualization as follows.
default_hooks = dict(
visualization=dict(type='Det3DVisualizationHook', draw=True))
vis_backends = [dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')]
visualizer = dict(
type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
Along with the release of OpenMMLab 2.0, MMDetection3D (namely MMDet3D) 1.1 made many significant changes, resulting in less redundant, more efficient code and a more consistent overall design. These changes break backward compatibility. Therefore, we prepared this migration guide to make the transition as smooth as possible so that all users can enjoy the productivity benefits of the new MMDet3D and the entire OpenMMLab 2.0 ecosystem.
Environment¶
MMDet3D 1.1 depends on the new foundational library MMEngine for training deep learning models, and therefore has an entirely different dependency chain compared with MMDet3D 1.0. Even if you have a well-rounded MMDet3D 1.0 / 0.x environment before, you still need to create a new Python environment for MMDet3D 1.1. We provide a detailed installation guide for reference.
The configuration files in our new version have a lot of modifications because of the differences between MMCV 1.x and MMEngine. The guides for migration from MMCV to MMEngine can be seen here.
We have renamed the remote branches in MMDet3D 1.1 (renaming 1.1 to main, master to 1.0, and dev to dev-1.0). If your local branches are not aligned with the branches of the remote repo, you can use the following commands to resolve it:
git fetch origin
git checkout main
git branch main_backup # backup your main branch
git reset --hard origin/main
Dataset¶
You should update the annotation files generated in the 1.0 version since some key words and structures of annotation in MMDet3D 1.1 have changed. Taking KITTI as an example, the update script is as follows:
python tools/dataset_converters/update_infos_to_v2.py \
    --dataset kitti \
    --pkl-path ./data/kitti/kitti_infos_train.pkl \
    --out-dir ./kitti_v2/
If your annotation files are generated in the 0.x version, you should first update them to 1.0 version using this script. Alternatively, you can re-generate annotation files from scratch using this script.
Model¶
MMDet3D 1.1 supports loading weights trained on the old version (1.0 version). For models that are important or frequently used, we have thoroughly verified their precision in the 1.1 version. Especially for some models that may experience a potential performance drop or training bugs in the old version, such as CenterPoint, we have checked them and ensured correct precision in the new version. If you encounter any problem, please feel free to raise an issue. Additionally, we have added some of the latest SOTA methods in our package and projects, making MMDet3D 1.1 a highly recommended choice for implementing your project.
mmdet3d.apis¶
mmdet3d.models¶
backbones¶
data_preprocessors¶
decode_heads¶
dense_heads¶
detectors¶
layers¶
losses¶
middle_encoders¶
necks¶
roi_heads¶
segmentors¶
task_modules¶
test_time_augs¶
utils¶
voxel_encoders¶
mmdet3d.structures¶
structures¶
bbox_3d¶
- class mmdet3d.structures.bbox_3d.BaseInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
Base class for 3D Boxes.
Note
The box is bottom centered, i.e. the relative position of origin in the box is (0.5, 0.5, 0).
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x box_dim matrix.
box_dim (int) – Number of the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw). Defaults to 7.
with_yaw (bool) – Whether the box is with yaw rotation. If False, the value of yaw will be set to 0 as minmax boxes. Defaults to True.
origin (tuple[float], optional) – Relative position of the box origin. Defaults to (0.5, 0.5, 0). This will guide the box to be converted to (0.5, 0.5, 0) mode.
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- property bev¶
2D BEV box of each box with rotation in XYWHR format, in shape (N, 5).
- Type
torch.Tensor
- property bottom_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- property bottom_height¶
torch.Tensor: A vector with bottom’s height of each box in shape (N, ).
- classmethod cat(boxes_list)[source]¶
Concatenate a list of Boxes into a single Boxes.
- Parameters
boxes_list (list[
BaseInstance3DBoxes
]) – List of boxes.- Returns
The concatenated Boxes.
- Return type
- property center¶
Calculate the center of all the boxes.
Note
In MMDetection3D’s convention, the bottom center is usually taken as the default center.
The relative position of the center in different kinds of boxes is different, e.g., the relative center of a box is (0.5, 1.0, 0.5) in camera coordinates and (0.5, 0.5, 0) in LiDAR coordinates. It is recommended to use bottom_center or gravity_center for clearer usage.
- Returns
A tensor with center of each box in shape (N, 3).
- Return type
torch.Tensor
- abstract convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
Box3DMode
) – The target Box mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
- The converted box of the same type
in the dst mode.
- Return type
- property corners¶
torch.Tensor: a tensor with 8 corners of each box in shape (N, 8, 3).
- property device¶
The device the boxes are on.
- Type
str
- property dims¶
Size dimensions of each box in shape (N, 3).
- Type
torch.Tensor
- abstract flip(bev_direction='horizontal')[source]¶
Flip the boxes in BEV along given BEV direction.
- Parameters
bev_direction (str, optional) – Direction by which to flip. Can be chosen from ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- property height¶
A vector with height of each box in shape (N, ).
- Type
torch.Tensor
- classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate height overlaps of two boxes.
Note
This function calculates the height overlaps between boxes1 and boxes2, where boxes1 and boxes2 should be of the same type.
- Parameters
boxes1 (
BaseInstance3DBoxes
) – Boxes 1 contain N boxes.boxes2 (
BaseInstance3DBoxes
) – Boxes 2 contain M boxes.mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.
- Returns
Calculated iou of boxes.
- Return type
torch.Tensor
- in_range_3d(box_range)[source]¶
Check whether the boxes are in the given range.
- Parameters
box_range (list | torch.Tensor) – The range of box (x_min, y_min, z_min, x_max, y_max, z_max)
Note
In the original implementation of SECOND, checking whether a box is in the given range is done by checking whether its points lie in a convex polygon; here we reduce the burden for simpler cases.
- Returns
- A binary vector indicating whether each box is
inside the reference range.
- Return type
torch.Tensor
- in_range_bev(box_range)[source]¶
Check whether the boxes are in the given range.
- Parameters
box_range (list | torch.Tensor) – the range of box (x_min, y_min, x_max, y_max)
Note
The original implementation of SECOND checks whether boxes are in a range by checking whether the points lie in a convex polygon; here we reduce the burden for simpler cases.
- Returns
Whether each box is inside the reference range.
- Return type
torch.Tensor
- limit_yaw(offset=0.5, period=3.141592653589793)[source]¶
Limit the yaw to a given period and offset.
- Parameters
offset (float, optional) – The offset of the yaw. Defaults to 0.5.
period (float, optional) – The expected period. Defaults to np.pi.
- property nearest_bev¶
A tensor of 2D BEV box of each box without rotation.
- Type
torch.Tensor
- new_box(data)[source]¶
Create a new box object with data.
The new box and its tensor have the same properties as self and self.tensor, respectively.
- Parameters
data (torch.Tensor | numpy.array | list) – Data to be copied.
- Returns
A new bbox object with data, whose other properties are similar to self.
- Return type
- nonempty(threshold=0.0)[source]¶
Find boxes that are non-empty.
A box is considered empty if any of its sides is no larger than the threshold.
- Parameters
threshold (float, optional) – The threshold of minimal sizes. Defaults to 0.0.
- Returns
- A binary vector which represents whether each
box is empty (False) or non-empty (True).
- Return type
torch.Tensor
- classmethod overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate 3D overlaps of two boxes.
Note
This function calculates the overlaps between
boxes1
andboxes2
,boxes1
andboxes2
should be in the same type.- Parameters
boxes1 (
BaseInstance3DBoxes
) – Boxes 1 contain N boxes.boxes2 (
BaseInstance3DBoxes
) – Boxes 2 contain M boxes.mode (str, optional) – Mode of iou calculation. Defaults to ‘iou’.
- Returns
Calculated 3D overlaps of the boxes.
- Return type
torch.Tensor
- points_in_boxes_all(points, boxes_override=None)[source]¶
Find all boxes in which each point is.
- Parameters
points (torch.Tensor) – Points in shape (1, M, 3) or (M, 3), 3 dimensions are (x, y, z) in LiDAR or depth coordinate.
boxes_override (torch.Tensor, optional) – Boxes to override self.tensor. Defaults to None.
- Returns
- A tensor indicating whether a point is in a box,
in shape (M, T). T is the number of boxes. Denote this tensor as A, if the m^th point is in the t^th box, then A[m, t] == 1, otherwise A[m, t] == 0.
- Return type
torch.Tensor
- points_in_boxes_part(points, boxes_override=None)[source]¶
Find the box in which each point is.
- Parameters
points (torch.Tensor) – Points in shape (1, M, 3) or (M, 3), 3 dimensions are (x, y, z) in LiDAR or depth coordinate.
boxes_override (torch.Tensor, optional) – Boxes to override self.tensor. Defaults to None.
- Returns
- The index of the first box that each point
is in, in shape (M, ). Default value is -1 (if the point is not enclosed by any box).
- Return type
torch.Tensor
Note
If a point is enclosed by multiple boxes, the index of the first box will be returned.
- abstract rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | numpy.ndarray | BasePoints, optional) – Points to rotate. Defaults to None.
- scale(scale_factor)[source]¶
Scale the box with horizontal and vertical scaling factors.
- Parameters
scale_factor (float) – Scale factor to scale the boxes.
- to(device, *args, **kwargs)[source]¶
Convert current boxes to a specific device.
- Parameters
device (str |
torch.device
) – The name of the device.- Returns
- A new boxes object on the
specific device.
- Return type
- property top_height¶
torch.Tensor: A vector with the top height of each box in shape (N, ).
- translate(trans_vector)[source]¶
Translate boxes with the given translation vector.
- Parameters
trans_vector (torch.Tensor) – Translation vector of size (1, 3).
- property volume¶
A vector with volume of each box.
- Type
torch.Tensor
- property yaw¶
A vector with yaw of each box in shape (N, ).
- Type
torch.Tensor
- class mmdet3d.structures.bbox_3d.Box3DMode(value)[source]¶
Enum of different ways to represent a box.
Coordinates in LiDAR:
up z ^ x front | / | / left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
Coordinates in camera:
z front / / 0 ------> x right | | v down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1.
Coordinates in Depth mode:
up z ^ y front | / | / 0 ------> x right
The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
- static convert(box, src, dst, rt_mat=None, with_yaw=True, correct_yaw=False)[source]¶
Convert boxes from src mode to dst mode.
- Parameters
box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.
src (Box3DMode) – The src Box mode.
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool, optional) – If box is an instance of BaseInstance3DBoxes, whether or not it has a yaw angle. Defaults to True.
correct_yaw (bool) – Whether the yaw is rotated by rt_mat.
- Returns
- (tuple | list | np.ndarray | torch.Tensor |
BaseInstance3DBoxes
): The converted box of the same type.
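As a minimal, hedged sketch of how this converter might be called (the box values and the identity rt_mat below are illustrative placeholders, not real calibration data):
import torch
from mmdet3d.structures.bbox_3d import Box3DMode, LiDARInstance3DBoxes

# One illustrative LiDAR box: (x, y, z, x_size, y_size, z_size, yaw).
lidar_boxes = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.5, 4.2, 1.8, 1.6, 0.3]]))

# rt_mat would normally be the LiDAR-to-camera extrinsic of your sensor setup;
# an identity matrix is used here only to keep the sketch self-contained.
rt_mat = torch.eye(4)
cam_boxes = Box3DMode.convert(lidar_boxes, Box3DMode.LIDAR, Box3DMode.CAM,
                              rt_mat=rt_mat)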
- class mmdet3d.structures.bbox_3d.CameraInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 1.0, 0.5))[source]¶
3D boxes of instances in CAM coordinates.
Coordinates in camera:
z front (yaw=-0.5*pi) / / 0 ------> x right (yaw=0) | | v down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of z.
- tensor¶
Float matrix in shape (N, box_dim).
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as axis-aligned boxes tightly enclosing the original boxes.
- Type
bool
- property bev¶
2D BEV box of each box with rotation in XYWHR format, in shape (N, 5).
- Type
torch.Tensor
- property bottom_height¶
torch.Tensor: A vector with bottom’s height of each box in shape (N, ).
- convert_to(dst, rt_mat=None, correct_yaw=False)[source]¶
Convert self to
dst
mode.- Parameters
dst (
Box3DMode
) – The target Box mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from
src
coordinates todst
coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.correct_yaw (bool) – Whether to convert the yaw angle to the target coordinate. Defaults to False.
- Returns
The converted box of the same type in the
dst
mode.- Return type
- property corners¶
- Coordinates of corners of all the boxes in
shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
front z / / (x0, y0, z1) + ----------- + (x1, y0, z1) /| / | / | / | (x0, y0, z0) + ----------- + + (x1, y1, z1) | / . | / | / origin | / (x0, y1, z0) + ----------- + -------> x right | (x1, y1, z0) | v down y
- Type
torch.Tensor
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In CAM coordinates, it flips the x (horizontal) or z (vertical) axis.
- Parameters
bev_direction (str) – Flip direction (horizontal or vertical).
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- property height¶
A vector with height of each box in shape (N, ).
- Type
torch.Tensor
- classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate height overlaps of two boxes.
This function calculates the height overlaps between
boxes1
andboxes2
, whereboxes1
andboxes2
should be in the same type.- Parameters
boxes1 (
CameraInstance3DBoxes
) – Boxes 1 contain N boxes.boxes2 (
CameraInstance3DBoxes
) – Boxes 2 contain M boxes.mode (str, optional) – Mode of iou calculation. Defaults to ‘iou’.
- Returns
Calculated iou of boxes’ heights.
- Return type
torch.Tensor
- property local_yaw¶
torch.Tensor: A vector with the local yaw of each box in shape (N, ). local_yaw is equal to alpha in KITTI, which is commonly used in the monocular 3D object detection task, so only CameraInstance3DBoxes has this property.
- points_in_boxes_all(points, boxes_override=None)[source]¶
Find all boxes in which each point is.
- Parameters
- Returns
- The index of all boxes in which each point is,
in shape (B, M, T).
- Return type
torch.Tensor
- points_in_boxes_part(points, boxes_override=None)[source]¶
Find the box in which each point is.
- Parameters
- Returns
- The index of the box in which
each point is, in shape (M, ). Default value is -1 (if the point is not enclosed by any box).
- Return type
torch.Tensor
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
- property top_height¶
torch.Tensor: A vector with the top height of each box in shape (N, ).
- class mmdet3d.structures.bbox_3d.Coord3DMode(value)[source]¶
- Enum of different ways to represent a box
and point cloud.
Coordinates in LiDAR:
up z ^ x front | / | / left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
Coordinates in camera:
z front / / 0 ------> x right | | v down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1.
Coordinates in Depth mode:
up z ^ y front | / | / 0 ------> x right
The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
- static convert(input, src, dst, rt_mat=None, with_yaw=True, is_point=True)[source]¶
Convert boxes or points from src mode to dst mode.
- Parameters
(tuple | list | np.ndarray | torch.Tensor | (input) –
BaseInstance3DBoxes
|BasePoints
): Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.src (
Box3DMode
|Coord3DMode
) – The source mode.dst (
Box3DMode
|Coord3DMode
) – The target mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool) – If box is an instance of
BaseInstance3DBoxes
, whether or not it has a yaw angle. Defaults to True.is_point (bool) – If input is neither an instance of
BaseInstance3DBoxes
nor an instance ofBasePoints
, whether or not it is point data. Defaults to True.
- Returns
- (tuple | list | np.ndarray | torch.Tensor |
BaseInstance3DBoxes
|BasePoints
): The converted box of the same type.
- static convert_box(box, src, dst, rt_mat=None, with_yaw=True)[source]¶
Convert boxes from src mode to dst mode.
- Parameters
(tuple | list | np.ndarray | (box) – torch.Tensor |
BaseInstance3DBoxes
): Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.src (
Box3DMode
) – The src Box mode.dst (
Box3DMode
) – The target Box mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool) – If box is an instance of
BaseInstance3DBoxes
, whether or not it has a yaw angle. Defaults to True.
- Returns
- (tuple | list | np.ndarray | torch.Tensor |
BaseInstance3DBoxes
): The converted box of the same type.
- static convert_point(point, src, dst, rt_mat=None)[source]¶
Convert points from src mode to dst mode.
- Parameters
(tuple | list | np.ndarray | (point) – torch.Tensor |
BasePoints
): Can be a k-tuple, k-list or an Nxk array/tensor.src (
CoordMode
) – The src Point mode.dst (
CoordMode
) – The target Point mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted point of the same type.
- Return type
(tuple | list | np.ndarray | torch.Tensor |
BasePoints
)
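A short, hedged sketch of converting raw point coordinates between modes; with rt_mat=None the conversion falls back to the default axis permutation between the two systems, so pass your own sensor extrinsic for real data:
import torch
from mmdet3d.structures.bbox_3d import Coord3DMode

# Illustrative (x, y, z) points in LiDAR coordinates.
pts_lidar = torch.tensor([[10.0, 2.0, -1.5],
                          [5.0, 0.5, -1.0]])

pts_cam = Coord3DMode.convert_point(pts_lidar, Coord3DMode.LIDAR,
                                    Coord3DMode.CAM)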
- class mmdet3d.structures.bbox_3d.DepthInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
3D boxes of instances in Depth coordinates.
Coordinates in Depth:
up z y front (yaw=0.5*pi) ^ ^ | / | / 0 ------> x right (yaw=0)
The relative coordinate of bottom center in a Depth box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of y. Also note that rotation of DepthInstance3DBoxes is counterclockwise, which is reverse to the definition of the yaw angle (clockwise).
A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
Box3DMode
) – The target Box mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from
src
coordinates todst
coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted box of the same type in the
dst
mode.- Return type
- property corners¶
Coordinates of corners of all the boxes in shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in form of
(x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
up z front y ^ / | / | (x0, y1, z1) + ----------- + (x1, y1, z1) /| / | / | / | (x0, y0, z1) + ----------- + + (x1, y1, z0) | / . | / | / origin | / (x0, y0, z0) + ----------- + --------> right x (x1, y0, z0)
- Type
torch.Tensor
- enlarged_box(extra_width)[source]¶
Enlarge the length, width and height of boxes.
- Parameters
extra_width (float | torch.Tensor) – Extra width to enlarge the box.
- Returns
Enlarged boxes.
- Return type
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In Depth coordinates, it flips x (horizontal) or y (vertical) axis.
- Parameters
bev_direction (str, optional) – Flip direction (horizontal or vertical). Defaults to ‘horizontal’.
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
- get_surface_line_center()[source]¶
Compute surface and line center of bounding boxes.
- Returns
Surface and line center of bounding boxes.
- Return type
torch.Tensor
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
- class mmdet3d.structures.bbox_3d.LiDARInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
3D boxes of instances in LIDAR coordinates.
Coordinates in LiDAR:
up z x front (yaw=0) ^ ^ | / | / (yaw=0.5*pi) left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and increases from the positive direction of x to the positive direction of y.
A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- convert_to(dst, rt_mat=None, correct_yaw=False)[source]¶
Convert self to
dst
mode.- Parameters
dst (
Box3DMode
) – the target Box modert_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from
src
coordinates todst
coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.correct_yaw (bool) – If convert the yaw angle to the target coordinate. Defaults to False.
- Returns
The converted box of the same type in the
dst
mode.- Return type
- property corners¶
Coordinates of corners of all the boxes in shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in form of
(x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
up z front x ^ / | / | (x1, y0, z1) + ----------- + (x1, y1, z1) /| / | / | / | (x0, y0, z1) + ----------- + + (x1, y1, z0) | / . | / | / origin | / left y<-------- + ----------- + (x0, y1, z0) (x0, y0, z0)
- Type
torch.Tensor
- enlarged_box(extra_width)[source]¶
Enlarge the length, width and height of boxes.
- Parameters
extra_width (float | torch.Tensor) – Extra width to enlarge the box.
- Returns
Enlarged boxes.
- Return type
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In LIDAR coordinates, it flips the y (horizontal) or x (vertical) axis.
- Parameters
bev_direction (str) – Flip direction (horizontal or vertical).
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
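To make the properties and methods above concrete, here is a small, hedged usage sketch; the box values are illustrative only:
import torch
from mmdet3d.structures.bbox_3d import LiDARInstance3DBoxes

# Two illustrative boxes: (x, y, z, x_size, y_size, z_size, yaw).
boxes = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.5, 4.2, 1.8, 1.6, 0.3],
                  [20.0, -3.0, -1.6, 3.9, 1.7, 1.5, -1.2]]))

print(boxes.gravity_center)  # (N, 3) centers at half height
print(boxes.corners.shape)   # torch.Size([2, 8, 3])
print(boxes.bev)             # (N, 5) BEV boxes in XYWHR format

boxes.translate(torch.tensor([1.0, 0.0, 0.0]))       # in-place translation
boxes.rotate(0.1)                                     # rotate around the z axis
mask = boxes.in_range_bev([0.0, -40.0, 70.4, 40.0])   # (N, ) boolean mask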
- mmdet3d.structures.bbox_3d.get_box_type(box_type)[source]¶
Get the type and mode of box structure.
- Parameters
box_type (str) – The type of box structure. The valid values are “LiDAR”, “Camera” and “Depth”.
- Raises
ValueError – A ValueError is raised when box_type does not belong to the three valid types.
- Returns
Box type and box mode.
- Return type
tuple
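For example (a hedged sketch of typical usage):
from mmdet3d.structures.bbox_3d import get_box_type

# Returns the box class (here LiDARInstance3DBoxes) and the matching Box3DMode;
# 'Camera' and 'Depth' work analogously.
box_type_3d, box_mode_3d = get_box_type('LiDAR')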
- mmdet3d.structures.bbox_3d.get_proj_mat_by_coord_type(img_meta, coord_type)[source]¶
Obtain the projection matrix according to the coordinate type.
- Parameters
img_meta (dict) – Meta info.
coord_type (str) – ‘DEPTH’ or ‘CAMERA’ or ‘LIDAR’. Can be case-insensitive.
- Returns
transformation matrix.
- Return type
torch.Tensor
- mmdet3d.structures.bbox_3d.limit_period(val, offset=0.5, period=3.141592653589793)[source]¶
Limit the value into a period for periodic function.
- Parameters
val (torch.Tensor | np.ndarray) – The value to be converted.
offset (float, optional) – Offset to set the value range. Defaults to 0.5.
period (float, optional) – Period of the value. Defaults to np.pi.
- Returns
- Value in the range of
[-offset * period, (1-offset) * period]
- Return type
(torch.Tensor | np.ndarray)
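A brief sketch of how this might be used to wrap yaw angles (the values are illustrative):
import numpy as np
from mmdet3d.structures.bbox_3d import limit_period

yaw = np.array([3.5, -3.5, 0.2])
# With the default offset=0.5 and period=np.pi, the values are wrapped into
# [-0.5 * np.pi, 0.5 * np.pi).
wrapped = limit_period(yaw)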
- mmdet3d.structures.bbox_3d.mono_cam_box2vis(cam_box)[source]¶
This is a post-processing function on the bboxes from the Mono-3D task. If we want to perform projection visualization, we need to:
rotate the box along the x-axis by np.pi / 2 (roll)
change the orientation from local yaw to global yaw
convert the yaw by (np.pi / 2 - yaw)
After applying this function, we can project and draw it on 2D images.
- Parameters
cam_box (
CameraInstance3DBoxes
) – 3D bbox in camera coordinate system before conversion. Could be gt bbox loaded from dataset or network prediction output.- Returns
Box after conversion.
- Return type
- mmdet3d.structures.bbox_3d.points_cam2img(points_3d, proj_mat, with_depth=False)[source]¶
Project points in camera coordinates to image coordinates.
- Parameters
points_3d (torch.Tensor | np.ndarray) – Points in shape (N, 3)
proj_mat (torch.Tensor | np.ndarray) – Transformation matrix between coordinates.
with_depth (bool, optional) – Whether to keep depth in the output. Defaults to False.
- Returns
- Points in image coordinates,
with shape [N, 2] if with_depth=False, else [N, 3].
- Return type
(torch.Tensor | np.ndarray)
- mmdet3d.structures.bbox_3d.points_img2cam(points, cam2img)[source]¶
Project points in image coordinates to camera coordinates.
- Parameters
points (torch.Tensor) – 2.5D points in 2D images, [N, 3], 3 corresponds with x, y in the image and depth.
cam2img (torch.Tensor) – Camera intrinsic matrix. The shape can be [3, 3], [3, 4] or [4, 4].
- Returns
- points in 3D space. [N, 3],
3 corresponds with x, y, z in 3D space.
- Return type
torch.Tensor
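As a rough round-trip sketch of these two helpers (the intrinsic matrix and points below are made-up values):
import torch
from mmdet3d.structures.bbox_3d import points_cam2img, points_img2cam

# Made-up pinhole intrinsics and a few points in camera coordinates.
cam2img = torch.tensor([[700.0, 0.0, 600.0],
                        [0.0, 700.0, 180.0],
                        [0.0, 0.0, 1.0]])
pts_cam = torch.tensor([[1.0, 0.5, 10.0],
                        [2.0, 1.0, 20.0]])

uv_depth = points_cam2img(pts_cam, cam2img, with_depth=True)  # (N, 3): u, v, depth
pts_back = points_img2cam(uv_depth, cam2img)                  # back to (N, 3) camera points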
- mmdet3d.structures.bbox_3d.rotation_3d_in_axis(points, angles, axis=0, return_mat=False, clockwise=False)[source]¶
Rotate points by angles according to axis.
- Parameters
points (np.ndarray | torch.Tensor | list | tuple) – Points of shape (N, M, 3).
angles (np.ndarray | torch.Tensor | list | tuple | float) – Vector of angles in shape (N,)
axis (int, optional) – The axis to be rotated. Defaults to 0.
return_mat (bool, optional) – Whether or not to return the transposed rotation matrix. Defaults to False.
clockwise (bool, optional) – Whether the rotation is clockwise. Defaults to False.
- Raises
ValueError – When the axis is not in the range [0, 1, 2], a ValueError is raised.
- Returns
Rotated points in shape (N, M, 3).
- Return type
(torch.Tensor | np.ndarray)
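A small sketch rotating one batch of points around the z axis (axis=2); the point values are illustrative:
import math

import torch
from mmdet3d.structures.bbox_3d import rotation_3d_in_axis

points = torch.tensor([[[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 1.0, 0.0]]])  # shape (N=1, M=3, 3)
angles = torch.tensor([math.pi / 2])        # one angle per batch entry

rotated, rot_mat_T = rotation_3d_in_axis(points, angles, axis=2,
                                         return_mat=True)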
ops¶
points¶
- class mmdet3d.structures.points.BasePoints(tensor, points_dim=3, attribute_dims=None)[source]¶
Base class for Points.
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x points_dim matrix.
points_dim (int, optional) – Number of the dimension of a point. Each row is (x, y, z). Defaults to 3.
attribute_dims (dict, optional) – Dictionary to indicate the meaning of extra dimension. Defaults to None.
- tensor¶
Float matrix of N x points_dim.
- Type
torch.Tensor
- points_dim¶
Integer indicating the dimension of a point. Each row is (x, y, z, …).
- Type
int
- attribute_dims¶
Dictionary to indicate the meaning of extra dimension. Defaults to None.
- Type
dict
- rotation_axis¶
Default rotation axis for points rotation.
- Type
int
- property bev¶
BEV of the points in shape (N, 2).
- Type
torch.Tensor
- classmethod cat(points_list)[source]¶
Concatenate a list of Points into a single Points.
- Parameters
points_list (list[
BasePoints
]) – List of points.- Returns
The concatenated Points.
- Return type
- property color¶
torch.Tensor: A vector with color of each point in shape (N, 3), or None.
- abstract convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
CoordMode
) – The target Box mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
- The converted box of the same type
in the dst mode.
- Return type
- property coord¶
Coordinates of each point in shape (N, 3).
- Type
torch.Tensor
- property device¶
The device the points are on.
- Type
str
- abstract flip(bev_direction='horizontal')[source]¶
Flip the points along given BEV direction.
- Parameters
bev_direction (str) – Flip direction (horizontal or vertical).
- property height¶
torch.Tensor: A vector with height of each point in shape (N, 1), or None.
- in_range_3d(point_range)[source]¶
Check whether the points are in the given range.
- Parameters
point_range (list | torch.Tensor) – The range of point (x_min, y_min, z_min, x_max, y_max, z_max)
Note
In the original implementation of SECOND, checking whether a box is in the given range is done by checking whether its points lie in a convex polygon; here we reduce the burden for simpler cases.
- Returns
- A binary vector indicating whether each point is
inside the reference range.
- Return type
torch.Tensor
- in_range_bev(point_range)[source]¶
Check whether the points are in the given range.
- Parameters
point_range (list | torch.Tensor) – The range of point in order of (x_min, y_min, x_max, y_max).
- Returns
- Indicating whether each point is inside
the reference range.
- Return type
torch.Tensor
- new_point(data)[source]¶
Create a new point object with data.
The new point and its tensor have the same properties as self and self.tensor, respectively.
- Parameters
data (torch.Tensor | numpy.array | list) – Data to be copied.
- Returns
A new point object with data, whose other properties are similar to self.
- Return type
- rotate(rotation, axis=None)[source]¶
Rotate points with the given rotation matrix or angle.
- Parameters
rotation (float | np.ndarray | torch.Tensor) – Rotation matrix or angle.
axis (int, optional) – Axis to rotate at. Defaults to None.
- scale(scale_factor)[source]¶
Scale the points with horizontal and vertical scaling factors.
- Parameters
scale_factor (float) – Scale factor to scale the points.
- property shape¶
Shape of points.
- Type
torch.Size
- class mmdet3d.structures.points.CameraPoints(tensor, points_dim=3, attribute_dims=None)[source]¶
Points of instances in CAM coordinates.
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x points_dim matrix.
points_dim (int, optional) – Number of the dimension of a point. Each row is (x, y, z). Defaults to 3.
attribute_dims (dict, optional) – Dictionary to indicate the meaning of extra dimension. Defaults to None.
- tensor¶
Float matrix of N x points_dim.
- Type
torch.Tensor
- points_dim¶
Integer indicating the dimension of a point. Each row is (x, y, z, …).
- Type
int
- attribute_dims¶
Dictionary to indicate the meaning of extra dimension. Defaults to None.
- Type
dict
- rotation_axis¶
Default rotation axis for points rotation.
- Type
int
- property bev¶
BEV of the points in shape (N, 2).
- Type
torch.Tensor
- convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
CoordMode
) – The target Point mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
- The converted point of the same type
in the dst mode.
- Return type
- class mmdet3d.structures.points.DepthPoints(tensor, points_dim=3, attribute_dims=None)[source]¶
Points of instances in DEPTH coordinates.
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x points_dim matrix.
points_dim (int, optional) – Number of the dimension of a point. Each row is (x, y, z). Defaults to 3.
attribute_dims (dict, optional) – Dictionary to indicate the meaning of extra dimension. Defaults to None.
- tensor¶
Float matrix of N x points_dim.
- Type
torch.Tensor
- points_dim¶
Integer indicating the dimension of a point. Each row is (x, y, z, …).
- Type
int
- attribute_dims¶
Dictionary to indicate the meaning of extra dimension. Defaults to None.
- Type
dict
- rotation_axis¶
Default rotation axis for points rotation.
- Type
int
- convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
CoordMode
) – The target Point mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
- The converted point of the same type
in the dst mode.
- Return type
- class mmdet3d.structures.points.LiDARPoints(tensor, points_dim=3, attribute_dims=None)[source]¶
Points of instances in LIDAR coordinates.
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x points_dim matrix.
points_dim (int, optional) – Number of the dimension of a point. Each row is (x, y, z). Defaults to 3.
attribute_dims (dict, optional) – Dictionary to indicate the meaning of extra dimension. Defaults to None.
- tensor¶
Float matrix of N x points_dim.
- Type
torch.Tensor
- points_dim¶
Integer indicating the dimension of a point. Each row is (x, y, z, …).
- Type
int
- attribute_dims¶
Dictionary to indicate the meaning of extra dimension. Defaults to None.
- Type
dict
- rotation_axis¶
Default rotation axis for points rotation.
- Type
int
- convert_to(dst, rt_mat=None)[source]¶
Convert self to
dst
mode.- Parameters
dst (
CoordMode
) – The target Point mode.rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
- The converted point of the same type
in the dst mode.
- Return type
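To illustrate the point classes above, here is a hedged sketch of constructing and manipulating a small LiDARPoints object; the values and the 'intensity' attribute name are arbitrary examples:
import torch
from mmdet3d.structures.points import LiDARPoints

# Illustrative point cloud with (x, y, z, intensity); attribute_dims records
# that column 3 stores an extra per-point attribute.
tensor = torch.tensor([[10.0, 2.0, -1.5, 0.3],
                       [20.0, -3.0, -1.6, 0.7]])
points = LiDARPoints(tensor, points_dim=4, attribute_dims=dict(intensity=3))

print(points.coord)   # (N, 3) xyz coordinates
print(points.bev)     # (N, 2) BEV coordinates
points.scale(0.5)     # in-place scaling
points.rotate(0.1)    # rotate around the default axis (z for LiDAR points)
mask = points.in_range_3d([0, -40, -3, 70.4, 40, 1])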
mmdet3d.testing¶
mmdet3d.visualization¶
mmdet3d.utils¶
- class mmdet3d.utils.ArrayConverter(template_array: Optional[Union[tuple, list, int, float, numpy.ndarray, torch.Tensor]] = None)[source]¶
Utility class for data-type agnostic processing.
- Parameters
template_array (tuple | list | int | float | np.ndarray | torch.Tensor, optional) – Template array. Defaults to None.
- convert(input_array: Union[tuple, list, int, float, numpy.ndarray, torch.Tensor], target_type: Optional[type] = None, target_array: Optional[Union[numpy.ndarray, torch.Tensor]] = None) → Union[numpy.ndarray, torch.Tensor][source]¶
Convert input array to target data type.
- Parameters
input_array (tuple | list | int | float | np.ndarray | torch.Tensor) – Input array.
target_type (np.ndarray or torch.Tensor, optional) – Type to which the input array is converted. Defaults to None.
target_array (np.ndarray | torch.Tensor, optional) – Template array to which the input array is converted. Defaults to None.
- Raises
ValueError – If the input is a list or tuple and cannot be converted to a NumPy array, a ValueError is raised.
TypeError – If input type does not belong to the above range, or the contents of a list or tuple do not share the same data type, a TypeError is raised.
- Returns
The converted array.
- Return type
np.ndarray or torch.Tensor
- recover(input_array: Union[numpy.ndarray, torch.Tensor]) → Union[numpy.ndarray, torch.Tensor][source]¶
Recover input type to original array type.
- Parameters
input_array (np.ndarray | torch.Tensor) – Input array.
- Returns
Converted array.
- Return type
np.ndarray or torch.Tensor
- set_template(array: Union[tuple, list, int, float, numpy.ndarray, torch.Tensor]) → None[source]¶
Set template array.
- Parameters
array (tuple | list | int | float | np.ndarray | torch.Tensor) – Template array.
- Raises
ValueError – If the input is a list or tuple and cannot be converted to a NumPy array, a ValueError is raised.
TypeError – If input type does not belong to the above range, or the contents of a list or tuple do not share the same data type, a TypeError is raised.
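A rough usage sketch of this class, assuming the template determines the array type recovered by recover() (values are illustrative):
import numpy as np
import torch
from mmdet3d.utils import ArrayConverter

converter = ArrayConverter(template_array=np.array([1.0]))

# Convert a plain list to a torch.Tensor for an intermediate computation ...
tensor = converter.convert(input_array=[1.0, 2.0, 3.0], target_type=torch.Tensor)
result = tensor * 2

# ... then recover the template's array type (here np.ndarray).
recovered = converter.recover(result)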
- mmdet3d.utils.array_converter(to_torch: bool = True, apply_to: Tuple[str, ...] = (), template_arg_name_: Optional[str] = None, recover: bool = True) → Callable[source]¶
Wrapper function for data-type agnostic processing.
First converts input arrays to PyTorch tensors or NumPy ndarrays for middle calculation, then convert output to original data-type if recover=True.
- Parameters
to_torch (bool) – Whether convert to PyTorch tensors for middle calculation. Defaults to True.
apply_to (Tuple[str, ...]) – The arguments to which we apply data-type conversion. Defaults to an empty tuple.
template_arg_name_ (str, optional) – Argument serving as the template (return arrays should have the same dtype and device as the template). Defaults to None. If None, we will use the first argument in apply_to as the template argument.
recover (bool) – Whether or not recover the wrapped function outputs to the template_arg_name_ type. Defaults to True.
- Raises
ValueError – When template_arg_name_ is not among all args, or when apply_to contains an arg which is not among all args, a ValueError will be raised. When the template argument or an argument to convert is a list or tuple, and cannot be converted to a NumPy array, a ValueError will be raised.
TypeError – When the type of the template argument or an argument to convert does not belong to the above range, or the contents of such a list-or-tuple-type argument do not share the same data type, a TypeError is raised.
- Returns
wrapped function.
- Return type
(function)
Example
>>> import torch
>>> import numpy as np
>>>
>>> # Use torch addition for a + b,
>>> # and convert return values to the type of a
>>> @array_converter(apply_to=('a', 'b'))
>>> def simple_add(a, b):
>>>     return a + b
>>>
>>> a = np.array([1.1])
>>> b = np.array([2.2])
>>> simple_add(a, b)
>>>
>>> # Use numpy addition for a + b,
>>> # and convert return values to the type of b
>>> @array_converter(to_torch=False, apply_to=('a', 'b'),
>>>                  template_arg_name_='b')
>>> def simple_add(a, b):
>>>     return a + b
>>>
>>> simple_add(a, b)
>>>
>>> # Use torch funcs for floor(a) if flag=True else ceil(a),
>>> # and return the torch tensor
>>> @array_converter(apply_to=('a',), recover=False)
>>> def floor_or_ceil(a, flag=True):
>>>     return torch.floor(a) if flag else torch.ceil(a)
>>>
>>> floor_or_ceil(a, flag=False)
- mmdet3d.utils.compat_cfg(cfg)[source]¶
This function would modify some fields to keep the compatibility of the config.
For example, it will move some args which will be deprecated to the correct fields.
- mmdet3d.utils.register_all_modules(init_default_scope: bool = True) → None[source]¶
Register all modules in mmdet3d into the registries.
- Parameters
init_default_scope (bool) – Whether to initialize the mmdet3d default scope. When init_default_scope=True, the global default scope will be set to mmdet3d, and all registries will build modules from mmdet3d’s registry node. To understand more about the registry, please refer to https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/registry.md. Defaults to True.
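In custom scripts this is typically called once before building components from configs, e.g. (a minimal sketch):
from mmdet3d.utils import register_all_modules

# Register all MMDetection3D modules and set the default scope to 'mmdet3d'.
register_all_modules(init_default_scope=True)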
Model Zoo¶
Common settings¶
We use distributed training.
For fair comparison with other codebases, we report the GPU memory as the maximum value of torch.cuda.max_memory_allocated() for all 8 GPUs. Note that this value is usually less than what nvidia-smi shows.
We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script benchmark.py which computes the average time on 2000 images.
Baselines¶
SECOND¶
Please refer to SECOND for details. We provide SECOND baselines on KITTI and Waymo datasets.
PointPillars¶
Please refer to PointPillars for details. We provide PointPillars baselines on KITTI, nuScenes, Lyft, and Waymo datasets.
VoteNet¶
Please refer to VoteNet for details. We provide VoteNet baselines on ScanNet and SUNRGBD datasets.
Dynamic Voxelization¶
Please refer to Dynamic Voxelization for details.
RegNetX¶
Please refer to RegNet for details. We provide PointPillars baselines with RegNetX backbones on nuScenes and Lyft datasets currently.
nuImages¶
We also support baseline models on nuImages dataset. Please refer to nuImages for details. We report Mask R-CNN, Cascade Mask R-CNN and HTC results currently.
CenterPoint¶
Please refer to CenterPoint for details.
SSN¶
Please refer to SSN for details. We provide PointPillars with shape-aware grouping heads used in SSN on the nuScenes and Lyft datasets currently.
ImVoteNet¶
Please refer to ImVoteNet for details. We provide ImVoteNet baselines on SUNRGBD dataset.
PointNet++¶
Please refer to PointNet++ for details. We provide PointNet++ baselines on ScanNet and S3DIS datasets.
Group-Free-3D¶
Please refer to Group-Free-3D for details. We provide Group-Free-3D baselines on ScanNet dataset.
ImVoxelNet¶
Please refer to ImVoxelNet for details. We provide ImVoxelNet baselines on KITTI dataset.
FCAF3D¶
Please refer to FCAF3D for details. We provide FCAF3D baselines on the ScanNet, S3DIS, and SUN RGB-D datasets.
Mixed Precision (FP16) Training¶
Please refer to Mixed Precision (FP16) Training on PointPillars for details.
Benchmarks¶
Here we benchmark the training and testing speed of models in MMDetection3D, with some other open source 3D detection codebases.
Settings¶
Hardware: 8 NVIDIA Tesla V100 (32G) GPUs, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.3, numba 0.48.0.
Model: Since all the other codebases implement different models, we compare the corresponding models, including SECOND, PointPillars, Part-A2, and VoteNet, with them separately.
Metrics: We use the average throughput in iterations over the entire training run, skipping the first 50 iterations of each epoch to exclude GPU warm-up time.
Main Results¶
We compare the training speed (samples/s) with other codebases that implement similar models. The results are shown below; the higher the number in the table, the faster the training process. Models that are not supported by other codebases are marked with ×.
| Methods | MMDetection3D | OpenPCDet | votenet | Det3D |
| --- | --- | --- | --- | --- |
| VoteNet | 358 | × | 77 | × |
| PointPillars-car | 141 | × | × | 140 |
| PointPillars-3class | 107 | 44 | × | × |
| SECOND | 40 | 30 | × | × |
| Part-A2 | 17 | 14 | × | × |
Details of Comparison¶
Modification for Calculating Speed¶
MMDetection3D: We try to use settings as similar as possible to those of other codebases, using benchmark configs.
Det3D: For comparison with Det3D, we use the commit 519251e.
OpenPCDet: For comparison with OpenPCDet, we use the commit b32fbddb.
For training speed, we add code to record the running time in the file ./tools/train_utils/train_utils.py. We calculate the speed of each epoch and report the average speed over all epochs.
(diff to make it use the same method for benchmarking speed)
diff --git a/tools/train_utils/train_utils.py b/tools/train_utils/train_utils.py index 91f21dd..021359d 100644 --- a/tools/train_utils/train_utils.py +++ b/tools/train_utils/train_utils.py @@ -2,6 +2,7 @@ import torch import os import glob import tqdm +import datetime from torch.nn.utils import clip_grad_norm_ @@ -13,7 +14,10 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac if rank == 0: pbar = tqdm.tqdm(total=total_it_each_epoch, leave=leave_pbar, desc='train', dynamic_ncols=True) + start_time = None for cur_it in range(total_it_each_epoch): + if cur_it > 49 and start_time is None: + start_time = datetime.datetime.now() try: batch = next(dataloader_iter) except StopIteration: @@ -55,9 +59,11 @@ def train_one_epoch(model, optimizer, train_loader, model_func, lr_scheduler, ac tb_log.add_scalar('learning_rate', cur_lr, accumulated_iter) for key, val in tb_dict.items(): tb_log.add_scalar('train_' + key, val, accumulated_iter) + endtime = datetime.datetime.now() + speed = (endtime - start_time).seconds / (total_it_each_epoch - 50) if rank == 0: pbar.close() - return accumulated_iter + return accumulated_iter, speed def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_cfg, @@ -65,6 +71,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_ lr_warmup_scheduler=None, ckpt_save_interval=1, max_ckpt_save_num=50, merge_all_iters_to_one_epoch=False): accumulated_iter = start_iter + speeds = [] with tqdm.trange(start_epoch, total_epochs, desc='epochs', dynamic_ncols=True, leave=(rank == 0)) as tbar: total_it_each_epoch = len(train_loader) if merge_all_iters_to_one_epoch: @@ -82,7 +89,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_ cur_scheduler = lr_warmup_scheduler else: cur_scheduler = lr_scheduler - accumulated_iter = train_one_epoch( + accumulated_iter, speed = train_one_epoch( model, optimizer, train_loader, model_func, lr_scheduler=cur_scheduler, accumulated_iter=accumulated_iter, optim_cfg=optim_cfg, @@ -91,7 +98,7 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_ total_it_each_epoch=total_it_each_epoch, dataloader_iter=dataloader_iter ) - + speeds.append(speed) # save trained model trained_epoch = cur_epoch + 1 if trained_epoch % ckpt_save_interval == 0 and rank == 0: @@ -107,6 +114,8 @@ def train_model(model, optimizer, train_loader, model_func, lr_scheduler, optim_ save_checkpoint( checkpoint_state(model, optimizer, trained_epoch, accumulated_iter), filename=ckpt_name, ) + print(speed) + print(f'*******{sum(speeds) / len(speeds)}******') def model_state_to_cpu(model_state):
VoteNet¶
MMDetection3D: With release v0.1.0, run
./tools/dist_train.sh configs/votenet/votenet_8xb16_sunrgbd-3d.py 8 --no-validate
votenet: At commit 2f6d6d3, run
python train.py --dataset sunrgbd --batch_size 16
Then benchmark the test speed by running
python eval.py --dataset sunrgbd --checkpoint_path log_sunrgbd/checkpoint.tar --batch_size 1 --dump_dir eval_sunrgbd --cluster_sampling seed_fps --use_3d_nms --use_cls_nms --per_class_proposal
Note that eval.py is modified to compute the inference time.
(diff to benchmark the similar models)
diff --git a/eval.py b/eval.py index c0b2886..04921e9 100644 --- a/eval.py +++ b/eval.py @@ -10,6 +10,7 @@ import os import sys import numpy as np from datetime import datetime +import time import argparse import importlib import torch @@ -28,7 +29,7 @@ parser.add_argument('--checkpoint_path', default=None, help='Model checkpoint pa parser.add_argument('--dump_dir', default=None, help='Dump dir to save sample outputs [default: None]') parser.add_argument('--num_point', type=int, default=20000, help='Point Number [default: 20000]') parser.add_argument('--num_target', type=int, default=256, help='Point Number [default: 256]') -parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 8]') +parser.add_argument('--batch_size', type=int, default=1, help='Batch Size during training [default: 8]') parser.add_argument('--vote_factor', type=int, default=1, help='Number of votes generated from each seed [default: 1]') parser.add_argument('--cluster_sampling', default='vote_fps', help='Sampling strategy for vote clusters: vote_fps, seed_fps, random [default: vote_fps]') parser.add_argument('--ap_iou_thresholds', default='0.25,0.5', help='A list of AP IoU thresholds [default: 0.25,0.5]') @@ -132,6 +133,7 @@ CONFIG_DICT = {'remove_empty_box': (not FLAGS.faster_eval), 'use_3d_nms': FLAGS. # ------------------------------------------------------------------------- GLOBAL CONFIG END def evaluate_one_epoch(): + time_list = list() stat_dict = {} ap_calculator_list = [APCalculator(iou_thresh, DATASET_CONFIG.class2type) \ for iou_thresh in AP_IOU_THRESHOLDS] @@ -144,6 +146,8 @@ def evaluate_one_epoch(): # Forward pass inputs = {'point_clouds': batch_data_label['point_clouds']} + torch.cuda.synchronize() + start_time = time.perf_counter() with torch.no_grad(): end_points = net(inputs) @@ -161,6 +165,12 @@ def evaluate_one_epoch(): batch_pred_map_cls = parse_predictions(end_points, CONFIG_DICT) batch_gt_map_cls = parse_groundtruths(end_points, CONFIG_DICT) + torch.cuda.synchronize() + elapsed = time.perf_counter() - start_time + time_list.append(elapsed) + + if len(time_list==200): + print("average inference time: %4f"%(sum(time_list[5:])/len(time_list[5:]))) for ap_calculator in ap_calculator_list: ap_calculator.step(batch_pred_map_cls, batch_gt_map_cls)
PointPillars-car¶
MMDetection3D: With release v0.1.0, run
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_3x8_100e_det3d_kitti-3d-car.py 8 --no-validate
Det3D: At commit 519251e, use
kitti_point_pillars_mghead_syncbn.py
and run./tools/scripts/train.sh --launcher=slurm --gpus=8
Note that the config in train.sh is modified to train PointPillars.
(diff to benchmark the similar models)
diff --git a/tools/scripts/train.sh b/tools/scripts/train.sh index 3a93f95..461e0ea 100755 --- a/tools/scripts/train.sh +++ b/tools/scripts/train.sh @@ -16,9 +16,9 @@ then fi # Voxelnet -python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR +# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ kitti_car_vfev3_spmiddlefhd_rpn1_mghead_syncbn.py --work_dir=$SECOND_WORK_DIR # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/cbgs/configs/ nusc_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$NUSC_CBGS_WORK_DIR # python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py examples/second/configs/ lyft_all_vfev3_spmiddleresnetfhd_rpn2_mghead_syncbn.py --work_dir=$LYFT_CBGS_WORK_DIR # PointPillars -# python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ original_pp_mghead_syncbn_kitti.py --work_dir=$PP_WORK_DIR +python -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./examples/point_pillars/configs/ kitti_point_pillars_mghead_syncbn.py
PointPillars-3class¶
MMDetection3D: With release v0.1.0, run
./tools/dist_train.sh configs/benchmark/hv_pointpillars_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
OpenPCDet: At commit b32fbddb, run
cd tools sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/pointpillar.yaml --batch_size 32 --workers 32 --epochs 80
SECOND¶
For SECOND, we mean SECONDv1.5, which was first implemented in second.pytorch. Det3D’s implementation of SECOND uses its self-implemented Multi-Group Head, so its speed is not comparable with other codebases.
MMDetection3D: With release v0.1.0, run
./tools/dist_train.sh configs/benchmark/hv_second_secfpn_4x8_80e_pcdet_kitti-3d-3class.py 8 --no-validate
OpenPCDet: At commit b32fbddb, run
cd tools sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/second.yaml --batch_size 32 --workers 32 --epochs 80
Part-A2¶
MMDetection3D: With release v0.1.0, run
./tools/dist_train.sh configs/benchmark/hv_PartA2_secfpn_4x8_cyclic_80e_pcdet_kitti-3d-3class.py 8 --no-validate
OpenPCDet: At commit b32fbddb, train the model by running
cd tools sh ./scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} 8 --cfg_file ./cfgs/kitti_models/PartA2.yaml --batch_size 32 --workers 32 --epochs 80
Changelog of v1.0.x¶
v1.0.0 (6/4/2023)¶
Improvements¶
Add BN in FPN to avoid NaN loss in MVXNet (#2282)
Update
s3dis_data_utils.py
(#2232)
Bug Fixes¶
Fix precision error when using mixed precision on CenterPoint (#2341)
Replace np.transpose with torch.permute to speed up (#2273)
Update links of SECOND checkpoints (#2185)
Contributors¶
A total of 7 developers contributed to this release. @JingweiZhang12, @ZCMax, @Xiangxu-0103, @vansinhu, @cs1488, @sunjiahao1999, @Ginray
v1.0.0rc7 (7/1/2023)¶
Improvements¶
Support training and testing on MLU (#2167)
Contributors¶
A total of 1 developer contributed to this release. @mengpenghui
v1.0.0rc6 (2/12/2022)¶
New Features¶
Add
Projects/
folder and the first example project (#2082)
Improvements¶
Update Waymo converter to save storage space (#1759)
Update model link and performance of CenterPoint (#1916)
Bug Fixes¶
Fix GPU memory occupancy problem in PointRCNN (#1928)
Fix sampling bug in
IoUNegPiecewiseSampler
(#2018)
Contributors¶
A total of 6 developers contributed to this release.
@oyel, @zzj403, @VVsssssk, @Tai-Wang, @tpoisonooo, @JingweiZhang12, @ZCMax
v1.0.0rc5 (11/10/2022)¶
New Features¶
Support ImVoxelNet on SUN RGB-D (#1738)
Improvements¶
Fix the cross-codebase reference problem in metafile README (#1644)
Update the Chinese documentation about getting started (#1715)
Fix docs link and add docs link checker (#1811)
Bug Fixes¶
Fix a visualization bug that is potentially triggered by empty prediction labels (#1725)
Fix point cloud segmentation visualization bug due to wrong parameter passing (#1858)
Fix Nan loss bug during PointRCNN training (#1874)
Contributors¶
A total of 9 developers contributed to this release.
@ZwwWayne, @Tai-Wang, @filaPro, @VVsssssk, @ZCMax, @Xiangxu-0103, @holtvogt, @tpoisonooo, @lianqing01
v1.0.0rc4 (8/8/2022)¶
New Features¶
Support FCAF3D (#1547)
Add the transformation to support multi-camera 3D object detection (#1580)
Support lift-splat-shoot view transformer (#1598)
Improvements¶
Remove the limitation of the maximum number of points during SUN RGB-D preprocessing (#1555)
Support circle CI (#1647)
Add mim to extras_require in setup.py (#1560, #1574)
Update dockerfile package version (#1697)
Bug Fixes¶
Flip yaw angle for DepthInstance3DBoxes.overlaps (#1548, #1556)
Fix DGCNN configs (#1587)
Fix bbox head not registered bug (#1625)
Fix missing objects in S3DIS preprocessing (#1665)
Fix spconv2.0 model loading bug (#1699)
Contributors¶
A total of 9 developers contributed to this release.
@Tai-Wang, @ZwwWayne, @filaPro, @lianqing11, @ZCMax, @HuangJunJie2017, @Xiangxu-0103, @ChonghaoSima, @VVsssssk
v1.0.0rc3 (8/6/2022)¶
Improvements¶
Add Chinese documentation for vision-only 3D detection (#1438)
Update CenterPoint pretrained models that are compatible with refactored coordinate systems (#1450)
Configure myst-parser to parse anchor tag in the documentation (#1488)
Replace markdownlint with mdformat for avoiding installing ruby (#1489)
Add missing gt_names when getting annotation info in Custom3DDataset (#1519)
Support S3DIS full ceph training (#1542)
Rewrite the installation and FAQ documentation (#1545)
Bug Fixes¶
Fix the incorrect registry name when building RoI extractors (#1460)
Fix the potential problems caused by the registry scope update when composing pipelines (#1466) and using CocoDataset (#1536)
Fix the missing selection with order in the box3d_nms introduced by #1403 (#1479)
Update the PointPillars config to make it consistent with the log (#1486)
Fix heading anchor in documentation (#1490)
Fix the compatibility of mmcv in the dockerfile (#1508)
Make overwrite_spconv packaged when building whl (#1516)
Fix the requirement of mmcv and mmdet (#1537)
Update configs of PartA2 and support its compatibility with spconv 2.0 (#1538)
Contributors¶
A total of 13 developers contributed to this release.
@Xiangxu-0103, @ZCMax, @jshilong, @filaPro, @atinfinity, @Tai-Wang, @wenbo-yu, @yi-chen-isuzu, @ZwwWayne, @wchen61, @VVsssssk, @AlexPasqua, @lianqing11
v1.0.0rc2 (1/5/2022)¶
Highlights¶
Support spconv 2.0
Support MinkowskiEngine with MinkResNet
Support training models on custom datasets with only point clouds
Update Registry to distinguish the scope of built functions
Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes
New Features¶
Add loader arguments in the configuration files (#1388)
Support spconv 2.0 when the package is installed. Users can still use spconv 1.x in MMCV with CUDA 9.0 (only cost more memory) without losing the compatibility of model weights between two versions (#1421)
Support MinkowskiEngine with MinkResNet (#1422)
Improvements¶
Add the documentation for model deployment (#1373, #1436)
Add Chinese documentation of
Speed benchmark (#1379)
LiDAR-based 3D detection (#1368)
LiDAR 3D segmentation (#1420)
Coordinate system refactoring (#1384)
Support training models on custom datasets with only point clouds (#1393)
Replace mmcv.iou3d with a set of bird-eye-view (BEV) operators to unify the operations of rotated boxes (#1403, #1418)
Update Registry to distinguish the scope of building functions (#1412, #1443)
Replace recommonmark with myst_parser for documentation rendering (#1414)
Bug Fixes¶
Fix the show pipeline in the browse_dataset.py (#1376)
Fix missing init files after coordinate system refactoring (#1383)
Fix the incorrect yaw in the visualization caused by coordinate system refactoring (#1407)
Fix NaiveSyncBatchNorm1d and NaiveSyncBatchNorm2d to support non-distributed cases and more general inputs (#1435)
Contributors¶
A total of 11 developers contributed to this release.
@ZCMax, @ZwwWayne, @Tai-Wang, @VVsssssk, @HanaRo, @JoeyforJoy, @ansonlcy, @filaPro, @jshilong, @Xiangxu-0103, @deleomike
v1.0.0rc1 (1/4/2022)¶
Compatibility¶
We migrate all the mmdet3d ops to mmcv and do not need to compile them when installing mmdet3d.
To fix the imprecise timestamp and optimize its saving method, we reformat the point cloud data during Waymo data conversion. The data conversion time is also optimized significantly by supporting parallel processing. Please re-generate KITTI format Waymo data if necessary. See more details in the compatibility documentation.
We update some of the model checkpoints after the refactor of coordinate systems. Please stay tuned for the release of the remaining model checkpoints.
Fully Updated | Partially Updated | In Progress | No Influence | |
---|---|---|---|---|
SECOND | ✓ | |||
PointPillars | ✓ | |||
FreeAnchor | ✓ | |||
VoteNet | ✓ | |||
H3DNet | ✓ | |||
3DSSD | ✓ | |||
Part-A2 | ✓ | |||
MVXNet | ✓ | |||
CenterPoint | ✓ | |||
SSN | ✓ | |||
ImVoteNet | ✓ | |||
FCOS3D | ✓ | |||
PointNet++ | ✓ | |||
Group-Free-3D | ✓ | |||
ImVoxelNet | ✓ | |||
PAConv | ✓ | |||
DGCNN | ✓ | |||
SMOKE | ✓ | |||
PGD | ✓ | |||
MonoFlex | ✓ |
Highlights¶
Migrate all the mmdet3d ops to mmcv
Support parallel waymo data converter
Add ScanNet instance segmentation dataset with metrics
Better compatibility for windows with CI support, op migration and bug fixes
Support loading annotations from Ceph
New Features¶
Add ScanNet instance segmentation dataset with metrics (#1230)
Support different random seeds for different ranks (#1321)
Support loading annotations from Ceph (#1325)
Support resuming from the latest checkpoint automatically (#1329)
Add windows CI (#1345)
Improvements¶
Update the table format and OpenMMLab project orders in README.md (#1272, #1283)
Migrate all the mmdet3d ops to mmcv (#1240, #1286, #1290, #1333)
Add with_plane flag in the KITTI data conversion (#1278)
Update instructions and links in the documentation (#1300, #1309, #1319)
Support parallel Waymo dataset converter and ground truth database generator (#1327)
Add quick installation commands to getting_started.md (#1366)
Bug Fixes¶
Update nuimages configs to use new nms config style (#1258)
Fix the usage of np.long for windows compatibility (#1270)
Fix the incorrect indexing in BasePoints (#1274)
Fix the incorrect indexing in the pillar_scatter.forward_single (#1280)
Fix unit tests that use GPUs (#1301)
Fix incorrect feature dimensions in DynamicPillarFeatureNet caused by previous upgrading of PillarFeatureNet (#1302)
Remove the CameraPoints constraint in PointSample (#1314)
Fix imprecise timestamps saving of Waymo dataset (#1327)
Contributors¶
A total of 9 developers contributed to this release.
@ZCMax, @ZwwWayne, @wHao-Wu, @Tai-Wang, @wangruohui, @zjwzcx, @Xiangxu-0103, @EdAyers, @hongye-dev, @zhanggefan
v1.0.0rc0 (18/2/2022)¶
Compatibility¶
We refactor our three coordinate systems to make their rotation directions and origins more consistent, and further remove unnecessary hacks in different datasets and models. Therefore, please re-generate data infos or convert the old version to the new one with our provided scripts. We will also provide updated checkpoints in the next version. Please refer to the compatibility documentation for more details.
Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to lidar2img, depth2img, cam2img, etc., for easier understanding. Customized codes using legacy keys may be influenced.
The next release will begin to move files of CUDA ops to MMCV. It will influence the way to import related functions. We will not break compatibility but will raise a warning first; please prepare to migrate.
Highlights¶
New Features¶
Support 3D object detection on the S3DIS dataset (#835)
Support PointRCNN (#842, #843, #856, #974, #1022, #1109, #1125)
Support DGCNN (#896)
Support PGD (#938, #940, #948, #950, #964, #1014, #1065, #1070, #1157)
Support SMOKE (#939, #955, #959, #975, #988, #999, #1029)
Support MonoFlex (#1026, #1044, #1114, #1115, #1183)
Support CPU Training (#1196)
Improvements¶
Support point sampling based on distance metric (#667, #840)
Refactor coordinate systems (#677, #774, #803, #899, #906, #912, #968, #1001)
Unify camera keys in PointFusion and transformations between different systems (#791, #805)
Refine documentation (#792, #827, #829, #836, #849, #854, #859, #1111, #1113, #1116, #1121, #1132, #1135, #1185, #1193, #1226)
Add a script to support benchmark regression (#808)
Benchmark PAConvCUDA on S3DIS (#847)
Support to download pdf and epub documentation (#850)
Change the repeat setting in Group-Free-3D configs to reduce training epochs (#855)
Support KITTI AP40 evaluation metric (#927)
Add the mmdet3d2torchserve tool for SECOND (#977)
Add code-spell pre-commit hook and fix typos (#995)
Support the latest numba version (#1043)
Set a default seed to use when the random seed is not specified (#1072)
Distribute mix-precision models to each algorithm folder (#1074)
Add abstract and a representative figure for each algorithm (#1086)
Upgrade pre-commit hook (#1088, #1217)
Support augmented data and ground truth visualization (#1092)
Add local yaw property for CameraInstance3DBoxes (#1130)
Lock the required numba version to 0.53.0 (#1159)
Support the usage of plane information for KITTI dataset (#1162)
Deprecate the support for “python setup.py test” (#1164)
Reduce the number of multi-process threads to accelerate training (#1168)
Support 3D flip augmentation for semantic segmentation (#1181)
Update README format for each model (#1195)
Bug Fixes¶
Fix compiling errors on Windows (#766)
Fix the deprecated nms setting in the ImVoteNet config (#828)
Use the latest wrap_fp16_model import from mmcv (#861)
Remove 2D annotations generation on Lyft (#867)
Update index files for the Chinese documentation to be consistent with the English version (#873)
Fix the nested list transpose in the CenterPoint head (#879)
Fix deprecated pretrained model loading for RegNet (#889)
Fix the incorrect dimension indices of rotations and testing config in the CenterPoint test time augmentation (#892)
Fix and improve visualization tools (#956, #1066, #1073)
Fix PointPillars FLOPs calculation error (#1075)
Fix missing dimension information in the SUN RGB-D data generation (#1120)
Fix incorrect anchor range settings in the PointPillars config for KITTI (#1163)
Fix incorrect model information in the RegNet metafile (#1184)
Fix bugs in non-distributed multi-gpu training and testing (#1197)
Fix a potential assertion error when generating corners from an empty box (#1212)
Upgrade bazel version according to the requirement of Waymo Devkit (#1223)
Contributors¶
A total of 12 developers contributed to this release.
@THU17cyz, @wHao-Wu, @wangruohui, @Wuziyi616, @filaPro, @ZwwWayne, @Tai-Wang, @DCNSW, @xieenze, @robin-karlsson0, @ZCMax, @Otteri
v0.18.1 (1/2/2022)¶
Improvements¶
Support Flip3D augmentation in semantic segmentation task (#1182)
Update regnet metafile (#1184)
Add point cloud annotation tools introduction in FAQ (#1185)
Add missing explanations of cam_intrinsic in the nuScenes dataset doc (#1193)
Bug Fixes¶
Deprecate the support for “python setup.py test” (#1164)
Fix the rotation matrix while rotation axis=0 (#1182)
Fix the bug in non-distributed multi-gpu training/testing (#1197)
Fix a potential bug when generating corners for empty bounding boxes (#1212)
Contributors¶
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
v0.18.0 (1/1/2022)¶
Highlights¶
Update the required minimum version of mmdet and mmseg
Improvements¶
Use the official markdownlint hook and add codespell hook for pre-committing (#1088)
Improve CI operation (#1095, #1102, #1103)
Use shared menu content from OpenMMLab’s theme and remove duplicated contents from config (#1111)
Refactor the structure of documentation (#1113, #1121)
Update the required minimum version of mmdet and mmseg (#1147)
Bug Fixes¶
Fix symlink failure on Windows (#1096)
Fix the upper bound of mmcv version in the mminstall requirements (#1104)
Fix API documentation compilation and mmcv build errors (#1116)
Fix figure links and pdf documentation compilation (#1132, #1135)
Contributors¶
A total of 4 developers contributed to this release.
@ZwwWayne, @ZCMax, @Tai-Wang, @wHao-Wu
v0.17.3 (1/12/2021)¶
Improvements¶
Change the default show value to False in the show_result function to avoid unnecessary errors (#1034)
Improve the visualization of detection results with colorized points in single_gpu_test (#1050)
Clean unnecessary custom_imports in entrypoints (#1068)
Bug Fixes¶
Update mmcv version in the Dockerfile (#1036)
Fix the memory-leak problem when loading checkpoints in init_model (#1045)
Fix incorrect velocity indexing when formatting boxes on nuScenes (#1049)
Explicitly set cuda device ID in init_model to avoid memory allocation on unexpected devices (#1056)
Fix PointPillars FLOPs calculation error (#1076)
Contributors¶
A total of 5 developers contributed to this release.
@wHao-Wu, @Tai-Wang, @ZCMax, @MilkClouds, @aldakata
v0.17.2 (1/11/2021)¶
Improvements¶
Update Group-Free-3D and FCOS3D bibtex (#985)
Update the solutions for incompatibility of pycocotools in the FAQ (#993)
Add Chinese documentation for the KITTI (#1003) and Lyft (#1010) dataset tutorial
Add the H3DNet checkpoint converter for incompatible keys (#1007)
Bug Fixes¶
Update mmdetection and mmsegmentation version in the Dockerfile (#992)
Fix links in the Chinese documentation (#1015)
Contributors¶
A total of 4 developers contributed to this release.
@Tai-Wang, @wHao-Wu, @ZwwWayne, @ZCMax
v0.17.1 (1/10/2021)¶
Highlights¶
Support a faster but non-deterministic version of hard voxelization
Completion of dataset tutorials and the Chinese documentation
Improved the aesthetics of the documentation format
Improvements¶
Add Chinese documentation for training on customized datasets and designing customized models (#729, #820)
Support a faster but non-deterministic version of hard voxelization (#904)
Update paper titles and code details for metafiles (#917)
Add a tutorial for KITTI dataset (#953)
Use Pytorch sphinx theme to improve the format of documentation (#958)
Use the docker to accelerate CI (#971)
Bug Fixes¶
Fix the sphinx version used in the documentation (#902)
Fix a dynamic scatter bug that discards the first voxel by mistake when all input points are valid (#915)
Fix the inconsistent variable names used in the unit test for voxel generator (#919)
Upgrade to use build_prior_generator to replace the legacy build_anchor_generator (#941)
Fix a minor bug caused by a too small difference set in the FreeAnchor Head (#944)
Contributors¶
A total of 8 developers contributed to this release.
@DCNSW, @zhanggefan, @mickeyouyou, @ZCMax, @wHao-Wu, @tojimahammatov, @xiliu8006, @Tai-Wang
v0.17.0 (1/9/2021)¶
Compatibility¶
Unify the camera keys for consistent transformation between coordinate systems on different datasets. The modification changes the key names to lidar2img, depth2img, cam2img, etc., for easier understanding. Customized codes using legacy keys may be influenced.
The next release will begin to move files of CUDA ops to MMCV. It will influence the way to import related functions. We will not break compatibility but will raise a warning first; please prepare to migrate.
Highlights¶
Support 3D object detection on the S3DIS dataset
Support compilation on Windows
Full benchmark for PAConv on S3DIS
Further enhancement for documentation, especially on the Chinese documentation
New Features¶
Support 3D object detection on the S3DIS dataset (#835)
Improvements¶
Support point sampling based on distance metric (#667, #840)
Update PointFusion to support unified camera keys (#791)
Add Chinese documentation for customized dataset (#792), data pipeline (#827), customized runtime (#829), 3D Detection on ScanNet (#836), nuScenes (#854) and Waymo (#859)
Unify camera keys used in transformation between different systems (#805)
Add a script to support benchmark regression (#808)
Benchmark PAConvCUDA on S3DIS (#847)
Add a tutorial for 3D detection on the Lyft dataset (#849)
Support to download pdf and epub documentation (#850)
Change the repeat setting in Group-Free-3D configs to reduce training epochs (#855)
Bug Fixes¶
Fix compiling errors on Windows (#766)
Fix the deprecated nms setting in the ImVoteNet config (#828)
Use the latest wrap_fp16_model import from mmcv (#861)
Remove 2D annotations generation on Lyft (#867)
Update index files for the Chinese documentation to be consistent with the English version (#873)
Fix the nested list transpose in the CenterPoint head (#879)
Fix deprecated pretrained model loading for RegNet (#889)
Contributors¶
A total of 11 developers contributed to this release.
@THU17cyz, @wHao-Wu, @wangruohui, @Wuziyi616, @filaPro, @ZwwWayne, @Tai-Wang, @DCNSW, @xieenze, @robin-karlsson0, @ZCMax
v0.16.0 (1/8/2021)¶
Compatibility¶
Remove the rotation and dimension hack in the monocular 3D detection on nuScenes by applying corresponding transformation in the pre-processing and post-processing. The modification only influences nuScenes coco-style json files. Please re-run the data preparation scripts if necessary. See more details in the PR #744.
Add a new pre-processing module for the ScanNet dataset in order to support multi-view detectors. Please run the updated scripts to extract the RGB data and its annotations. See more details in the PR #696.
Highlights¶
Support to use MIM with pip installation
Support PAConv models and benchmarks on S3DIS
Enhance the documentation especially on dataset tutorials
New Features¶
Support RGB images on ScanNet for multi-view detectors (#696)
Support FLOPs and number of parameters calculation (#736)
Support to use MIM with pip installation (#782)
Support PAConv models and benchmarks on the S3DIS dataset (#783, #809)
Improvements¶
Refactor Group-Free-3D to make it inherit BaseModule from MMCV (#704)
Modify the initialization methods of FCOS3D to be consistent with the refactored approach (#705)
Benchmark the Group-Free-3D models on ScanNet (#710)
Add Chinese documentation for Getting Started (#725), FAQ (#730), Model Zoo (#735), Demo (#745), Quick Run (#746), Data Preparation (#787) and Configs (#788)
Add documentation for semantic segmentation on ScanNet and S3DIS (#743, #747, #806, #807)
Add a parameter max_keep_ckpts to limit the maximum number of saved Group-Free-3D checkpoints (#765)
Add documentation for 3D detection on SUN RGB-D and nuScenes (#770, #793)
Remove mmpycocotools in the Dockerfile (#785)
Bug Fixes¶
Fix versions of OpenMMLab dependencies (#708)
Convert rt_mat to torch.Tensor in coordinate transformation for compatibility (#709)
Fix the bev_range initialization in ObjectRangeFilter according to the gt_bboxes_3d type (#717)
Fix Chinese documentation and incorrect doc format due to the incompatible Sphinx version (#718)
Fix a potential bug when setting interval == 1 in analyze_logs.py (#720)
Update the structure of Chinese documentation (#722)
Fix FCOS3D FPN BC-Breaking caused by the code refactoring in MMDetection (#739)
Fix wrong in_channels when with_distance=True in the Dynamic VFE Layers (#749)
Fix the dimension and yaw hack of FCOS3D on nuScenes (#744, #794, #795, #818)
Fix the missing default bbox_mode in the show_multi_modality_result (#825)
Contributors¶
A total of 12 developers contributed to this release.
@yinchimaoliang, @gopi231091, @filaPro, @ZwwWayne, @ZCMax, @hjin2902, @wHao-Wu, @Wuziyi616, @xiliu8006, @THU17cyz, @DCNSW, @Tai-Wang
v0.15.0 (1/7/2021)¶
Compatibility¶
In order to fix the problem that the priority of EvalHook is too low, all hook priorities have been re-adjusted in 1.3.8, so MMDetection 2.14.0 needs to rely on the latest MMCV 1.3.8 version. For related information, please refer to #1120, for related issues, please refer to #5343.
Highlights¶
Support PAConv
Support monocular/multi-view 3D detector ImVoxelNet on KITTI
Support Transformer-based 3D detection method Group-Free-3D on ScanNet
Add documentation for tasks including LiDAR-based 3D detection, vision-only 3D detection and point-based 3D semantic segmentation
Add dataset documents like ScanNet
New Features¶
Support Group-Free-3D on ScanNet (#539)
Support PAConv modules (#598, #599)
Support ImVoxelNet on KITTI (#627, #654)
Improvements¶
Add unit tests for pipeline functions LoadImageFromFileMono3D, ObjectNameFilter and ObjectRangeFilter (#615)
Enhance IndoorPatchPointSample (#617)
Refactor model initialization methods based on MMCV (#622)
Add Chinese docs (#629)
Add documentation for LiDAR-based 3D detection (#642)
Unify intrinsic and extrinsic matrices for all datasets (#653)
Add documentation for point-based 3D semantic segmentation (#663)
Add documentation of ScanNet for 3D detection (#664)
Refine docs for tutorials (#666)
Add documentation for vision-only 3D detection (#669)
Refine docs for Quick Run and Useful Tools (#686)
Bug Fixes¶
Fix the bug of BackgroundPointsFilter using the bottom center of ground truth (#609)
Fix LoadMultiViewImageFromFiles to unravel stacked multi-view images to list to be consistent with DefaultFormatBundle (#611)
Fix the potential bug in analyze_logs when the training resumes from a checkpoint or is stopped before evaluation (#634)
Fix test commands in docs and make some refinements (#635)
Fix wrong config paths in unit tests (#641)
v0.14.0 (1/6/2021)¶
Highlights¶
Support the point cloud segmentation method PointNet++
New Features¶
Support PointNet++ (#479, #528, #532, #541)
Support RandomJitterPoints transform for point cloud segmentation (#584)
Support RandomDropPointsColor transform for point cloud segmentation (#585)
Improvements¶
Move the point alignment of ScanNet from data pre-processing to pipeline (#439, #470)
Add compatibility document to provide detailed descriptions of BC-breaking changes (#504)
Add MMSegmentation installation requirement (#535)
Support points rotation even without bounding box in GlobalRotScaleTrans for point cloud segmentation (#540)
Support visualization of detection results and dataset browse for nuScenes Mono-3D dataset (#542, #582)
Support faster implementation of KNN (#586)
Support RegNetX models on Lyft dataset (#589)
Remove a useless parameter label_weight from segmentation datasets including Custom3DSegDataset, ScanNetSegDataset and S3DISSegDataset (#607)
Bug Fixes¶
Fix a corrupted lidar data file in Lyft dataset in data_preparation (#546)
Fix evaluation bugs in nuScenes and Lyft dataset (#549)
Fix converting points between coordinates with specific transformation matrix in the coord_3d_mode.py (#556)
Support PointPillars models on Lyft dataset (#578)
Fix the bug of demo with pre-trained VoteNet model on ScanNet (#600)
v0.13.0 (1/5/2021)¶
Highlights¶
Support a monocular 3D detection method FCOS3D
Support ScanNet and S3DIS semantic segmentation dataset
Enhancement of visualization tools for dataset browsing and demos, including support of visualization for multi-modality data and point cloud segmentation.
New Features¶
Support ScanNet semantic segmentation dataset (#390)
Support monocular 3D detection on nuScenes (#392)
Support multi-modality visualization (#405)
Support nuimages visualization (#408)
Support monocular 3D detection on KITTI (#415)
Support online visualization of semantic segmentation results (#416)
Support ScanNet test results submission to online benchmark (#418)
Support S3DIS data pre-processing and dataset class (#433)
Support FCOS3D (#436, #442, #482, #484)
Support dataset browse for multiple types of datasets (#467)
Adding paper-with-code (PWC) metafile for each model in the model zoo (#485)
Improvements¶
Support dataset browsing for SUNRGBD, ScanNet or KITTI points and detection results (#367)
Add the pipeline to load data using file client (#430)
Support to customize the type of runner (#437)
Make pipeline functions process points and masks simultaneously when sampling points (#444)
Add waymo unit tests (#455)
Split the visualization of projecting points onto image from that for only points (#480)
Efficient implementation of PointSegClassMapping (#489)
Use the new model registry from mmcv (#495)
Bug Fixes¶
Fix Pytorch 1.8 Compilation issue in the scatter_points_cuda.cu (#404)
Fix dynamic_scatter errors triggered by empty point input (#417)
Fix the bug of missing points caused by using break incorrectly in the voxelization (#423)
Fix the missing coord_type in the waymo dataset config (#441)
Fix errors in four unittest functions of configs, test_detectors.py, test_heads.py (#453)
Fix 3DSSD training errors and simplify configs (#462)
Clamp 3D votes projections to image boundaries in ImVoteNet (#463)
Update out-of-date names of pipelines in the config of pointpillars benchmark (#474)
Fix the lack of a placeholder when unpacking RPN targets in the h3d_bbox_head.py (#508)
Fix the incorrect value of K when creating pickle files for SUN RGB-D (#511)
v0.12.0 (1/4/2021)¶
Highlights¶
New Features¶
Support LiDAR-based semantic segmentation metrics (#332)
Support ImVoteNet (#352, #384)
Support the KNN GPU operation (#360, #371)
Improvements¶
Add FAQ for common problems in the documentation (#333)
Refactor the structure of tools (#339)
Support demo on nuScenes (#353)
Add 3DSSD checkpoints (#359)
Update the Bibtex of CenterPoint (#368)
Add citation format and reference to other OpenMMLab projects in the README (#374)
Upgrade the mmcv version requirements (#376)
Add numba and numpy version requirements in FAQ (#379)
Avoid unnecessary for-loop execution of vfe layer creation (#389)
Update SUNRGBD dataset documentation to stress the requirements for training ImVoteNet (#391)
Modify vote head to support 3DSSD (#396)
Bug Fixes¶
Fix missing keys coord_type in database sampler config (#345)
Rename H3DNet configs (#349)
Fix CI by using ubuntu 18.04 in github workflow (#350)
Add assertions to avoid 4-dim points being input to points_in_boxes (#357)
Fix the SECOND results on Waymo in the corresponding README (#363)
Fix the incorrect adopted pipeline when adding val to workflow (#370)
Fix a potential bug when indices used in the backwarding in ThreeNN (#377)
Fix a compilation error triggered by scatter_points_cuda.cu in PyTorch 1.7 (#393)
v0.11.0 (1/3/2021)¶
Highlights¶
Support more friendly visualization interfaces based on open3d
Support a faster and more memory-efficient implementation of DynamicScatter
Refactor unit tests and details of configs
New Features¶
Support new visualization methods based on open3d (#284, #323)
Improvements¶
Refactor unit tests (#303)
Move the keys train_cfg and test_cfg into the model configs (#307)
Update README with Chinese version and instructions for getting started (#310, #316)
Support a faster and more memory-efficient implementation of DynamicScatter (#318, #326)
Bug Fixes¶
Fix an unsupported bias setting in the unit test for centerpoint head (#304)
Fix errors due to typos in the centerpoint head (#308)
Fix a minor bug in points_in_boxes.py when tensors are not in the same device. (#317)
Fix warning of deprecated usages of nonzero during training with PyTorch 1.6 (#330)
v0.10.0 (1/2/2021)¶
Highlights¶
Preliminary release of API for SemanticKITTI dataset.
Documentation and demo enhancement for better user experience.
Fix a number of underlying minor bugs and add some corresponding important unit tests.
New Features¶
Support SemanticKITTI dataset preliminarily (#287)
Improvements¶
Add tag to README in configurations for specifying different uses (#262)
Update instructions for evaluation metrics in the documentation (#265)
Add nuImages entry in README.md and gif demo (#266, #268)
Add unit test for voxelization (#275)
Bug Fixes¶
Fixed the issue of unpacking size in furthest_point_sample.py (#248)
Fix bugs for 3DSSD triggered by empty ground truths (#258)
Remove models without checkpoints in model zoo statistics of documentation (#259)
Fix some unclear installation instructions in getting_started.md (#269)
Fix relative paths/links in the documentation (#271)
Fix a minor bug in scatter_points_cuda.cu when num_features != 4 (#275)
Fix the bug about missing text files when testing on KITTI (#278)
Fix issues caused by inplace modification of tensors in BaseInstance3DBoxes (#283)
Fix log analysis for evaluation and adjust the documentation accordingly (#285)
v0.9.0 (31/12/2020)¶
Highlights¶
Documentation refactoring with better structure, especially about how to implement new models and customized datasets.
More compatible with refactored point structure by bug fixes in ground truth sampling.
Improvements¶
Documentation refactoring (#242)
Bug Fixes¶
Fix point structure related bugs in ground truth sampling (#211)
Fix loading points in ground truth sampling augmentation on nuScenes (#221)
Fix channel setting in the SeparateHead of CenterPoint (#228)
Fix evaluation for indoors 3D detection in case of less classes in prediction (#231)
Remove unreachable lines in nuScenes data converter (#235)
Minor adjustments of numpy implementation for perspective projection and prediction filtering criterion in KITTI evaluation (#241)
v0.8.0 (30/11/2020)¶
Highlights¶
Refactor points structure with more constructive and clearer implementation.
Support axis-aligned IoU loss for VoteNet with better performance.
Update and enhance SECOND benchmark on Waymo.
New Features¶
Support axis-aligned IoU loss for VoteNet. (#194)
Support points structure for consistent processing of all the point related representation. (#196, #204)
v0.7.0 (1/11/2020)¶
Highlights¶
Support a new method SSN with benchmarks on nuScenes and Lyft datasets.
Update benchmarks for SECOND on Waymo, CenterPoint with TTA on nuScenes and models with mixed precision training on KITTI and nuScenes.
Support semantic segmentation on nuImages and provide HTC models with configurations and performance for reference.
New Features¶
Modified primitive head which can support the setting on SUN-RGBD dataset (#136)
Support semantic segmentation and HTC with models for reference on nuImages dataset (#155)
Support SSN on nuScenes and Lyft datasets (#147, #174, #166, #182)
Support double flip for test time augmentation of CenterPoint with updated benchmark (#143)
Improvements¶
Update SECOND benchmark with configurations for reference on Waymo (#166)
Delete checkpoints on Waymo to comply with its specific license agreement (#180)
Update models and instructions with mixed precision training on KITTI and nuScenes (#178)
Bug Fixes¶
Fix incorrect code weights in anchor3d_head when introducing mixed precision training (#173)
Fix the incorrect label mapping on nuImages dataset (#155)
v0.6.1 (11/10/2020)¶
Highlights¶
Support mixed precision training of voxel-based methods
Support docker with PyTorch 1.6.0
Update baseline configs and results (CenterPoint on nuScenes and PointPillars on Waymo with full dataset)
Switch model zoo to download.openmmlab.com
New Features¶
Support dataset pipeline VoxelBasedPointSampler to sample multi-sweep points based on voxelization (#125)
Support mixed precision training of voxel-based methods (#132)
Support docker with PyTorch 1.6.0 (#160)
Improvements¶
Reduce requirements for the case exclusive of Waymo (#121)
Switch model zoo to download.openmmlab.com (#126)
Update docs related to Waymo (#128)
Add version assertion in the init file (#129)
Add evaluation interval setting for CenterPoint (#131)
Add unit test for CenterPoint (#133)
Update PointPillars baselines on Waymo with full dataset (#142)
Update CenterPoint results with models and logs (#154)
Bug Fixes¶
Fix a bug of visualization in multi-batch case (#120)
Fix bugs in dcn unit test (#130)
Fix dcn bias bug in centerpoint (#137)
Fix dataset mapping in the evaluation of nuScenes mini dataset (#140)
Fix origin initialization in CameraInstance3DBoxes (#148, #150)
Correct documentation link in the getting_started.md (#159)
Fix model save path bug in gather_models.py (#153)
Fix image padding shape bug in PointFusion (#162)
v0.6.0 (20/9/2020)¶
Highlights¶
Support new methods H3DNet, 3DSSD, CenterPoint.
Support new dataset Waymo (with PointPillars baselines) and nuImages (with Mask R-CNN and Cascade Mask R-CNN baselines).
Support Batch Inference
Support Pytorch 1.6
Start to publish mmdet3d package to PyPI since v0.5.0. You can use mmdet3d through pip install mmdet3d.
Backwards Incompatible Changes¶
Support Batch Inference (#95, #103, #116): MMDetection3D v0.6.0 migrates to support batch inference based on MMDetection >= v2.4.0. This change influences all the test APIs in MMDetection3D and downstream codebases.
Start to use collect environment function from MMCV (#113): MMDetection3D v0.6.0 migrates to use the collect_env function in MMCV. get_compiler_version and get_compiling_cuda_version compiled in mmdet3d.ops.utils are removed. Please import these two functions from mmcv.ops.
New Features¶
Support nuImages dataset by converting them into coco format and release Mask R-CNN and Cascade Mask R-CNN baseline models (#91, #94)
Support to publish to PyPI in github-action (#17, #19, #25, #39, #40)
Support CBGSDataset and make it generally applicable to all the supported datasets (#75, #94)
Support H3DNet and release models on ScanNet dataset (#53, #58, #105)
Support Fusion Point Sampling used in 3DSSD (#66)
Add BackgroundPointsFilter to filter background points in data pipeline (#84)
Support pointnet2 with multi-scale grouping in backbone and refactor pointnets (#82)
Support dilated ball query used in 3DSSD (#96)
Support 3DSSD and release models on KITTI dataset (#83, #100, #104)
Support CenterPoint and release models on nuScenes dataset (#49, #92)
Support Waymo dataset and release PointPillars baseline models (#118)
Allow LoadPointsFromMultiSweeps to pad empty sweeps and select multiple sweeps randomly (#67)
Improvements¶
Fix all warnings and bugs in PyTorch 1.6.0 (#70, #72)
Update issue templates (#43)
Update unit tests (#20, #24, #30)
Update documentation for using ply format point cloud data (#41)
Use points loader to load point cloud data in ground truth (GT) samplers (#87)
Unify version file of OpenMMLab projects by using version.py (#112)
Remove unnecessary data preprocessing commands of SUN RGB-D dataset (#110)
Bug Fixes¶
Rename CosineAnealing to CosineAnnealing (#57)
Fix device inconsistent bug in 3D IoU computation (#69)
Fix a minor bug in json2csv of lyft dataset (#78)
Add missed test data for pointnet modules (#85)
Fix use_valid_flag bug in CustomDataset (#106)
v0.5.0 (9/7/2020)¶
MMDetection3D is released.
Changelog of v1.1¶
v1.1.0 (6/4/2023)¶
Highlights¶
New Features¶
Support Cylinder3D (#2291, #2344, #2350)
Support MinkUnet (#2294, #2358)
Support SPVCNN (#2320,#2372)
Support TR3D detector in projects (#2274)
Support the inference of BEVFusion in projects (#2175)
Support DETR3D in projects (#2173)
Support PolarMix and LaserMix augmentation (#2265, #2302)
Support loading annotation of panoptic segmentation (#2223)
Support panoptic segmentation metric (#2230)
Add inferencer for LiDAR-based, monocular and multi-modality 3D detection (#2208, #2190, #2342)
Add inferencer for LiDAR-based segmentation (#2304)
Improvements¶
Support lazy_init for CBGSDataset (#2271)
Support generating annotation files for test set on Waymo (#2180)
Enhance the support for SemanticKitti (#2253, #2323)
File I/O migration and reconstruction (#2319)
Support format_only option for Lyft, NuScenes and Waymo datasets (#2333, #2151)
Replace np.transpose with torch.permute to speed up (#2277)
Allow setting local-rank for pytorch 2.0 (#2387)
Bug Fixes¶
Fix the problem of reversal of length and width when drawing heatmap in CenterFormer (#2362)
Deprecate old type alias due to the new version of numpy (#2339)
Loosen trimesh version requirements to fix numpy random state (#2340)
Fix the device mismatch error in CenterPoint (#2308)
Fix bug of visualization when there are no bboxes (#2231)
Fix bug of counting ignore index in IOU in segmentation evaluation (#2229)
Contributors¶
A total of 14 developers contributed to this release.
@ZLTJohn, @SekiroRong, @shufanwu, @vansin, @triple-Mu, @404Vector, @filaPro, @sunjiahao1999, @Ginray, @Xiangxu-0103, @JingweiZhang12, @DezeZhao, @ZCMax, @roger-lcc
v1.1.0rc3 (7/1/2023)¶
Highlights¶
Support CenterFormer in projects (#2175)
Support PETR in projects (#2173)
New Features¶
Support CenterFormer in projects (#2175)
Support PETR in projects (#2173)
Refactor ImVoxelNet on SUN RGB-D into mmdet3d v1.1 (#2141)
Improvements¶
Remove legacy builder.py (#2061)
Update customize_dataset documentation (#2153)
Update tutorial of LiDAR-based detection (#2120)
Bug Fixes¶
Fix the configs of FCOS3D and PGD (#2191)
Fix numpy’s ValueError in update_infos_to_v2.py (#2162)
Fix parameter missing in Det3DVisualizationHook (#2118)
Fix memory overflow in the rotated box IoU calculation (#2134)
Fix lidar2cam error in update_infos_to_v2.py for nus and lyft dataset (#2110)
Fix error of data type in Waymo metrics (#2109)
Update bbox_3d information in cam_instances for mono3d detection task (#2046)
Fix label saving of Waymo dataset (#2096)
Contributors¶
A total of 10 developers contributed to this release.
@SekiroRong, @ZLTJohn, @vansin, @shanmo, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @Tai-Wang, @lianqing11
v1.1.0rc2 (2/12/2022)¶
New Features¶
Support PV-RCNN (#1597, #2045)
Speed up evaluation on Waymo dataset (#2008)
Refactor FCAF3D into the framework of mmdet3d v1.1 (#1945)
Refactor S3DIS dataset into the framework of mmdet3d v1.1 (#1984)
Add Projects/ folder and the first example project (#2042)
Improvements¶
Rename CLASSES and PALETTE to classes and palette respectively (#1932)
Update metainfo in pkl files and add categories into metainfo (#1934)
Show instance statistics before and after through the pipeline (#1863)
Add configs of DGCNN for different testing areas (#1967)
Move testing utils from tests/utils/ to mmdet3d/testing/ (#2012)
Add typehint for code in models/layers/ (#2014)
Refine documentation (#1891, #1994)
Refine voxelization for better speed (#2062)
Bug Fixes¶
Fix loop visualization error about point cloud (#1914)
Fix image conversion of Waymo to avoid information loss (#1979)
Fix evaluation on KITTI testset (#2005)
Fix sampling bug in IoUNegPiecewiseSampler (#2017)
Fix point cloud range in CenterPoint (#1998)
Fix some loading bugs and support FOV-image-based mode on Waymo dataset (#1942)
Fix dataset conversion utils (#1923, #2040, #1971)
Update metafiles in all the configs (#2006)
Contributors¶
A total of 12 developers contributed to this release.
@vavanade, @oyel, @thinkthinking, @PeterH0323, @274869388, @cxiang26, @lianqing11, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @Tai-Wang
v1.1.0rc1 (11/10/2022)¶
Highlights¶
Support a camera-only 3D detection baseline on Waymo, MV-FCOS3D++
New Features¶
Support a camera-only 3D detection baseline on Waymo, MV-FCOS3D++, with new evaluation metrics and transformations (#1716)
Refactor PointRCNN in the framework of mmdet3d v1.1 (#1819)
Improvements¶
Add auto_scale_lr in config to support training with auto-scale learning rates (#1807)
Fix CI (#1813, #1865, #1877)
Update browse_dataset.py script (#1817)
Update SUN RGB-D and Lyft datasets documentation (#1833)
Rename convert_to_datasample to add_pred_to_datasample in detectors (#1843)
Update customized dataset documentation (#1845)
Update Det3DLocalVisualization and visualization documentation (#1857)
Add the code of generating cam_sync_labels for Waymo dataset (#1870)
Update dataset transforms typehints (#1875)
Bug Fixes¶
Fix missing registration of models in setup_env.py (#1808)
Fix the data base sampler bugs when using the ground plane data (#1812)
Add output directory existing check during visualization (#1828)
Fix bugs of nuScenes dataset for monocular 3D detection (#1837)
Fix visualization hook to support the visualization of different data modalities (#1839)
Fix monocular 3D detection demo (#1864)
Fix the lack of num_pts_feats key in nuscenes dataset and complete docstring (#1882)
Contributors¶
A total of 10 developers contributed to this release.
@ZwwWayne, @Tai-Wang, @lianqing11, @VVsssssk, @ZCMax, @Xiangxu-0103, @JingweiZhang12, @tpoisonooo, @ice-tong, @jshilong
v1.1.0rc0 (1/9/2022)¶
We are excited to announce the release of MMDetection3D 1.1.0rc0. MMDet3D 1.1.0rc0 is the first version of MMDetection3D 1.1, a part of the OpenMMLab 2.0 projects. Built upon the new training engine and MMDet 3.x, MMDet3D 1.1 unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed. It also provides a standard data protocol for different datasets, modalities, and tasks for 3D perception. We will support more strong baselines in the future release, with our latest exploration on camera-only 3D detection from videos.
Highlights¶
New engines. MMDet3D 1.1 is based on MMEngine and MMDet 3.x, which provides a universal and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMDet3D 1.1 unifies and refactors the interfaces and internal logics of train, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logics to allow the emergence of multi-task/modality algorithms.
Standard data protocol for all the datasets, modalities, and tasks for 3D perception. Based on the unified base datasets inherited from MMEngine, we also design a standard data protocol that defines and unifies the common keys across different datasets, tasks, and modalities. It significantly simplifies the usage of multiple datasets and data modalities for multi-task frameworks and eases dataset customization. Please refer to the documentation of customized datasets for details.
Strong baselines. We will release strong baselines of many popular models to enable fair comparisons among state-of-the-art models.
More documentation and tutorials. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it here.
Breaking Changes¶
MMDet3D 1.1 has undergone significant changes to have better design, higher efficiency, more flexibility, and more unified interfaces. Besides the changes of API, we briefly list the major breaking changes in this section. We will update the migration guide to provide complete details and migration instructions. Users can also refer to the compatibility documentation and API doc for more details.
Dependencies¶
MMDet3D 1.1 runs on PyTorch>=1.6. We have deprecated the support of PyTorch 1.5 to embrace the mixed precision training and other new features since PyTorch 1.6. Some models can still run on PyTorch 1.5, but the full functionality of MMDet3D 1.1 is not guaranteed.
MMDet3D 1.1 relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab and is widely depended on by OpenMMLab 2.0 projects. The dependencies of file IO and training are migrated from MMCV 1.x to MMEngine.
MMDet3D 1.1 relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMDet3D 1.1 relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that the package mmcv is the version that provides pre-built CUDA operators and mmcv-lite does not since MMCV 2.0.0rc0, while mmcv-full has been deprecated since 2.0.0rc0.
MMDet3D 1.1 is based on MMDet 3.x, which is also a part of OpenMMLab 2.0 projects.
Training and testing¶
MMDet3D 1.1 uses Runner in MMEngine rather than that in MMCV. The new Runner implements and unifies the building logic of dataset, model, evaluation, and visualizer. Therefore, MMDet3D 1.1 no longer relies on the building logics of those modules in mmdet3d.train.apis and tools/train.py. That code has been migrated into MMEngine. Please refer to the migration guide of Runner in MMEngine for more details.
The Runner in MMEngine also supports testing and validation. The testing scripts are also simplified and follow similar logic to the training scripts to build the runner.
The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the migration guide of Hook in MMEngine for more details.
Learning rate and momentum scheduling has been migrated from Hook to Parameter Scheduler in MMEngine. Please refer to the migration guide of Parameter Scheduler in MMEngine for more details.
Configs¶
The Runner in MMEngine uses a different config structure to ease the understanding of the components in runner. Users can read the config example of MMDet3D 1.1 or refer to the migration guide in MMEngine for migration details.
The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. The names of checkpoints are not updated for now as there is no BC-breaking of model weights between MMDet3D 1.1 and 1.0.x. We will progressively replace all the model weights by those trained in MMDet3D 1.1. Please refer to the user guides of config for more details.
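For illustration only (the values below are made up and do not form a tested config), the new-style config describes data loading and training loops with fields consumed directly by the MMEngine Runner instead of the old data and runner dicts:

train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='KittiDataset',               # assumed dataset type, for illustration
        data_root='data/kitti/',
        ann_file='kitti_infos_train.pkl',
        pipeline=[]))                      # list of data transform configs
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=80, val_interval=2)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-3, weight_decay=0.01))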
Dataset¶
The Dataset classes implemented in MMDet3D 1.1 all inherit from Det3DDataset and Seg3DDataset, which in turn inherit from the BaseDataset in MMEngine. In addition to the changes of interfaces, there are several changes of Dataset in MMDet3D 1.1.
All the datasets support serializing the internal data list to reduce memory when multiple workers are built for data loading.
The internal data structure in the dataset is changed to be self-contained (without losing information like class names in MMDet3D 1.0.x) while keeping simplicity.
Common keys across different datasets and data modalities are defined and all the info files are unified into a standard protocol.
The evaluation functionality of each dataset has been removed from dataset so that some specific evaluation metrics like KITTI AP can be used to evaluate the prediction on other datasets.
Data Transforms¶
The data transforms in MMDet3D 1.1 all inherit from BaseTransform in MMCV>=2.0.0rc0, which defines a new convention in OpenMMLab 2.0 projects.
Besides the interface changes, there are several changes listed as below:
The functionality of some data transforms (e.g., Resize) is decomposed into several transforms to simplify and clarify the usages.
The format of the data dict processed by each data transform is changed according to the new data structure of the dataset.
Some inefficient data transforms (e.g., normalization and padding) are moved into data preprocessor of model to improve data loading and training speed.
The same data transforms in different OpenMMLab 2.0 libraries have the same augmentation implementation and logic given the same arguments, i.e., Resize in MMDet 3.x and MMSeg 1.x will resize the image in the exact same manner given the same arguments.
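As a hedged sketch of this convention (ToyJitterPoints is a hypothetical transform written for illustration; it is not part of MMDetection3D), a custom 3D transform only needs to subclass BaseTransform and implement transform():

import numpy as np
from mmcv.transforms import BaseTransform
from mmdet3d.registry import TRANSFORMS

@TRANSFORMS.register_module()
class ToyJitterPoints(BaseTransform):
    """Add small Gaussian noise to the xyz coordinates in results['points']."""

    def __init__(self, sigma=0.01):
        self.sigma = sigma

    def transform(self, results: dict) -> dict:
        points = results['points']  # a points object with a .tensor attribute is assumed here
        noise = np.random.normal(0.0, self.sigma, size=(points.tensor.shape[0], 3))
        points.tensor[:, :3] += points.tensor.new_tensor(noise)
        results['points'] = points
        return results

It can then be referenced in a pipeline config as dict(type='ToyJitterPoints', sigma=0.02).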
Model¶
The models in MMDet3D 1.1 all inherit from BaseModel in MMEngine, which defines a new convention of models in OpenMMLab 2.0 projects.
Users can refer to the tutorial of model in MMEngine for more details.
Accordingly, there are several changes as the following:
The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMDet3D 1.1. Specifically, all the input data in training and testing are packed into inputs and data_samples, where inputs contains model inputs like a dict containing a list of image tensors and the point cloud data, and data_samples contains other information of the current data sample such as ground truths, region proposals, and model predictions. In this way, different tasks in MMDet3D 1.1 can share the same input arguments, which makes the models more general and suitable for multi-task learning and some flexible training paradigms like semi-supervised learning.
The model has a data preprocessor module, which is used to pre-process the input data of the model. In MMDet3D 1.1, the data preprocessor usually does the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
The internal logic of the model has been changed. In MMDet3D 1.0.x, the model used forward_train, forward_test, simple_test, and aug_test to deal with different model forward logics. In MMDet3D 1.1 and OpenMMLab 2.0, the forward function has three modes: ‘loss’, ‘predict’, and ‘tensor’ for training, inference, and tracing or other purposes, respectively. The forward function calls self.loss, self.predict, and self._forward given the modes ‘loss’, ‘predict’, and ‘tensor’, respectively.
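The following is a minimal sketch of this dispatch (a toy model written for illustration under the MMEngine BaseModel convention, not the actual MMDet3D implementation):

import torch
from mmengine.model import BaseModel

class ToyDetector3D(BaseModel):
    """Toy model illustrating the 'loss' / 'predict' / 'tensor' forward modes."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)  # stand-in for a real detector

    def loss(self, inputs, data_samples):
        feats = self._forward(inputs)
        return {'loss_toy': feats.abs().mean()}  # training returns a dict of losses

    def predict(self, inputs, data_samples):
        self._forward(inputs)
        return data_samples  # real models attach predictions to the data samples

    def _forward(self, inputs, data_samples=None):
        return self.linear(inputs['points'])  # raw tensors for tracing or debugging

    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'loss':
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        if mode == 'tensor':
            return self._forward(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}"')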
Evaluation¶
The evaluation in MMDet3D 1.0.x strictly binds with the dataset. In contrast, MMDet3D 1.1 decomposes the evaluation from the dataset, so that all the detection datasets can be evaluated with KITTI AP and other metrics implemented in MMDet3D 1.1. MMDet3D 1.1 mainly implements corresponding metrics for each dataset, which are manipulated by the Evaluator to complete the evaluation. Users can build an evaluator in MMDet3D 1.1 to conduct offline evaluation, i.e., evaluate predictions that may not be produced in MMDet3D 1.1 with the dataset, as long as the dataset and the predictions follow the dataset conventions. More details can be found in the tutorial in MMEngine.
Visualization¶
The functions of visualization in MMDet3D 1.1 are removed. Instead, in OpenMMLab 2.0 projects, we use Visualizer to visualize data. MMDet3D 1.1 implements Det3DLocalVisualizer to allow visualization of 2D and 3D data, ground truths, model predictions, feature maps, etc., at any place. It also supports sending the visualization data to any external visualization backends such as Tensorboard.
Planned changes¶
We list several planned changes of MMDet3D 1.1.0rc0 so that the community can more comprehensively know the progress of MMDet3D 1.1. Feel free to create a PR, issue, or discussion if you are interested, have any suggestions or feedback, or want to participate.
Test-time augmentation, which is supported in MMDet3D 1.0.x, is not implemented in this version due to the limited time slot. We will support it in the following releases with a new and simplified design.
Inference interfaces: a unified inference interface will be supported in the future to ease the use of released models.
Interfaces of useful tools that can be used in notebooks: more useful tools implemented in the tools directory will have Python interfaces so that they can be used through notebooks and in downstream libraries.
Documentation: we will add more design docs, tutorials, and migration guidance so that the community can deep dive into our new design, participate in future development, and smoothly migrate downstream libraries to MMDet3D 1.1.
Wandb visualization: MMDet 2.x supports data visualization since v2.25.0, which has not been migrated to MMDet 3.x for now. Since Wandb provides strong visualization and experiment management capabilities, a DetWandbVisualizer and maybe a hook are planned to fully migrate those functionalities of MMDet 2.x, and a Det3DWandbVisualizer will be supported in MMDet3D 1.1 accordingly.
Will support recent new features added in MMDet3D 1.0.x and our recent exploration on camera-only 3D detection from videos: we will refactor these models and support them with benchmarks and models soon.
Compatibility¶
v1.1.0rc0¶
OpenMMLab v2.0 Refactoring¶
In this version, we made a large refactoring based on MMEngine to achieve unified data elements, model interfaces, visualizers, evaluators and other runtime modules across different datasets, tasks and even codebases. A brief summary of this refactoring is as follows:
Data Element:
We add Det3DDataSample as the common data element passing through datasets and models. It inherits from DetDataSample in MMDetection and uses InstanceData, PixelData, and LabelData (which inherit from BaseDataElement in MMEngine) to represent different types of ground truth labels or predictions.
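As a hedged illustration (assuming the classes below are importable from the installed version), a sample with 3D ground-truth instances can be assembled as follows:

import torch
from mmengine.structures import InstanceData
from mmdet3d.structures import Det3DDataSample, LiDARInstance3DBoxes

# Pack ground-truth boxes and labels into an InstanceData and attach them to
# the Det3DDataSample that is passed between datasets and models.
gt_instances_3d = InstanceData()
gt_instances_3d.bboxes_3d = LiDARInstance3DBoxes(torch.rand(3, 7))  # (x, y, z, l, w, h, yaw)
gt_instances_3d.labels_3d = torch.randint(0, 3, (3,))

data_sample = Det3DDataSample()
data_sample.gt_instances_3d = gt_instances_3d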
Datasets:
We add Det3DDataset and Seg3DDataset as the base datasets to inherit from the unified BaseDataset in MMEngine. They implement most functions that are commonly used across different datasets and simplify the info loading/processing in the current datasets. Re-defined input arguments and functions can mostly be re-used in different datasets, which is important for the implementation of customized datasets.
We define the common keys across different datasets and unify all the info files with a standard protocol. The same info is clearer for users because they share the same key across different dataset infos. Besides, for different settings, such as camera-only and LiDAR-only methods, we no longer need different info formats (like the previous pkl and json files). We can just revise parse_data_info to read the necessary information from a complete info file.
We add train_dataloader, val_dataloader and test_dataloader to replace the original data field in the config. It simplifies the levels of data-related fields.
Data Transforms
Based on the basic transforms and wrappers re-implemented and simplified in the latest MMCV, we refactor data transforms to inherit from them.
We also adjust the implementation of current data pipelines to make them compatible with our latest data protocol.
Normalization, padding of images and voxelization operations are moved to the data-preprocessing.
DefaultFormatBundle3D and Collect3D are replaced with PackDet3DInputs to pack the data into the element format as model input.
Models
Unify the model interface as inputs, data_samples, return_loss=False
The basic pre-processing before model forward includes: 1) convert input from CPU to GPU tensors; 2) padding images; 3) normalize images; 4) voxelization.
Return loss_dict during training while returning list[data_sample] during inference
Simplify function interfaces in the models
Add preprocess_cfg in the model configs for pre-processing
Visualizer
Design a unified visualizer, Det3DLocalVisualizer, based on MMEngine for different 3D tasks and settings
Support browsing datasets and visualization hooks based on the Det3DLocalVisualizer
Evaluator
Decouple evaluators from datasets to make them more flexible: the evaluation codes of each dataset are implemented as a metric class exclusively.
Add evaluator information to the current dataset configs
Registry
Refactor all the registries to inherit from root registries in MMEngine
When using modules from other codebases, it is necessary to specify the registry scope, such as mmdet.ResNet, as illustrated below.
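For instance (a hedged config fragment, not taken from a released config), a 2D image backbone from MMDetection can be reused in an MMDet3D config by prefixing the registered name with its scope:

img_backbone = dict(
    type='mmdet.ResNet',  # the 'mmdet.' prefix selects MMDetection's registry scope
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=False))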
Others: Refactor logging, hooks, scheduler, runner and other runtime configs based on MMEngine
v1.0.0rc1¶
Operators Migration¶
We have adopted CUDA operators compiled from mmcv and removed all the CUDA operators in mmdet3d. We now do not need to compile the CUDA operators in mmdet3d anymore.
Waymo dataset converter refactoring¶
In this version we did a major code refactoring that boosted the performance of waymo dataset conversion by multiprocessing. Meanwhile, we also fixed the imprecise timestamps saving issue in waymo dataset conversion. This change introduces following backward compatibility breaks:
The point cloud .bin files of the waymo dataset need to be regenerated. In the .bin files each point occupies 6 float32 and the meaning of the last float32 has changed from an imprecise timestamp to the range frame offset. The range frame offset for each point is calculated as ri * h * w + row * w + col if the point is from the TOP lidar, or -1 otherwise. The h, w denote the height and width of the TOP lidar’s range frame. The ri, row, col denote the return index, the row and the column of the range frame where each point is located. The following tables show the difference across the change (a short decoding sketch follows after the tables):
Before
Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
Meaning | x | y | z | intensity | elongation | imprecise timestamp |
After
Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
Meaning | x | y | z | intensity | elongation | range frame offset |
The objects’ point cloud .bin files in the GT-database of waymo dataset need to be regenerated because we also dumped the range frame offset for each point into it. Following tables show the difference across the change:
Before
Element offset (float32) | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Bytes offset | 0 | 4 | 8 | 12 | 16 |
Meaning | x | y | z | intensity | elongation |
After
Element offset (float32) | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Bytes offset | 0 | 4 | 8 | 12 | 16 | 20 |
Meaning | x | y | z | intensity | elongation | range frame offset |
Any configuration that uses the waymo dataset with GT Augmentation should change db_sampler.points_loader.load_dim from 5 to 6.
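A minimal decoding sketch (the file name and the range-frame size below are illustrative assumptions, not values from the converter) of how the regenerated 6-float32 .bin files can be read and how the range frame offset can be unpacked back into (ri, row, col) for TOP-lidar points:

import numpy as np

H, W = 64, 2650  # assumed height/width of the TOP lidar range frame

points = np.fromfile('example_waymo_points.bin', dtype=np.float32).reshape(-1, 6)
xyz = points[:, :3]
intensity, elongation = points[:, 3], points[:, 4]
offset = points[:, 5].astype(np.int64)

top_mask = offset >= 0                   # offset == -1 marks points not from the TOP lidar
ri = offset[top_mask] // (H * W)         # return index
row = (offset[top_mask] % (H * W)) // W  # row of the range frame
col = offset[top_mask] % W               # column of the range frame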
v1.0.0rc0¶
Coordinate system refactoring¶
In this version, we did a major code refactoring which improved the consistency among the three coordinate systems (and corresponding box representation), LiDAR, Camera, and Depth. A brief summary for this refactoring is as follows:
The three coordinate systems are all right-handed now (which means the yaw angle increases in the counterclockwise direction).
The LiDAR system (x_size, y_size, z_size) corresponds to (l, w, h) instead of (w, l, h). This is more natural since l is parallel with the direction where the yaw angle is zero, and we prefer using the positive direction of the x axis as that direction, which is exactly how we define the yaw angle in the Depth and Camera coordinate systems.
The APIs for box-related operations are improved and are now more user-friendly.
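As a small illustration of the new convention; the import path follows the 1.0 series and may differ in later releases, and the box values are arbitrary.
import torch
from mmdet3d.core.bbox import LiDARInstance3DBoxes

# A LiDAR box is (x, y, z, x_size, y_size, z_size, yaw), with
# (x_size, y_size, z_size) = (l, w, h) after the refactoring.
boxes = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.5, 3.9, 1.6, 1.56, 0.0]]))
print(boxes.dims)  # tensor([[3.9000, 1.6000, 1.5600]]) -> (l, w, h)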
NOTICE!!¶
Since definitions of box representation have changed, the annotation data of most datasets require updating:
SUN RGB-D: Yaw angles in the annotation should be reversed.
KITTI: For LiDAR boxes in GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size) should be converted from the old LiDAR coordinate system to the new one. The training/validation data annotations should be left unchanged since they are under the Camera coordinate system, which is unmodified after the refactoring.
Waymo: Same as KITTI.
nuScenes: For LiDAR boxes in training/validation data and GT databases, (x_size, y_size, z_size, yaw) out of (x, y, z, x_size, y_size, z_size) should be converted.
Lyft: Same as nuScenes.
Please regenerate the data annotation/GT database files or use update_data_coords.py to update the data.
To use boxes under the Depth and LiDAR coordinate systems, or to convert boxes between different coordinate systems, users should be aware of the difference between the old and new definitions. For example, the rotation, flipping, and bev functions of DepthInstance3DBoxes and LiDARInstance3DBoxes, as well as the box conversion functions, have all been reimplemented in the refactoring.
Consequently, functions like output_to_lyft_box underwent small modifications to adapt to the new LiDAR/Depth boxes.
Since the LiDAR system (x_size, y_size, z_size) now corresponds to (l, w, h) instead of (w, l, h), the anchor sizes for LiDAR boxes have also changed, e.g., from [1.6, 3.9, 1.56] to [3.9, 1.6, 1.56].
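For instance, a typical KITTI car anchor entry after the change might look like the sketch below; the generator type and range values are taken as assumptions from common configs and may differ in yours.
# Sketch: anchor sizes follow the new (l, w, h) convention.
anchor_generator = dict(
    type='Anchor3DRangeGenerator',
    ranges=[[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
    sizes=[[3.9, 1.6, 1.56]],   # (l, w, h); previously [1.6, 3.9, 1.56]
    rotations=[0, 1.57],
    reshape_out=True)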
Functions only involving points are generally unaffected unless they rely on some refactored utility functions such as rotation_3d_in_axis.
Other BC-breaking or new features:¶
array_converter: Please refer to array_converter.py. Functions wrapped with array_converter can convert array-like input types of torch.Tensor, np.ndarray, and list/tuple/float to torch.Tensor for processing in a unified PyTorch pipeline. The result may finally be converted back to the input type. Most functions in utils.py are wrapped with array_converter (a usage sketch follows after this list).
points_in_boxes and points_in_boxes_batch will be deprecated soon. They are renamed to points_in_boxes_part and points_in_boxes_all respectively, with more detailed docstrings. The major difference between the two functions is that if a point is enclosed by multiple boxes, points_in_boxes_part will only return the index of the first enclosing box, while points_in_boxes_all will return all the indices of the enclosing boxes.
rotation_3d_in_axis: Please refer to utils.py. This function now supports multiple input types and more options. The function with the same name in box_np_ops.py is deleted since we do not need another function to handle NumPy data. rotation_2d, points_cam2img, and limit_period in box_np_ops.py are also deleted for the same reason.
bev method of CameraInstance3DBoxes: Changed to be consistent with the definition of bev in the Depth and LiDAR coordinate systems.
Data augmentation utils in data_augment_utils.py now follow the rules of a right-handed system.
We do not need the yaw hacking in KITTI anymore after refining get_direction_target. Interested users may refer to PR #677.
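A hedged usage sketch of array_converter: the import path and the apply_to argument are assumptions based on the description above, so see array_converter.py for the exact signature.
import numpy as np
import torch
from mmdet3d.core.utils import array_converter

@array_converter(apply_to=('points',))
def shift_points(points, offset=1.0):
    # Inside the wrapped function, 'points' is always a torch.Tensor.
    return points + offset

shifted_np = shift_points(np.zeros((4, 3)))    # ndarray in -> ndarray out
shifted_pt = shift_points(torch.zeros(4, 3))   # Tensor in  -> Tensor out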
0.16.0¶
Returned values of the QueryAndGroup operation¶
We modified the returned grouped_xyz value of the QueryAndGroup operation to support the PAConv segmentor. Originally, grouped_xyz was centered by subtracting the grouping centers, i.e., it represented the relative positions of the grouped points. Now we no longer perform such subtraction, and the returned grouped_xyz stands for the absolute coordinates of these points.
Note that the other returned variables of QueryAndGroup, such as new_features, unique_cnt and grouped_idx, are not affected.
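If downstream code still expects the old relative coordinates, they can be recovered by subtracting the grouping centers. Below is a minimal sketch assuming the usual shapes grouped_xyz: (B, 3, npoint, nsample) and new_xyz (the grouping centers): (B, npoint, 3); verify the shapes against your version.
import torch

B, npoint, nsample = 2, 16, 32
new_xyz = torch.rand(B, npoint, 3)               # grouping centers
grouped_xyz = torch.rand(B, 3, npoint, nsample)  # absolute coordinates (new behaviour)

# Old behaviour: coordinates relative to their grouping center.
relative_xyz = grouped_xyz - new_xyz.transpose(1, 2).unsqueeze(-1)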
NuScenes coco-style data pre-processing¶
We remove the rotation and dimension hack in the monocular 3D detection on nuScenes. Specifically, we transform the rotation and dimension of boxes defined by the nuScenes devkit to the coordinate system of our CameraInstance3DBoxes in the pre-processing and transform them back in the post-processing. In this way, we can remove the corresponding hack used in the visualization tools. The modification also guarantees the correctness of all the operations based on our CameraInstance3DBoxes (such as NMS and flip augmentation) when training monocular 3D detectors.
The modification only influences nuScenes coco-style json files. Please re-run the nuScenes data preparation script if necessary. See more details in the PR #744.
ScanNet dataset for ImVoxelNet¶
We adopt a new pre-processing procedure for the ScanNet dataset in order to support ImVoxelNet, which is a multi-view method requiring image data. In previous versions of MMDetection3D, the ScanNet dataset was only used for point-cloud-based 3D detection and segmentation methods. We plan to add ImVoxelNet to our model zoo and thus update ScanNet correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
Add a script for extracting RGB data.
Update the script for annotation creation.
Add instructions in the documents on preparing image data.
Please refer to the ScanNet README.md for more details.
0.15.0¶
MMCV Version¶
In order to fix the problem that the priority of EvalHook is too low, all hook priorities have been re-adjusted in MMCV 1.3.8, so MMDetection 2.14.0 needs to rely on the latest MMCV 1.3.8 version. For related information, please refer to #1120; for related issues, please refer to #5343.
Unified parameter initialization¶
To unify parameter initialization in OpenMMLab projects, MMCV supports BaseModule, which accepts init_cfg to allow modules’ parameters to be initialized in a flexible and unified manner. Now users need to explicitly call model.init_weights() in the training script to initialize the model (as in here); previously this was handled by the detector. Please refer to PR #622 for details.
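A hedged sketch of the new calling pattern in a custom training script; the builder and config APIs follow the 0.15-era interfaces, and the config path is only an example.
from mmcv import Config
from mmdet3d.models import build_detector

cfg = Config.fromfile('configs/votenet/votenet_8x8_scannet-3d-18class.py')
model = build_detector(cfg.model,
                       train_cfg=cfg.get('train_cfg'),
                       test_cfg=cfg.get('test_cfg'))
model.init_weights()  # must now be called explicitly before training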
BackgroundPointsFilter¶
We modified the dataset augmentation function BackgroundPointsFilter (here). In previous versions of MMDetection3D, BackgroundPointsFilter changed the gt_bboxes_3d’s bottom center to the gravity center. In MMDetection3D 0.15.0, BackgroundPointsFilter will no longer change it. Please refer to PR #609 for details.
Enhance IndoorPatchPointSample transform¶
We enhance the pipeline function IndoorPatchPointSample used in the point cloud segmentation task by adding more choices for patch selection. We also plan to remove the unused parameter sample_rate in the future. Please modify the code as well as the config files accordingly if you use this transform.
0.14.0¶
Dataset class for 3D segmentation task¶
We remove a useless parameter label_weight from segmentation datasets, including Custom3DSegDataset, ScanNetSegDataset and S3DISSegDataset, since this weight is utilized in the loss function of the model class. Please modify the code as well as the config files accordingly if you use or inherit from these classes.
ScanNet data pre-processing¶
We adopt new pre-processing and conversion steps for the ScanNet dataset. In previous versions of MMDetection3D, the ScanNet dataset was only used for the 3D detection task, where we trained on the training set and tested on the validation set. In MMDetection3D 0.14.0, we further support the 3D segmentation task on ScanNet, which includes online benchmarking on the test set. Since the alignment matrix is not provided for the test set data, we abandon the alignment of points in the data generation step to support both tasks. Besides, as 3D segmentation requires per-point prediction, we also remove the down-sampling step in data generation.
In the new ScanNet processing scripts, we save the unaligned points for the training, validation and test sets. For the train and val sets with annotations, we also store the axis_align_matrix in the data infos. For ground-truth bounding boxes, we store boxes in both aligned and unaligned coordinates, with keys gt_boxes_upright_depth and unaligned_gt_boxes_upright_depth respectively, in the data infos.
In ScanNetDataset, we now load the axis_align_matrix as a part of the data annotations. If it is not contained in old data infos, we use an identity matrix for compatibility. We also add a transform function GlobalAlignment in the ScanNet detection data pipeline to align the points.
Since the aligned boxes share the same key as in old data infos, we do not need to modify the code related to them. But do remember that they are not in the same coordinate system as the saved points.
There is a PointSample pipeline in the data pipelines for the ScanNet detection task which down-samples points, so removing down-sampling in data generation does not affect the code.
We have trained a VoteNet model on the newly processed ScanNet dataset and obtained similar benchmark results. In order to prepare ScanNet data for both detection and segmentation tasks, please re-run the new pre-processing scripts following the ScanNet README.md.
0.12.0¶
SUNRGBD dataset for ImVoteNet¶
We adopt a new pre-processing procedure for the SUNRGBD dataset in order to support ImVoteNet, which is a multi-modality method requiring both image and point cloud data. In previous versions of MMDetection3D, the SUNRGBD dataset was only used for point-cloud-based 3D detection methods. In MMDetection3D 0.12.0, we add ImVoteNet to our model zoo and update SUNRGBD correspondingly by adding image-related pre-processing steps. Specifically, we made these changes:
Fix a bug in the image file path in meta data.
Convert calibration matrices from double to float to avoid type mismatch in further operations.
Add instructions in the documents on preparing image data.
Please refer to the SUNRGBD README.md for more details.
0.6.0¶
VoteNet and H3DNet model structure update¶
In MMDetection3D 0.6.0, we updated the model structures of VoteNet and H3DNet; therefore, model checkpoints generated by MMDetection3D < 0.6.0 should first be converted to a format compatible with the latest structures via convert_votenet_checkpoints.py and convert_h3dnet_checkpoints.py. For more details, please refer to the VoteNet README.md and H3DNet README.md.
FAQ¶
We list some potential problems encountered by users and developers, along with their corresponding solutions. Feel free to enrich the list if you find any frequent issues and contribute your solutions to them. If you have any trouble with environment configuration, model training, etc., please create an issue using the provided templates and fill in all the required information in the template.
MMEngine/MMCV/MMDet/MMDet3D Installation¶
Compatibility issue between MMEngine, MMCV, MMDetection and MMDetection3D; “ConvWS is already registered in conv layer”; “AssertionError: MMCV==xxx is used but incompatible. Please install mmcv>=xxx, <=xxx.”
The required versions of MMEngine, MMCV and MMDetection for different versions of MMDetection3D are as below. Please install the correct version of MMEngine, MMCV and MMDetection to avoid installation issues.
MMDetection3D version | MMEngine version | MMCV version | MMDetection version |
---|---|---|---|
dev-1.x | mmengine>=0.7.1, <1.0.0 | mmcv>=2.0.0rc4, <2.1.0 | mmdet>=3.0.0, <3.1.0 |
main | mmengine>=0.7.1, <1.0.0 | mmcv>=2.0.0rc4, <2.1.0 | mmdet>=3.0.0, <3.1.0 |
v1.1.0rc3 | mmengine>=0.1.0, <1.0.0 | mmcv>=2.0.0rc3, <2.1.0 | mmdet>=3.0.0rc0, <3.1.0 |
v1.1.0rc2 | mmengine>=0.1.0, <1.0.0 | mmcv>=2.0.0rc3, <2.1.0 | mmdet>=3.0.0rc0, <3.1.0 |
v1.1.0rc1 | mmengine>=0.1.0, <1.0.0 | mmcv>=2.0.0rc0, <2.1.0 | mmdet>=3.0.0rc0, <3.1.0 |
v1.1.0rc0 | mmengine>=0.1.0, <1.0.0 | mmcv>=2.0.0rc0, <2.1.0 | mmdet>=3.0.0rc0, <3.1.0 |
Note: If you want to install mmdet3d-v1.0.0rcx, the compatible MMDetection, MMSegmentation and MMCV version table can be found here. Please choose the correct versions of MMCV, MMDetection and MMSegmentation to avoid installation issues.
If you face the error shown below when importing open3d:
OSError: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.27' not found
please downgrade open3d to 0.9.0.0, because the latest open3d requires GLIBC 2.27, which is available in Ubuntu 18.04 but not in Ubuntu 16.04.
If you face an error when importing pycocotools, this is because nuscenes-devkit installs pycocotools but mmdet relies on mmpycocotools. The current workaround is as below. We will migrate to use pycocotools in the future.
pip uninstall pycocotools mmpycocotools
pip install mmpycocotools
NOTE: We have migrated to use pycocotools in mmdet3d >= 0.13.0.
If you face the error shown below when importing pycocotools:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
please downgrade pycocotools to 2.0.1 because of the incompatibility between the newest pycocotools and numpy < 1.20.0. Or you can compile and install the latest pycocotools from source as below:
pip install -e "git+https://github.com/cocodataset/cocoapi#egg=pycocotools&subdirectory=PythonAPI"
or
pip install -e "git+https://github.com/ppwwyyxx/cocoapi#egg=pycocotools&subdirectory=PythonAPI"
If you face some errors about numba in a cuda-9.0 environment, you should check the version of numba. In a cuda-9.0 environment, high versions of numba are not supported, and we suggest installing numba==0.53.0.