mmdet3d.core¶

anchor¶

bbox¶

evaluation¶

visualizer¶

voxel¶

post_processing¶

mmdet3d.datasets¶

mmdet3d.models¶

detectors¶

backbones¶

class mmdet3d.models.backbones.DGCNNBackbone(in_channels, num_samples=(20, 20, 20), knn_modes=('D-KNN', 'F-KNN', 'F-KNN'), radius=(None, None, None), gf_channels=((64, 64), (64, 64), (64)), fa_channels=(1024), act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶

Backbone network for DGCNN.

Parameters

in_channels (int) – Input channels of point cloud.
num_samples (tuple[int], optional) – The number of samples for knn or ball query in each graph feature (GF) module. Defaults to (20, 20, 20).
knn_modes (tuple[str], optional) – Mode of KNN of each knn module. Defaults to (‘D-KNN’, ‘F-KNN’, ‘F-KNN’).
radius (tuple[float], optional) – Sampling radii of each GF module. Defaults to (None, None, None).
gf_channels (tuple[tuple[int]], optional) – Out channels of each mlp in GF module. Defaults to ((64, 64), (64, 64), (64, )).
fa_channels (tuple[int], optional) – Out channels of each mlp in FA module. Defaults to (1024, ).
act_cfg (dict, optional) – Config of activation layer. Defaults to dict(type=’ReLU’).
init_cfg (dict, optional) – Initialization config. Defaults to None.

forward(points)[source]¶

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, in_channels).

Returns

Outputs after graph feature (GF) and

feature aggregation (FA) modules.

gf_points (list[torch.Tensor]): Outputs after each GF module.
fa_points (torch.Tensor): Outputs after FA module.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.backbones.DLANet(depth, in_channels=3, out_indices=(0, 1, 2, 3, 4, 5), frozen_stages=- 1, norm_cfg=None, conv_cfg=None, layer_with_level_root=(False, True, True, True), with_identity_root=False, pretrained=None, init_cfg=None)[source]¶

DLA backbone.

Parameters

depth (int) – Depth of DLA. Default: 34.
in_channels (int, optional) – Number of input image channels. Default: 3.
norm_cfg (dict, optional) – Dictionary to construct and config norm layer. Default: None.
conv_cfg (dict, optional) – Dictionary to construct and config conv layer. Default: None.
layer_with_level_root (list[bool], optional) – Whether to apply level_root in each DLA layer, this is only used for tree levels. Default: (False, True, True, True).
with_identity_root (bool, optional) – Whether to add identity in root layer. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmdet3d.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]¶

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions arXiv:.

Parameters

extra (dict) –
Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:
- num_modules(int): The number of HRModule in this stage.
- num_branches(int): The number of branches in the HRModule.
- block(str): The type of convolution block.
- num_blocks(tuple): The number of blocks in each branch.
  The length must be equal to num_branches.
- num_channels(tuple): The number of channels in each branch.
  The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – Dictionary to construct and config conv layer.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)

forward(x)[source]¶: Forward function.

property norm1¶

the normalization layer named “norm1”

Type: nn.Module

property norm2¶

the normalization layer named “norm2”

Type: nn.Module

train(mode=True)[source]¶: Convert the model into training mode will keeping the normalization layer freezed.

class mmdet3d.models.backbones.MultiBackbone(num_streams, backbones, aggregation_mlp_channels=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 1e-05, 'momentum': 0.01, 'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, suffixes=('net0', 'net1'), init_cfg=None, pretrained=None, **kwargs)[source]¶

MultiBackbone with different configs.

Parameters

num_streams (int) – The number of backbones.
backbones (list or dict) – A list of backbone configs.
aggregation_mlp_channels (list[int]) – Specify the mlp layers for feature aggregation.
conv_cfg (dict) – Config dict of convolutional layers.
norm_cfg (dict) – Config dict of normalization layers.
act_cfg (dict) – Config dict of activation layers.
suffixes (list) – A list of suffixes to rename the return dict for each backbone.

forward(points)[source]¶

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs from multiple backbones.

fp_xyz[suffix] (list[torch.Tensor]): The coordinates of each fp features.

fp_features[suffix] (list[torch.Tensor]): The features from each Feature Propagate Layers.

fp_indices[suffix] (list[torch.Tensor]): Indices of the input points.

hd_feature (torch.Tensor): The aggregation feature from multiple backbones.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.backbones.NoStemRegNet(arch, init_cfg=None, **kwargs)[source]¶

RegNet backbone without Stem for 3D detection.

More details can be found in paper .

Parameters

arch (dict) – The parameter of RegNets. - w0 (int): Initial width. - wa (float): Slope of width. - wm (float): Quantization parameter to quantize the width. - depth (int): Depth of the backbone. - group_w (int): Width of group. - bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Normally 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmdet3d.models import NoStemRegNet
>>> import torch
>>> self = NoStemRegNet(
        arch=dict(
            w0=88,
            wa=26.31,
            wm=2.25,
            group_w=48,
            depth=25,
            bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 64, 16, 16)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)

forward(x)[source]¶

Forward function of backbone.

Parameters: x (torch.Tensor) – Features in shape (N, C, H, W).
Returns: Multi-scale features.
Return type: tuple[torch.Tensor]

class mmdet3d.models.backbones.PointNet2SAMSG(in_channels, num_points=(2048, 1024, 512, 256), radii=((0.2, 0.4, 0.8), (0.4, 0.8, 1.6), (1.6, 3.2, 4.8)), num_samples=((32, 32, 64), (32, 32, 64), (32, 32, 32)), sa_channels=(((16, 16, 32), (16, 16, 32), (32, 32, 64)), ((64, 64, 128), (64, 64, 128), (64, 96, 128)), ((128, 128, 256), (128, 192, 256), (128, 256, 256))), aggregation_channels=(64, 128, 256), fps_mods=('D-FPS', 'FS', ('F-FPS', 'D-FPS')), fps_sample_range_lists=(- 1, - 1, (512, - 1)), dilated_group=(True, True, True), out_indices=(2), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': False, 'pool_mod': 'max', 'type': 'PointSAModuleMSG', 'use_xyz': True}, init_cfg=None)[source]¶

PointNet2 with Multi-scale grouping.

Parameters

in_channels (int) – Input channels of point cloud.
num_points (tuple[int]) – The number of points which each SA module samples.
radii (tuple[float]) – Sampling radii of each SA module.
num_samples (tuple[int]) – The number of samples for ball query in each SA module.
sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.
aggregation_channels (tuple[int]) – Out channels of aggregation multi-scale grouping features.
fps_mods (tuple[int]) – Mod of FPS for each SA module.
fps_sample_range_lists (tuple[tuple[int]]) – The number of sampling points which each SA module samples.
dilated_group (tuple[bool]) – Whether to use dilated ball query for
out_indices (Sequence[int]) – Output from which stages.
norm_cfg (dict) – Config of normalization layer.
sa_cfg (dict) –
Config of set abstraction module, which may contain the following keys and values:
- pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.
- use_xyz (bool): Whether to use xyz as a part of features.
- normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.

forward(points)[source]¶

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs of the last SA module.

sa_xyz (torch.Tensor): The coordinates of sa features.

sa_features (torch.Tensor): The features from the
last Set Aggregation Layers.

sa_indices (torch.Tensor): Indices of the
input points.

Return type

dict[str, torch.Tensor]

class mmdet3d.models.backbones.PointNet2SASSG(in_channels, num_points=(2048, 1024, 512, 256), radius=(0.2, 0.4, 0.8, 1.2), num_samples=(64, 32, 16, 16), sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), (128, 128, 256)), fp_channels=((256, 256), (256, 256)), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': True, 'pool_mod': 'max', 'type': 'PointSAModule', 'use_xyz': True}, init_cfg=None)[source]¶

PointNet2 with Single-scale grouping.

Parameters

in_channels (int) – Input channels of point cloud.
num_points (tuple[int]) – The number of points which each SA module samples.
radius (tuple[float]) – Sampling radii of each SA module.
num_samples (tuple[int]) – The number of samples for ball query in each SA module.
sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.
fp_channels (tuple[tuple[int]]) – Out channels of each mlp in FP module.
norm_cfg (dict) – Config of normalization layer.
sa_cfg (dict) –
Config of set abstraction module, which may contain the following keys and values:
- pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.
- use_xyz (bool): Whether to use xyz as a part of features.
- normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.

forward(points)[source]¶

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs after SA and FP modules.

fp_xyz (list[torch.Tensor]): The coordinates of
each fp features.

fp_features (list[torch.Tensor]): The features
from each Feature Propagate Layers.

fp_indices (list[torch.Tensor]): Indices of the
input points.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶

ResNeXt backbone.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

make_res_layer(**kwargs)[source]¶: Pack all blocks in a stage into a ResLayer

class mmdet3d.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶

ResNet backbone.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.
base_channels (int) – Number of base channels of res layer. Default: 64.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)

forward(x)[source]¶: Forward function.

make_res_layer(**kwargs)[source]¶: Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]¶

Make plugins for ResNet stage_idx th stage.

Currently we support to insert context_block, empirical_attention_block, nonlocal_block into the backbone like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1-> conv2->conv3->yyy->zzz1->zzz2

Suppose ‘stage_idx=1’, the structure of blocks in the stage would be:

conv1-> conv2->xxx->conv3->yyy->zzz1->zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters

plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of stage to build

Returns

Plugins for current stage

Return type

list[dict]

property norm1¶

the normalization layer named “norm1”

Type: nn.Module

train(mode=True)[source]¶: Convert the model into training mode while keep normalization layer freezed.

class mmdet3d.models.backbones.ResNetV1d(**kwargs)[source]¶

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmdet3d.models.backbones.SECOND(in_channels=128, out_channels=[128, 128, 256], layer_nums=[3, 5, 5], layer_strides=[2, 2, 2], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, init_cfg=None, pretrained=None)[source]¶

Backbone network for SECOND/PointPillars/PartA2/MVXNet.

Parameters

in_channels (int) – Input channels.
out_channels (list[int]) – Output channels for multi-scale feature maps.
layer_nums (list[int]) – Number of layers in each stage.
layer_strides (list[int]) – Strides of each stage.
norm_cfg (dict) – Config dict of normalization layers.
conv_cfg (dict) – Config dict of convolutional layers.

forward(x)[source]¶

Forward function.

Parameters: x (torch.Tensor) – Input with shape (N, C, H, W).
Returns: Multi-scale features.
Return type: tuple[torch.Tensor]

class mmdet3d.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]¶

VGG Backbone network for single-shot-detection.

Parameters

depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_last_pool (bool) – Whether to add a pooling layer at the last of the model
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
out_indices (Sequence[int]) – Output from which stages.
out_feature_indices (Sequence[int]) – Output from which feature map.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
input_size (int, optional) – Deprecated argumment. Width and height of input, from {300, 512}.
l2_norm_scale (float, optional) – Deprecated argumment. L2 normalization layer init scale.

Example

>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)

mmdet3d.core¶

anchor¶

bbox¶

evaluation¶

visualizer¶

voxel¶

post_processing¶

mmdet3d.datasets¶

mmdet3d.models¶

detectors¶

backbones¶

necks¶

dense_heads¶

roi_heads¶

fusion_layers¶

losses¶

middle_encoders¶

model_utils¶