mmdet3d.models

backbones

class mmdet3d.models.backbones.DGCNNBackbone(in_channels, num_samples=(20, 20, 20), knn_modes=('D-KNN', 'F-KNN', 'F-KNN'), radius=(None, None, None), gf_channels=((64, 64), (64, 64), (64, )), fa_channels=(1024, ), act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

Backbone network for DGCNN.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_samples (tuple[int], optional) – The number of samples for knn or ball query in each graph feature (GF) module. Defaults to (20, 20, 20).

  • knn_modes (tuple[str], optional) – The KNN mode of each GF module. Defaults to ('D-KNN', 'F-KNN', 'F-KNN').

  • radius (tuple[float], optional) – Sampling radii of each GF module. Defaults to (None, None, None).

  • gf_channels (tuple[tuple[int]], optional) – Out channels of each mlp in GF module. Defaults to ((64, 64), (64, 64), (64, )).

  • fa_channels (tuple[int], optional) – Out channels of each mlp in FA module. Defaults to (1024, ).

  • act_cfg (dict, optional) – Config of activation layer. Defaults to dict(type='ReLU').

  • init_cfg (dict, optional) – Initialization config. Defaults to None.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, in_channels).

Returns

Outputs after graph feature (GF) and feature aggregation (FA) modules.

  • gf_points (list[torch.Tensor]): Outputs after each GF module.

  • fa_points (torch.Tensor): Outputs after FA module.

Return type

dict[str, list[torch.Tensor]]
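
Example

A minimal usage sketch; the channel count and point count are illustrative, and a CUDA device is assumed since the KNN/graph-feature ops typically require a GPU.

>>> import torch
>>> from mmdet3d.models.backbones import DGCNNBackbone
>>> self = DGCNNBackbone(in_channels=6).cuda()
>>> points = torch.rand(2, 1024, 6).cuda()  # (B, N, in_channels)
>>> out = self(points)
>>> # out['gf_points'] is a list with the output of each GF module,
>>> # out['fa_points'] holds the FA-module output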

class mmdet3d.models.backbones.DLANet(depth, in_channels=3, out_indices=(0, 1, 2, 3, 4, 5), frozen_stages=-1, norm_cfg=None, conv_cfg=None, layer_with_level_root=(False, True, True, True), with_identity_root=False, pretrained=None, init_cfg=None)[source]

DLA backbone.

Parameters
  • depth (int) – Depth of DLA. Default: 34.

  • in_channels (int, optional) – Number of input image channels. Default: 3.

  • norm_cfg (dict, optional) – Dictionary to construct and config norm layer. Default: None.

  • conv_cfg (dict, optional) – Dictionary to construct and config conv layer. Default: None.

  • layer_with_level_root (list[bool], optional) – Whether to apply level_root in each DLA layer, this is only used for tree levels. Default: (False, True, True, True).

  • with_identity_root (bool, optional) – Whether to add identity in root layer. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
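
Example

A minimal usage sketch; the explicit norm_cfg and the input size are illustrative assumptions.

>>> import torch
>>> from mmdet3d.models.backbones import DLANet
>>> self = DLANet(depth=34, norm_cfg=dict(type='BN'))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> level_outputs = self.forward(inputs)
>>> # one feature map per entry in out_indices, default (0, 1, 2, 3, 4, 5)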

class mmdet3d.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions (arXiv:1904.04514).

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:

    • num_modules(int): The number of HRModule in this stage.

    • num_branches(int): The number of branches in the HRModule.

    • block(str): The type of convolution block.

    • num_blocks(tuple): The number of blocks in each branch. The length must be equal to num_branches.

    • num_channels(tuple): The number of channels in each branch. The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet3d.models.backbones.MultiBackbone(num_streams, backbones, aggregation_mlp_channels=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 1e-05, 'momentum': 0.01, 'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, suffixes=('net0', 'net1'), init_cfg=None, pretrained=None, **kwargs)[source]

MultiBackbone with different configs.

Parameters
  • num_streams (int) – The number of backbones.

  • backbones (list or dict) – A list of backbone configs.

  • aggregation_mlp_channels (list[int]) – Specify the mlp layers for feature aggregation.

  • conv_cfg (dict) – Config dict of convolutional layers.

  • norm_cfg (dict) – Config dict of normalization layers.

  • act_cfg (dict) – Config dict of activation layers.

  • suffixes (list) – A list of suffixes to rename the return dict for each backbone.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs from multiple backbones.

  • fp_xyz[suffix] (list[torch.Tensor]): The coordinates of each FP feature level.

  • fp_features[suffix] (list[torch.Tensor]): The features from each feature propagation (FP) layer.

  • fp_indices[suffix] (list[torch.Tensor]): Indices of the input points.

  • hd_feature (torch.Tensor): The aggregation feature from multiple backbones.

Return type

dict[str, list[torch.Tensor]]
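
Example

A hedged sketch of wrapping two identical streams; the per-stream PointNet2SASSG config, the point counts and the CUDA requirement (the point-sampling ops are typically GPU-only) are illustrative assumptions.

>>> import torch
>>> from mmdet3d.models.backbones import MultiBackbone
>>> pn2_cfg = dict(  # hypothetical per-stream backbone config
...     type='PointNet2SASSG',
...     in_channels=4,
...     num_points=(256, 128, 64, 32),
...     radius=(0.2, 0.4, 0.8, 1.2),
...     num_samples=(16, 16, 16, 16),
...     sa_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128), (64, 64, 128)),
...     fp_channels=((128, 128), (128, 128)))
>>> self = MultiBackbone(num_streams=2, backbones=[pn2_cfg, pn2_cfg]).cuda()
>>> points = torch.rand(2, 1024, 4).cuda()
>>> out = self(points)
>>> # out['hd_feature'] aggregates the last FP features of both streams;
>>> # per-stream outputs are renamed with the suffixes ('net0', 'net1')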

class mmdet3d.models.backbones.NoStemRegNet(arch, init_cfg=None, **kwargs)[source]

RegNet backbone without Stem for 3D detection.

More details can be found in the RegNet paper, Designing Network Design Spaces (arXiv:2003.13678).

Parameters
  • arch (dict) – The parameters of RegNet:

    • w0 (int): Initial width.

    • wa (float): Slope of width.

    • wm (float): Quantization parameter to quantize the width.

    • depth (int): Depth of the backbone.

    • group_w (int): Width of group.

    • bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Normally 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmdet3d.models import NoStemRegNet
>>> import torch
>>> self = NoStemRegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 64, 16, 16)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
forward(x)[source]

Forward function of backbone.

Parameters

x (torch.Tensor) – Features in shape (N, C, H, W).

Returns

Multi-scale features.

Return type

tuple[torch.Tensor]

class mmdet3d.models.backbones.PointNet2SAMSG(in_channels, num_points=(2048, 1024, 512, 256), radii=((0.2, 0.4, 0.8), (0.4, 0.8, 1.6), (1.6, 3.2, 4.8)), num_samples=((32, 32, 64), (32, 32, 64), (32, 32, 32)), sa_channels=(((16, 16, 32), (16, 16, 32), (32, 32, 64)), ((64, 64, 128), (64, 64, 128), (64, 96, 128)), ((128, 128, 256), (128, 192, 256), (128, 256, 256))), aggregation_channels=(64, 128, 256), fps_mods=('D-FPS', 'FS', ('F-FPS', 'D-FPS')), fps_sample_range_lists=(-1, -1, (512, -1)), dilated_group=(True, True, True), out_indices=(2, ), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': False, 'pool_mod': 'max', 'type': 'PointSAModuleMSG', 'use_xyz': True}, init_cfg=None)[source]

PointNet2 with Multi-scale grouping.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_points (tuple[int]) – The number of points which each SA module samples.

  • radii (tuple[float]) – Sampling radii of each SA module.

  • num_samples (tuple[int]) – The number of samples for ball query in each SA module.

  • sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.

  • aggregation_channels (tuple[int]) – Out channels of aggregation multi-scale grouping features.

  • fps_mods (tuple[str]) – Mode of FPS for each SA module.

  • fps_sample_range_lists (tuple[tuple[int]]) – Range of points over which FPS is applied in each SA module (-1 means all remaining points).

  • dilated_group (tuple[bool]) – Whether to use dilated ball query in each SA module.

  • out_indices (Sequence[int]) – Output from which stages.

  • norm_cfg (dict) – Config of normalization layer.

  • sa_cfg (dict) –

    Config of set abstraction module, which may contain the following keys and values:

    • pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.

    • use_xyz (bool): Whether to use xyz as a part of features.

    • normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs of the last SA module.

  • sa_xyz (torch.Tensor): The coordinates of sa features.

  • sa_features (torch.Tensor): The features from the last Set Aggregation Layers.

  • sa_indices (torch.Tensor): Indices of the input points.

Return type

dict[str, torch.Tensor]
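
Example

A minimal sketch; the in_channels, the point counts and the tuple-valued last entry of num_points (paired with the ('F-FPS', 'D-FPS') sampling modes) are illustrative assumptions, and a CUDA device is assumed for the sampling ops.

>>> import torch
>>> from mmdet3d.models.backbones import PointNet2SAMSG
>>> self = PointNet2SAMSG(
...     in_channels=4,
...     num_points=(2048, 1024, (256, 256))).cuda()
>>> points = torch.rand(2, 4096, 4).cuda()  # (B, N, 3 + input_feature_dim)
>>> out = self(points)
>>> # out['sa_xyz'], out['sa_features'] and out['sa_indices'] hold the
>>> # outputs of the SA stages selected by out_indices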

class mmdet3d.models.backbones.PointNet2SASSG(in_channels, num_points=(2048, 1024, 512, 256), radius=(0.2, 0.4, 0.8, 1.2), num_samples=(64, 32, 16, 16), sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), (128, 128, 256)), fp_channels=((256, 256), (256, 256)), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': True, 'pool_mod': 'max', 'type': 'PointSAModule', 'use_xyz': True}, init_cfg=None)[source]

PointNet2 with Single-scale grouping.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_points (tuple[int]) – The number of points which each SA module samples.

  • radius (tuple[float]) – Sampling radii of each SA module.

  • num_samples (tuple[int]) – The number of samples for ball query in each SA module.

  • sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.

  • fp_channels (tuple[tuple[int]]) – Out channels of each mlp in FP module.

  • norm_cfg (dict) – Config of normalization layer.

  • sa_cfg (dict) –

    Config of set abstraction module, which may contain the following keys and values:

    • pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.

    • use_xyz (bool): Whether to use xyz as a part of features.

    • normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs after SA and FP modules.

  • fp_xyz (list[torch.Tensor]): The coordinates of each FP feature level.

  • fp_features (list[torch.Tensor]): The features from each feature propagation (FP) layer.

  • fp_indices (list[torch.Tensor]): Indices of the input points.

Return type

dict[str, list[torch.Tensor]]
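
Example

A minimal sketch; the in_channels and point count are illustrative, and a CUDA device is assumed since the sampling and grouping ops are typically GPU-only.

>>> import torch
>>> from mmdet3d.models.backbones import PointNet2SASSG
>>> self = PointNet2SASSG(in_channels=4).cuda()
>>> points = torch.rand(2, 4096, 4).cuda()  # (B, N, 3 + input_feature_dim)
>>> out = self(points)
>>> # out['fp_xyz'], out['fp_features'] and out['fp_indices'] are lists
>>> # with one entry per feature propagation (FP) level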

class mmdet3d.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.
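
Example

A minimal sketch; the ResNeXt-50 32x4d settings and the input size are illustrative assumptions.

>>> import torch
>>> from mmdet3d.models.backbones import ResNeXt
>>> self = ResNeXt(depth=50, groups=32, base_width=4)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> # four levels by default, with 256, 512, 1024 and 2048 channels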

class mmdet3d.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for the stage_idx-th stage of ResNet.

Currently we support inserting context_block, empirical_attention_block and nonlocal_block into backbones such as ResNet/ResNeXt. They can be inserted after conv1/conv2/conv3 of a Bottleneck block.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugin cfgs to build. The postfix is required if multiple plugins of the same type are inserted.

  • stage_idx (int) – Index of the stage to build plugins for.

Returns

Plugins for current stage

Return type

list[dict]

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet3d.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks (arXiv:1812.01187).

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.
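
Example

A brief sketch; ResNetV1d takes the same arguments as ResNet and simply enables the deep stem and average-pool downsampling described above.

>>> from mmdet3d.models.backbones import ResNetV1d
>>> self = ResNetV1d(depth=50)  # same API as ResNet(depth=50)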

class mmdet3d.models.backbones.SECOND(in_channels=128, out_channels=[128, 128, 256], layer_nums=[3, 5, 5], layer_strides=[2, 2, 2], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, init_cfg=None, pretrained=None)[source]

Backbone network for SECOND/PointPillars/PartA2/MVXNet.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (list[int]) – Output channels for multi-scale feature maps.

  • layer_nums (list[int]) – Number of layers in each stage.

  • layer_strides (list[int]) – Strides of each stage.

  • norm_cfg (dict) – Config dict of normalization layers.

  • conv_cfg (dict) – Config dict of convolutional layers.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – Input with shape (N, C, H, W).

Returns

Multi-scale features.

Return type

tuple[torch.Tensor]
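
Example

A minimal sketch; the BEV pseudo-image size is an illustrative assumption.

>>> import torch
>>> from mmdet3d.models.backbones import SECOND
>>> self = SECOND(in_channels=128)
>>> self.eval()
>>> bev_feats = torch.rand(1, 128, 64, 64)  # (N, C, H, W) pseudo-image
>>> outs = self(bev_feats)
>>> # a tuple with one feature map per stage; with the defaults, 3 maps
>>> # with 128, 128 and 256 channels at strides 2, 4 and 8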

class mmdet3d.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]

VGG Backbone network for single-shot-detection.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_last_pool (bool) – Whether to add a pooling layer at the end of the model.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • out_indices (Sequence[int]) – Output from which stages.

  • out_feature_indices (Sequence[int]) – Output from which feature map.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

  • input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.

  • l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.

Example

>>> import torch
>>> from mmdet3d.models.backbones import SSDVGG
>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights.
