mmdet3d.core

anchor

class mmdet3d.core.anchor.AlignedAnchor3DRangeGenerator(align_corner=False, **kwargs)[source]

Aligned 3D Anchor Generator by range.

This anchor generator computes the positions of the anchors’ centers differently from Anchor3DRangeGenerator.

Note

“Aligned” means that each anchor’s center is aligned with the voxel grid, which is also the feature grid. The previous implementation, Anchor3DRangeGenerator, does not place the anchors’ centers according to the voxel grid. Instead, it distributes the centers uniformly between the minimum and maximum anchor ranges according to the feature map sizes, so the anchors’ centers do not match the feature grid. AlignedAnchor3DRangeGenerator adds 1 to the feature map sizes to obtain the corners of the voxel grid. It then shifts the coordinates to the centers of the voxel grid and uses the upper-left corner to distribute the anchors.

Parameters

align_corner (bool) – Whether to align with the corner of the voxel grid. By default it is False, and the anchor’s center will be the same as the corresponding voxel’s center, which is also the center of the corresponding feature grid.
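The difference between the two placement schemes can be sketched along a single axis. This is a minimal pure-Python illustration under stated assumptions, not the library's implementation: the helper names `linspace`, `uniform_centers`, and `aligned_centers` are hypothetical, and plain lists stand in for tensors.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi (inclusive)."""
    if n == 1:
        return [float(lo)]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def uniform_centers(lo, hi, n):
    """Unaligned scheme: spread n centers uniformly over [lo, hi]."""
    return linspace(lo, hi, n)


def aligned_centers(lo, hi, n, align_corner=False):
    """Aligned scheme: build n + 1 voxel corners, then keep the first n."""
    corners = linspace(lo, hi, n + 1)
    if not align_corner:
        # Shift each corner by half a voxel so it lands on the voxel center.
        shift = (corners[1] - corners[0]) / 2
        corners = [c + shift for c in corners]
    return corners[:n]


print(uniform_centers(0.0, 4.0, 4))
print(aligned_centers(0.0, 4.0, 4))
print(aligned_centers(0.0, 4.0, 4, align_corner=True))
```

With four cells over the range 0 to 4, the aligned variant places centers at the middle of each unit-wide cell, while the uniform variant places the first and last centers on the range boundaries.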

anchors_single_range(feature_size, anchor_range, scale, sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]

Generate anchors in a single range.

Parameters
  • feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).

  • anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).

  • scale (float | int, optional) – The scale factor of anchors.

  • sizes (list[list] | np.ndarray | torch.Tensor) – Anchor size with shape [N, 3], in order of x, y, z.

  • rotations (list[float] | np.ndarray | torch.Tensor) – Rotations of anchors in a single feature grid.

  • device (str) – Device that the anchors will be put on.

Returns

Anchors with shape [*feature_size, num_sizes, num_rots, 7].

Return type

torch.Tensor

class mmdet3d.core.anchor.AlignedAnchor3DRangeGeneratorPerCls(**kwargs)[source]

3D anchor generator by range, per class.

This anchor generator generates anchors by the given range for each class. Note that the feature maps of different classes may differ.

Parameters

kwargs (dict) – Arguments are the same as those in AlignedAnchor3DRangeGenerator.

grid_anchors(featmap_sizes, device='cuda')[source]

Generate grid anchors in multiple feature levels.

Parameters
  • featmap_sizes (list[tuple]) – List of feature map sizes for different classes in a single feature level.

  • device (str) – Device where the anchors will be put on.

Returns

Anchors in multiple feature levels. Note that this anchor generator currently supports only a single feature level. The size of each tensor should be [num_sizes/ranges*num_rots*featmap_size, box_code_size].

Return type

list[list[torch.Tensor]]

multi_cls_grid_anchors(featmap_sizes, scale, device='cuda')[source]

Generate grid anchors of a single level feature map for multi-class with different feature map sizes.

This function is usually called by method self.grid_anchors.

Parameters
  • featmap_sizes (list[tuple]) – List of feature map sizes for different classes in a single feature level.

  • scale (float) – Scale factor of the anchors in the current level.

  • device (str, optional) – Device the tensor will be put on. Defaults to ‘cuda’.

Returns

Anchors in the overall feature map.

Return type

torch.Tensor

class mmdet3d.core.anchor.Anchor3DRangeGenerator(ranges, sizes=[[1.6, 3.9, 1.56]], scales=[1], rotations=[0, 1.5707963], custom_values=(), reshape_out=True, size_per_range=True)[source]

3D Anchor Generator by range.

This anchor generator generates anchors by the given range in different feature levels. Due to the convention in 3D detection, different anchor sizes are related to different ranges for different categories. However, we find that this setting does not affect the performance much on some datasets, e.g., nuScenes.

Parameters
  • ranges (list[list[float]]) – Ranges of different anchors. The ranges are the same across different feature levels, but may vary for different anchor sizes if size_per_range is True.

  • sizes (list[list[float]]) – 3D sizes of anchors.

  • scales (list[int]) – Scales of anchors in different feature levels.

  • rotations (list[float]) – Rotations of anchors in a feature grid.

  • custom_values (tuple[float]) – Customized values of that anchor. For example, in nuScenes the anchors have velocities.

  • reshape_out (bool) – Whether to reshape the output into (N x 4).

  • size_per_range (bool) – Whether to use separate ranges for different sizes. If size_per_range is True, the ranges should have the same length as the sizes; if they do not, the range will be duplicated.
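The duplication rule for size_per_range can be illustrated as follows. This is a sketch only: `expand_ranges` is a hypothetical helper, and the assumption that duplication applies when exactly one range is given is ours, not something the parameter description confirms.

```python
def expand_ranges(ranges, sizes, size_per_range=True):
    """Return one range per anchor size when size_per_range is True."""
    if size_per_range and len(ranges) != len(sizes):
        # Assumption: only a single range can be repeated for every size.
        if len(ranges) != 1:
            raise ValueError("expected one range or one range per size")
        return [list(ranges[0]) for _ in sizes]
    return [list(r) for r in ranges]


# Illustrative values: one shared range, two anchor sizes.
ranges = [[0, -40.0, -1.78, 70.0, 40.0, -1.78]]
sizes = [[1.6, 3.9, 1.56], [0.6, 0.8, 1.73]]
for anchor_range, size in zip(expand_ranges(ranges, sizes), sizes):
    print(size, "->", anchor_range)
```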

anchors_single_range(feature_size, anchor_range, scale=1, sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]

Generate anchors in a single range.

Parameters
  • feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).

  • anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).

  • scale (float | int, optional) – The scale factor of anchors.

  • sizes (list[list] | np.ndarray | torch.Tensor) – Anchor size with shape [N, 3], in order of x, y, z.

  • rotations (list[float] | np.ndarray | torch.Tensor) – Rotations of anchors in a single feature grid.

  • device (str) – Device that the anchors will be put on.

Returns

Anchors with shape [*feature_size, num_sizes, num_rots, 7].

Return type

torch.Tensor
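To make the output layout concrete, the sketch below enumerates anchors for a tiny feature map in pure Python. It is an illustration under assumptions, not the library code: plain tuples replace tensors, the `device` argument is omitted, and we assume `scale` simply multiplies the anchor sizes. Iterating in the order z, y, x, size, rotation corresponds to flattening a [D, H, W, num_sizes, num_rots, 7] array.

```python
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi (inclusive)."""
    if n == 1:
        return [float(lo)]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def anchors_single_range(feature_size, anchor_range, scale=1,
                         sizes=((1.6, 3.9, 1.56),),
                         rotations=(0, 1.5707963)):
    """Enumerate (x, y, z, x_size, y_size, z_size, rot) anchors."""
    depth, height, width = feature_size  # [D, H, W] = (z, y, x)
    x_min, y_min, z_min, x_max, y_max, z_max = anchor_range
    zs = linspace(z_min, z_max, depth)
    ys = linspace(y_min, y_max, height)
    xs = linspace(x_min, x_max, width)
    anchors = []
    for z, y, x, size, rot in product(zs, ys, xs, sizes, rotations):
        sx, sy, sz = (s * scale for s in size)  # assumed role of scale
        anchors.append((x, y, z, sx, sy, sz, rot))
    return anchors


anchors = anchors_single_range((1, 2, 3), (0, -2, -1, 4, 2, -1))
print(len(anchors))   # D * H * W * num_sizes * num_rots
print(anchors[0])
```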

grid_anchors(featmap_sizes, device='cuda')[source]

Generate grid anchors in multiple feature levels.

Parameters
  • featmap_sizes (list[tuple]) – List of feature map sizes in multiple feature levels.

  • device (str) – Device where the anchors will be put on.

Returns

Anchors in multiple feature levels. The sizes of each tensor should be [N, 4], where N = width * height * num_base_anchors, width and height are the sizes of the corresponding feature level, num_base_anchors is the number of anchors for that level.

Return type

list[torch.Tensor]

property num_base_anchors

Total number of base anchors in a feature grid.

Type

list[int]

property num_levels

Number of feature levels that the generator is applied to.

Type

int

single_level_grid_anchors(featmap_size, scale, device='cuda')[source]

Generate grid anchors of a single level feature map.

This function is usually called by method self.grid_anchors.

Parameters
  • featmap_size (tuple[int]) – Size of the feature map.

  • scale (float) – Scale factor of the anchors in the current level.

  • device (str, optional) – Device the tensor will be put on. Defaults to ‘cuda’.

Returns

Anchors in the overall feature map.

Return type

torch.Tensor

bbox

class mmdet3d.core.bbox.AssignResult(num_gts, gt_inds, max_overlaps, labels=None)[source]

Stores assignments between predicted and truth boxes.

num_gts

the number of truth boxes considered when computing this assignment

Type

int

gt_inds

for each predicted box indicates the 1-based index of the assigned truth box. 0 means unassigned and -1 means ignore.

Type

LongTensor

max_overlaps

the iou between the predicted box and its assigned truth box.

Type

FloatTensor

labels

If specified, for each predicted box indicates the category label of the assigned truth box.

Type

None | LongTensor
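The gt_inds encoding (1-based index, 0 for unassigned, -1 for ignore) can be decoded as in the sketch below. The helper `split_assignments` is hypothetical and works on a plain list of integers rather than a LongTensor.

```python
def split_assignments(gt_inds):
    """Split predictions into positives, negatives and ignored ones.

    Positives are returned as (prediction index, zero-based gt index).
    """
    positives = [(i, g - 1) for i, g in enumerate(gt_inds) if g > 0]
    negatives = [i for i, g in enumerate(gt_inds) if g == 0]
    ignored = [i for i, g in enumerate(gt_inds) if g == -1]
    return positives, negatives, ignored


positives, negatives, ignored = split_assignments([-1, 1, 2, 0])
print(positives)
print(negatives)
print(ignored)
```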

Example

>>> # An assign result between 4 predicted boxes and 9 true boxes
>>> # where only two boxes were assigned.
>>> import torch
>>> from mmdet3d.core.bbox import AssignResult
>>> num_gts = 9
>>> max_overlaps = torch.FloatTensor([0, .5, .9, 0])
>>> gt_inds = torch.LongTensor([-1, 1, 2, 0])
>>> labels = torch.LongTensor([0, 3, 4, 0])
>>> self = AssignResult(num_gts, gt_inds, max_overlaps, labels)
>>> print(str(self))  # xdoctest: +IGNORE_WANT
<AssignResult(num_gts=9, gt_inds.shape=(4,), max_overlaps.shape=(4,),
              labels.shape=(4,))>
>>> # Force addition of gt labels (when adding gt as proposals)
>>> new_labels = torch.LongTensor([3, 4, 5])
>>> self.add_gt_(new_labels)
>>> print(str(self))  # xdoctest: +IGNORE_WANT
<AssignResult(num_gts=9, gt_inds.shape=(7,), max_overlaps.shape=(7,),
              labels.shape=(7,))>

add_gt_(gt_labels)[source]

Add ground truth as assigned results.

Parameters

gt_labels (torch.Tensor) – Labels of gt boxes

get_extra_property(key)[source]

Get user-defined property.

property info

a dictionary of info about the object

Type

dict

property num_preds

the number of predictions in this assignment

Type

int

classmethod random(**kwargs)[source]

Create random AssignResult for tests or debugging.

Parameters
  • num_preds – number of predicted boxes

  • num_gts – number of true boxes

  • p_ignore (float) – probability of a predicted box assigned to an ignored truth

  • p_assigned (float) – probability of a predicted box being assigned

  • p_use_label (float | bool) – with labels or not

  • rng (None | int | numpy.random.RandomState) – seed or state

Returns

Randomly generated assign results.

Return type

AssignResult

Example

>>> from mmdet.core.bbox.assigners.assign_result import *  # NOQA
>>> self = AssignResult.random()
>>> print(self.info)

set_extra_property(key, value)[source]

Set user-defined new property.

class mmdet3d.core.bbox.AxisAlignedBboxOverlaps3D[source]

Axis-aligned 3D Overlaps (IoU) Calculator.
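For reference, the IoU of two axis-aligned 3D boxes can be computed as sketched below. This is a plain-Python illustration rather than the calculator's actual code, and the box layout (x1, y1, z1, x2, y2, z2) is an assumption made for the example.

```python
def axis_aligned_iou_3d(box_a, box_b):
    """IoU of two boxes given as (x1, y1, z1, x2, y2, z2)."""
    ax1, ay1, az1, ax2, ay2, az2 = box_a
    bx1, by1, bz1, bx2, by2, bz2 = box_b
    # Overlap length along each axis, clamped at zero when disjoint.
    dx = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    dy = max(0.0, min(ay2, by2) - max(ay1, by1))
    dz = max(0.0, min(az2, bz2) - max(az1, bz1))
    inter = dx * dy * dz
    vol_a = (ax2 - ax1) * (ay2 - ay1) * (az2 - az1)
    vol_b = (bx2 - bx1) * (by2 - by1) * (bz2 - bz1)
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


# Two unit cubes offset by half a unit along x.
print(axis_aligned_iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))
```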

class mmdet3d.core.bbox.BaseAssigner[source]

Base assigner that assigns boxes to ground truth boxes.

abstract assign(bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None)[source]

Assign boxes to either ground truth boxes or negative boxes.

class mmdet3d.core.bbox.BaseInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

Base class for 3D Boxes.

Note

The box is bottom centered, i.e. the relative position of origin in the box is (0.5, 0.5, 0).

Parameters
  • tensor (torch.Tensor | np.ndarray | list) – a N x box_dim matrix.

  • box_dim (int) – Number of the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw). Default to 7.

  • with_yaw (bool) – Whether the box is with yaw rotation. If False, the value of yaw will be set to 0 as minmax boxes. Default to True.

  • origin (tuple[float]) – The relative position of origin in the box. Default to (0.5, 0.5, 0). This is used to convert the box to the (0.5, 0.5, 0) mode.
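The role of origin can be illustrated with a small sketch that moves a box given with an arbitrary relative origin to the bottom-centered (0.5, 0.5, 0) convention. The function `to_bottom_center` is a hypothetical helper operating on a single tuple, written under the assumption that the shift is the box size multiplied by the difference between the two relative origins.

```python
def to_bottom_center(box, origin=(0.5, 0.5, 0.0)):
    """Re-express (x, y, z, x_size, y_size, z_size, ...) with origin (0.5, 0.5, 0)."""
    x, y, z, dx, dy, dz, *rest = box
    target = (0.5, 0.5, 0.0)
    # Relative offset between the target origin and the given origin.
    shift = [t - o for t, o in zip(target, origin)]
    return (x + dx * shift[0], y + dy * shift[1], z + dz * shift[2],
            dx, dy, dz, *rest)


# A box whose z coordinate refers to the gravity center (origin 0.5, 0.5, 0.5).
print(to_bottom_center((1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 0.0), origin=(0.5, 0.5, 0.5)))
```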

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If False, the value of yaw will be set to 0 as minmax boxes.

Type

bool

property bottom_center

A tensor with the bottom center of each box.

Type

torch.Tensor

property bottom_height

A vector with bottom’s height of each box.

Type

torch.Tensor

classmethod cat(boxes_list)[source]

Concatenate a list of Boxes into a single Boxes.

Parameters

boxes_list (list[BaseInstance3DBoxes]) – List of boxes.

Returns

The concatenated Boxes.

Return type

BaseInstance3DBoxes

property center

Calculate the center of all the boxes.

Note

In the MMDetection3D’s convention, the bottom center is usually taken as the default center.

The relative positions of the centers differ between kinds of boxes, e.g., the relative center of a box is (0.5, 1.0, 0.5) in camera and (0.5, 0.5, 0) in lidar. It is recommended to use bottom_center or gravity_center for clarity.

Returns

A tensor with center of each box.

Return type

torch.Tensor

clone()[source]

Clone the Boxes.

Returns

Box object with the same properties as self.

Return type

BaseInstance3DBoxes

abstract convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (Box3DMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

a tensor with 8 corners of each box.

Type

torch.Tensor

property device

The device the boxes are on.

Type

str

property dims

Size dimensions (x_size, y_size, z_size) of each box, with shape (N, 3).

Type

torch.Tensor

abstract flip(bev_direction='horizontal')[source]

Flip the boxes in BEV along given BEV direction.

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

property height

A vector with height of each box.

Type

torch.Tensor

classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]

Calculate height overlaps of two boxes.

Note

This function calculates the height overlaps between boxes1 and boxes2. boxes1 and boxes2 should be of the same type.

Parameters
  • boxes1 (BaseInstance3DBoxes) – Boxes 1 contain N boxes.

  • boxes2 (BaseInstance3DBoxes) – Boxes 2 contain M boxes.

  • mode (str, optional) – Mode of iou calculation. Defaults to ‘iou’.

Returns

Calculated iou of boxes’ heights.

Return type

torch.Tensor
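A height overlap between two sets of boxes reduces to the overlap of their vertical intervals. The following sketch assumes a z-up convention (top is greater than bottom, as in LiDAR and Depth coordinates) and uses plain lists; `height_overlaps` here is an illustrative stand-in, not the classmethod itself.

```python
def height_overlaps(bottoms1, tops1, bottoms2, tops2):
    """Return an N x M nested list of vertical overlap lengths."""
    rows = []
    for b1, t1 in zip(bottoms1, tops1):
        row = []
        for b2, t2 in zip(bottoms2, tops2):
            # Overlap is the lowest top minus the highest bottom, floored at 0.
            row.append(max(0.0, min(t1, t2) - max(b1, b2)))
        rows.append(row)
    return rows


print(height_overlaps([0.0, 1.0], [2.0, 3.0], [1.0], [4.0]))
```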

in_range_3d(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, y_min, z_min, x_max, y_max, z_max)

Note

The original implementation of SECOND checks whether a box is in the range by checking whether its points are inside a convex polygon. We reduce this burden for simpler cases.

Returns

A binary vector indicating whether each box is inside the reference range.

Return type

torch.Tensor

abstract in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box in order of (x_min, y_min, x_max, y_max).

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

limit_yaw(offset=0.5, period=3.141592653589793)[source]

Limit the yaw to a given period and offset.

Parameters
  • offset (float) – The offset of the yaw.

  • period (float) – The expected period.
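One common way to wrap an angle into a fixed period is shown below. We assume the wrap `val - floor(val / period + offset) * period`, which maps values into [-offset * period, (1 - offset) * period); whether limit_yaw uses exactly this formula is not stated in this section, and `limit_period` is a name chosen for the sketch.

```python
import math


def limit_period(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


for yaw in (-10.0, -3.3, 0.0, 1.0, 2.0, 12.5):
    print(yaw, "->", limit_period(yaw))
```

With the defaults, every yaw is mapped into the half-open interval from -pi/2 to pi/2.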

new_box(data)[source]

Create a new box object with data.

The new box and its tensor have properties similar to those of self and self.tensor, respectively.

Parameters

data (torch.Tensor | numpy.array | list) – Data to be copied.

Returns

A new bbox object with data, the object’s other properties are similar to self.

Return type

BaseInstance3DBoxes

nonempty(threshold: float = 0.0)[source]

Find boxes that are non-empty.

A box is considered empty if any of its sides is no larger than threshold.

Parameters

threshold (float) – The threshold of minimal sizes.

Returns

A binary vector which represents whether each box is empty (False) or non-empty (True).

Return type

torch.Tensor

classmethod overlaps(boxes1, boxes2, mode='iou')[source]

Calculate 3D overlaps of two boxes.

Note

This function calculates the overlaps between boxes1 and boxes2. boxes1 and boxes2 should be of the same type.

Parameters
  • boxes1 (BaseInstance3DBoxes) – Boxes 1 contain N boxes.

  • boxes2 (BaseInstance3DBoxes) – Boxes 2 contain M boxes.

  • mode (str, optional) – Mode of iou calculation. Defaults to ‘iou’.

Returns

Calculated iou of boxes.

Return type

torch.Tensor

abstract rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle or rotation matrix.

Parameters
  • angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.

  • points (torch.Tensor, numpy.ndarray, BasePoints, optional) – Points to rotate. Defaults to None.

scale(scale_factor)[source]

Scale the box with horizontal and vertical scaling factors.

Parameters

scale_factor (float) – Scale factor to scale the boxes.

to(device)[source]

Convert current boxes to a specific device.

Parameters

device (str | torch.device) – The name of the device.

Returns

A new boxes object on the specific device.

Return type

BaseInstance3DBoxes

property top_height

A vector with the top height of each box.

Type

torch.Tensor

translate(trans_vector)[source]

Translate boxes with the given translation vector.

Parameters

trans_vector (torch.Tensor) – Translation vector of size 1x3.

property volume

A vector with volume of each box.

Type

torch.Tensor

property yaw

A vector with yaw of each box.

Type

torch.Tensor

class mmdet3d.core.bbox.BaseSampler(num, pos_fraction, neg_pos_ub=-1, add_gt_as_proposals=True, **kwargs)[source]

Base class of samplers.

sample(assign_result, bboxes, gt_bboxes, gt_labels=None, **kwargs)[source]

Sample positive and negative bboxes.

This is a simple implementation of bbox sampling given candidates, assigning results and ground truth bboxes.

Parameters
  • assign_result (AssignResult) – Bbox assigning results.

  • bboxes (Tensor) – Boxes to be sampled from.

  • gt_bboxes (Tensor) – Ground truth bboxes.

  • gt_labels (Tensor, optional) – Class labels of ground truth bboxes.

Returns

Sampling result.

Return type

SamplingResult

Example

>>> from mmdet.core.bbox import RandomSampler
>>> from mmdet.core.bbox import AssignResult
>>> from mmdet.core.bbox.demodata import ensure_rng, random_boxes
>>> rng = ensure_rng(None)
>>> assign_result = AssignResult.random(rng=rng)
>>> bboxes = random_boxes(assign_result.num_preds, rng=rng)
>>> gt_bboxes = random_boxes(assign_result.num_gts, rng=rng)
>>> gt_labels = None
>>> self = RandomSampler(num=32, pos_fraction=0.5, neg_pos_ub=-1,
...                      add_gt_as_proposals=False)
>>> self = self.sample(assign_result, bboxes, gt_bboxes, gt_labels)

class mmdet3d.core.bbox.BboxOverlaps3D(coordinate)[source]

3D IoU Calculator.

Parameters

coordinate (str) – The coordinate system, valid options are ‘camera’, ‘lidar’, and ‘depth’.

class mmdet3d.core.bbox.BboxOverlapsNearest3D(coordinate='lidar')[source]

Nearest 3D IoU Calculator.

Note

This IoU calculator first finds the nearest 2D boxes in bird eye view (BEV), and then calculates the 2D IoU using bbox_overlaps().

Parameters

coordinate (str) – ‘camera’, ‘lidar’, or ‘depth’ coordinate system.

class mmdet3d.core.bbox.Box3DMode(value)[source]

Enum of different ways to represent a box.

Coordinates in LiDAR:

            up z
               ^   x front
               |  /
               | /
left y <------ 0

The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.

Coordinates in camera:

        z front
       /
      /
     0 ------> x right
     |
     |
     v
down y

The relative coordinate of bottom center in a CAM box is [0.5, 1.0, 0.5], and the yaw is around the y axis, thus the rotation axis=1.

Coordinates in Depth mode:

up z
   ^   y front
   |  /
   | /
   0 ------> x right

The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
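The three axis conventions drawn above imply a simple relabelling of axes when the sensors share the same origin. The sketch below covers only that idealized case (no translation and no calibration, which is an assumption); real conversions use the rt_mat described for convert. The function names are hypothetical.

```python
def lidar_to_camera(point):
    """LiDAR (x front, y left, z up) -> camera (x right, y down, z front)."""
    x, y, z = point
    return (-y, -z, x)


def lidar_to_depth(point):
    """LiDAR (x front, y left, z up) -> Depth (x right, y front, z up)."""
    x, y, z = point
    return (-y, x, z)


# A point 1 m ahead, 2 m to the left and 3 m up in LiDAR coordinates.
print(lidar_to_camera((1, 2, 3)))
print(lidar_to_depth((1, 2, 3)))
```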

static convert(box, src, dst, rt_mat=None)[source]

Convert boxes from src mode to dst mode.

Parameters
  • box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.

  • src (Box3DMode) – The src Box mode.

  • dst (Box3DMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type.

Return type

(tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes)

class mmdet3d.core.bbox.CameraInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 1.0, 0.5))[source]

3D boxes of instances in CAM coordinates.

Coordinates in camera:

        z front (yaw=-0.5*pi)
       /
      /
     0 ------> x right (yaw=0)
     |
     |
     v
down y

The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of z.

A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If False, the value of yaw will be set to 0 as minmax boxes.

Type

bool

property bev

An n x 5 tensor of 2D BEV box of each box with rotation in XYWHR format.

Type

torch.Tensor

property bottom_height

A vector with bottom’s height of each box.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (Box3DMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

             front z
                  /
                 /
   (x0, y0, z1) + -----------  + (x1, y0, z1)
               /|            / |
              / |           /  |
(x0, y0, z0) + ----------- +   + (x1, y1, z1)
             |  /      .   |  /
             | / origin    | /
(x0, y1, z0) + ----------- + -------> x right
             |             (x1, y1, z0)
             |
             v
        down y

Type

torch.Tensor

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along given BEV direction.

In CAM coordinates, it flips the x (horizontal) or z (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, BasePoints, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

property height

A vector with height of each box.

Type

torch.Tensor

classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]

Calculate height overlaps of two boxes.

This function calculates the height overlaps between boxes1 and boxes2, where boxes1 and boxes2 should be in the same type.

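Parameters
  • boxes1 (CameraInstance3DBoxes) – Boxes 1 contain N boxes.

  • boxes2 (CameraInstance3DBoxes) – Boxes 2 contain M boxes.

  • mode (str, optional) – Mode of iou calculation. Defaults to ‘iou’.

Returns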

Calculated iou of boxes’ heights.

Return type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, z_min, x_max, z_max).

Note

The original implementation of SECOND checks whether a box is in the range by checking whether its points are inside a convex polygon. We reduce this burden for simpler cases.

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle or rotation matrix.

Parameters
  • angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.

  • points (torch.Tensor, numpy.ndarray, BasePoints, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None

property top_height

A vector with the top height of each box.

Type

torch.Tensor
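Because the camera y axis points down and the box origin is the bottom center, the top of a camera box has a smaller y value than the bottom. The sketch below works on a single tuple and assumes that y_size is the vertical extent; `camera_box_heights` is a hypothetical helper, not part of the class.

```python
def camera_box_heights(box):
    """Return (bottom_y, top_y, gravity_center) for one camera box."""
    x, y, z, x_size, y_size, z_size, yaw = box
    bottom = y                    # origin sits on the bottom face
    top = y - y_size              # y grows downward, so the top is smaller
    gravity_center = (x, y - y_size / 2, z)
    return bottom, top, gravity_center


print(camera_box_heights((1.0, 2.0, 10.0, 1.6, 1.5, 3.9, 0.0)))
```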

class mmdet3d.core.bbox.CombinedSampler(pos_sampler, neg_sampler, **kwargs)[source]

A sampler that combines positive sampler and negative sampler.

class mmdet3d.core.bbox.Coord3DMode(value)[source]

Enum of different ways to represent a box and point cloud.

Coordinates in LiDAR:

            up z
               ^   x front
               |  /
               | /
left y <------ 0

The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.

Coordinates in camera:

        z front
       /
      /
     0 ------> x right
     |
     |
     v
down y

The relative coordinate of bottom center in a CAM box is [0.5, 1.0, 0.5], and the yaw is around the y axis, thus the rotation axis=1.

Coordinates in Depth mode:

up z
   ^   y front
   |  /
   | /
   0 ------> x right

The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.

static convert(input, src, dst, rt_mat=None)[source]

Convert boxes or points from src mode to dst mode.

static convert_box(box, src, dst, rt_mat=None)[source]

Convert boxes from src mode to dst mode.

Parameters
  • box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.

  • src (Coord3DMode) – The src Box mode.

  • dst (Coord3DMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type.

Return type

(tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes)

static convert_point(point, src, dst, rt_mat=None)[source]

Convert points from src mode to dst mode.

Parameters
  • point (tuple | list | np.ndarray | torch.Tensor | BasePoints) – Can be a k-tuple, k-list or an Nxk array/tensor.

  • src (Coord3DMode) – The src Point mode.

  • dst (Coord3DMode) – The target Point mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted point of the same type.

Return type

(tuple | list | np.ndarray | torch.Tensor | BasePoints)

class mmdet3d.core.bbox.DeltaXYZWLHRBBoxCoder(code_size=7)[source]

Bbox Coder for 3D boxes.

Parameters

code_size (int) – The dimension of boxes to be encoded.

static decode(anchors, deltas)[source]

Apply transformation deltas (dx, dy, dz, dw, dh, dl, dr, dv*) to boxes.

Parameters
  • anchors (torch.Tensor) – Parameters of anchors with shape (N, 7).

  • deltas (torch.Tensor) – Encoded boxes with shape (N, 7+n) [x, y, z, w, l, h, r, velo*].

Returns

Decoded boxes.

Return type

torch.Tensor

static encode(src_boxes, dst_boxes)[source]

Get box regression transformation deltas (dx, dy, dz, dw, dh, dl, dr, dv*) that can be used to transform the src_boxes into the dst_boxes.

Parameters
  • src_boxes (torch.Tensor) – source boxes, e.g., object proposals.

  • dst_boxes (torch.Tensor) – target of the transformation, e.g., ground-truth boxes.

Returns

Box transformation deltas.

Return type

torch.Tensor
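The exact formulas are not given in this section. As an illustration, the sketch below uses a SECOND-style residual encoding, which is an assumption on our part: center offsets are normalized by the anchor's BEV diagonal and height, sizes are encoded as log ratios, and the rotation as a difference. Boxes are plain 7-tuples in (x, y, z, w, l, h, r) order with z at the bottom face, and velocity terms are left out.

```python
import math


def encode(src_box, dst_box):
    """Deltas that transform src_box (anchor) into dst_box (target)."""
    xa, ya, za, wa, la, ha, ra = src_box
    xg, yg, zg, wg, lg, hg, rg = dst_box
    za_c, zg_c = za + ha / 2, zg + hg / 2   # compare gravity centers in z
    diag = math.hypot(wa, la)
    return ((xg - xa) / diag, (yg - ya) / diag, (zg_c - za_c) / ha,
            math.log(wg / wa), math.log(lg / la), math.log(hg / ha),
            rg - ra)


def decode(anchor, deltas):
    """Invert encode(): apply deltas to anchor."""
    xa, ya, za, wa, la, ha, ra = anchor
    xt, yt, zt, wt, lt, ht, rt = deltas
    diag = math.hypot(wa, la)
    wg, lg, hg = math.exp(wt) * wa, math.exp(lt) * la, math.exp(ht) * ha
    zg = zt * ha + (za + ha / 2) - hg / 2   # back to the bottom face
    return (xt * diag + xa, yt * diag + ya, zg, wg, lg, hg, rt + ra)


anchor = (0.0, 0.0, -1.0, 1.6, 3.9, 1.56, 0.0)
target = (0.5, -0.3, -0.9, 1.7, 4.1, 1.5, 0.2)
print(decode(anchor, encode(anchor, target)))
```

Decoding the encoded deltas against the same anchor should recover the target up to floating-point error.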

class mmdet3d.core.bbox.DepthInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

3D boxes of instances in Depth coordinates.

Coordinates in Depth:

up z    y front (yaw=-0.5*pi)
   ^   ^
   |  /
   | /
   0 ------> x right (yaw=0)

The relative coordinate of bottom center in a Depth box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of y. Also note that the rotation of DepthInstance3DBoxes is counterclockwise, which is the reverse of the definition of the yaw angle (clockwise).

A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If False, the value of yaw will be set to 0 as minmax boxes.

Type

bool

property bev

An n x 5 tensor of 2D BEV box of each box in XYWHR format.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (Box3DMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

DepthInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

                            up z
             front y           ^
                  /            |
                 /             |
   (x0, y1, z1) + -----------  + (x1, y1, z1)
               /|            / |
              / |           /  |
(x0, y0, z1) + ----------- +   + (x1, y1, z0)
             |  /      .   |  /
             | / origin    | /
(x0, y0, z0) + ----------- + --------> right x
                           (x1, y0, z0)

Type

torch.Tensor

enlarged_box(extra_width)[source]

Enlarge the length, width and height of the boxes.

Parameters

extra_width (float | torch.Tensor) – Extra width to enlarge the box.

Returns

Enlarged boxes.

Return type

DepthInstance3DBoxes

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along given BEV direction.

In Depth coordinates, it flips x (horizontal) or y (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, BasePoints, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

get_surface_line_center()[source]

Compute surface and line center of bounding boxes.

Returns

Surface and line center of bounding boxes.

Return type

torch.Tensor

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, y_min, x_max, y_max).

Note

The original implementation of SECOND checks whether a box is in the range by checking whether its points are inside a convex polygon. We reduce this burden for simpler cases.

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

points_in_boxes(points)[source]

Find points that are in boxes (CUDA).

Parameters

points (torch.Tensor) – Points in shape [1, M, 3] or [M, 3], 3 dimensions are [x, y, z] in LiDAR coordinate.

Returns

The index of boxes each point lies in with shape of (B, M, T).

Return type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle or rotation matrix.

Parameters
  • angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.

  • points (torch.Tensor, numpy.ndarray, BasePoints, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None

class mmdet3d.core.bbox.InstanceBalancedPosSampler(num, pos_fraction, neg_pos_ub=-1, add_gt_as_proposals=True, **kwargs)[source]

Instance balanced sampler that samples equal number of positive samples for each instance.

class mmdet3d.core.bbox.IoUBalancedNegSampler(num, pos_fraction, floor_thr=-1, floor_fraction=0, num_bins=3, **kwargs)[source]

IoU Balanced Sampling.

arXiv: https://arxiv.org/pdf/1904.02701.pdf (CVPR 2019)

Samples proposals according to their IoU. A floor_fraction share of the needed RoIs is sampled at random from proposals whose IoU is lower than floor_thr. The rest are sampled from proposals whose IoU is higher than floor_thr. These are drawn evenly from num_bins bins that split the IoU range into equal intervals.

Parameters
  • num (int) – number of proposals.

  • pos_fraction (float) – fraction of positive proposals.

  • floor_thr (float) – threshold (minimum) IoU for IoU balanced sampling. Set to -1 to apply IoU balanced sampling to all proposals.

  • floor_fraction (float) – sampling fraction of proposals under floor_thr.

  • num_bins (int) – number of bins in IoU balanced sampling.

sample_via_interval(max_overlaps, full_set, num_expected)[source]

Sample according to the IoU interval.

Parameters
  • max_overlaps (torch.Tensor) – IoU between bounding boxes and ground truth boxes.

  • full_set (set(int)) – A full set of indices of boxes.

  • num_expected (int) – Number of expected samples.

Returns

Indices of samples

Return type

np.ndarray
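The binning described for this sampler can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name, the list-based inputs, the half-open bin edges and the top-up rule for short bins are all assumptions.

```python
import random


def sample_via_interval_sketch(max_overlaps, full_set, num_expected,
                               floor_thr=0.0, num_bins=3, rng=None):
    """Pick up to num_expected indices, spread evenly over IoU bins.

    max_overlaps: list of IoU values, one per box.
    full_set: set of candidate indices into max_overlaps.
    Hypothetical helper: bin edges and the top-up rule are assumptions.
    """
    rng = rng or random.Random(0)
    max_iou = max(max_overlaps)
    width = (max_iou - floor_thr) / num_bins
    per_bin = num_expected // num_bins
    sampled = []
    for i in range(num_bins):
        lo = floor_thr + i * width
        hi = floor_thr + (i + 1) * width
        # Candidates whose IoU falls in the half-open bin [lo, hi).
        in_bin = sorted(j for j in full_set if lo <= max_overlaps[j] < hi)
        if len(in_bin) > per_bin:
            in_bin = rng.sample(in_bin, per_bin)
        sampled.extend(in_bin)
    # Top up from the remaining candidates if the bins came up short.
    if len(sampled) < num_expected:
        rest = sorted(full_set - set(sampled))
        extra = min(num_expected - len(sampled), len(rest))
        sampled.extend(rng.sample(rest, extra))
    return sampled


ious = [0.05, 0.10, 0.15, 0.35, 0.40, 0.45, 0.70, 0.80, 0.90]
picked = sample_via_interval_sketch(ious, set(range(len(ious))), 6)
print(sorted(picked))  # two indices from each of the three IoU bins
```

With half-open bins, the box holding the maximum IoU falls outside the last bin and can only enter through the top-up step.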

class mmdet3d.core.bbox.LiDARInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

3D boxes of instances in LiDAR coordinates.

Coordinates in LiDAR:

                      up z    x front (yaw=-0.5*pi)
                         ^   ^
                         |  /
                         | /
(yaw=-pi) left y <------ 0 -------- (yaw=0)

The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the negative direction of y axis, and decreases from the negative direction of y to the positive direction of x.

A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

Whether the boxes have a yaw rotation. If False, the value of yaw will be set to 0 and the boxes are treated as minmax boxes.

Type

bool

property bev

2D BEV box of each box with rotation in XYWHR format.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (Box3DMode) – the target Box mode

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

                               up z
                front x           ^
                     /            |
                    /             |
      (x1, y0, z1) + -----------  + (x1, y1, z1)
                  /|            / |
                 / |           /  |
   (x0, y0, z1) + ----------- +   + (x1, y1, z0)
                |  /      .   |  /
                | / origin    | /
left y<-------- + ----------- + (x0, y1, z0)
    (x0, y0, z0)

Type

torch.Tensor

enlarged_box(extra_width)[source]

Enlarge the length, width and height of the boxes.

Parameters

extra_width (float | torch.Tensor) – Extra width to enlarge the box.

Returns

Enlarged boxes.

Return type

LiDARInstance3DBoxes

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along given BEV direction.

In LiDAR coordinates, it flips the y (horizontal) or x (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, BasePoints, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, y_min, x_max, y_max).

Note

The original implementation of SECOND checks whether a box is in the range by checking whether its points are in a convex polygon. We reduce the burden for simpler cases.

Returns

Whether each box is inside the reference range.

Return type

torch.Tensor
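The Note above points to a cheaper test than the polygon check. A minimal sketch of one such test, assuming only each box's BEV center (x, y) is compared against the range; the helper name and the plain-list box rows are illustrative and not part of the library:

```python
def in_range_bev_sketch(boxes, box_range):
    """Flag boxes whose BEV center lies strictly inside box_range.

    boxes: list of rows starting with (x, y, ...), one row per box.
    box_range: (x_min, y_min, x_max, y_max).
    Assumption: only the center is tested, not the rotated footprint.
    """
    x_min, y_min, x_max, y_max = box_range
    return [x_min < b[0] < x_max and y_min < b[1] < y_max for b in boxes]


boxes = [
    [1.0, 2.0, 0.0, 1.6, 3.9, 1.56, 0.0],   # center inside the range
    [45.0, 0.0, 0.0, 1.6, 3.9, 1.56, 0.3],  # center outside in x
]
flags = in_range_bev_sketch(boxes, (0.0, -40.0, 40.0, 40.0))
print(flags)  # [True, False]
```

A center-only test can keep a box whose footprint partly leaves the range, which is the trade-off for skipping the polygon computation.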

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

points_in_boxes(points)[source]

Find the box in which each point lies.

Parameters

points (torch.Tensor) – Points in shape (N, 3).

Returns

The index of the box in which each point lies.

Return type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle or rotation matrix.

Parameters
  • angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.

  • points (torch.Tensor, numpy.ndarray, BasePoints, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None

class mmdet3d.core.bbox.MaxIoUAssigner(pos_iou_thr, neg_iou_thr, min_pos_iou=0.0, gt_max_assign_all=True, ignore_iof_thr=-1, ignore_wrt_candidates=True, match_low_quality=True, gpu_assign_thr=-1, iou_calculator={'type': 'BboxOverlaps2D'})[source]

Assign a corresponding gt bbox or background to each bbox.

Each proposal will be assigned -1 or a non-negative integer indicating the ground truth index.

  • -1: negative sample, no assigned gt

  • non-negative integer: positive sample, index (0-based) of assigned gt

Parameters
  • pos_iou_thr (float) – IoU threshold for positive bboxes.

  • neg_iou_thr (float or tuple) – IoU threshold for negative bboxes.

  • min_pos_iou (float) – Minimum iou for a bbox to be considered as a positive bbox. Positive samples can have smaller IoU than pos_iou_thr due to the 4th step (assign max IoU sample to each gt).

  • gt_max_assign_all (bool) – Whether to assign all bboxes with the same highest overlap with some gt to that gt.

  • ignore_iof_thr (float) – IoF threshold for ignoring bboxes (if gt_bboxes_ignore is specified). Negative values mean not ignoring any bboxes.

  • ignore_wrt_candidates (bool) – Whether to compute the iof between bboxes and gt_bboxes_ignore, or the contrary.

  • match_low_quality (bool) – Whether to allow low quality matches. This is usually allowed for RPN and single stage detectors, but not allowed in the second stage. Details are demonstrated in Step 4.

  • gpu_assign_thr (int) – The upper bound of the number of GT for GPU assignment. When the number of gt is above this threshold, the assignment is done on the CPU. A negative value means the assignment is never moved to the CPU.

assign(bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None)[source]

Assign gt to bboxes.

This method assigns a gt bbox to every bbox (proposal/anchor). Each bbox is assigned -1 or a non-negative number: -1 means negative sample, and a non-negative number is the index (0-based) of the assigned gt. The assignment is done in the following steps, and the order matters.

  1. assign every bbox to the background

  2. assign proposals whose iou with all gts < neg_iou_thr to 0

  3. for each bbox, if the iou with its nearest gt >= pos_iou_thr, assign it to that gt

  4. for each gt bbox, assign its nearest proposals (may be more than one) to itself
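The four steps can be sketched on a precomputed IoU table. This is an illustrative plain-Python version, not the assigner's code. It assumes the label convention implied by the expected output in the Example for this method (0 for background, j + 1 for a match with gt j, and -1 for boxes that stay unassigned). The helper name is hypothetical.

```python
def assign_wrt_overlaps_sketch(overlaps, pos_iou_thr, neg_iou_thr,
                               min_pos_iou=0.0):
    """Assign each box a label from a (k gts x n boxes) IoU table.

    Returns a list of n ints: -1 = left unassigned, 0 = background,
    j + 1 = matched to gt j (1-based; an assumed convention).
    """
    num_gts, num_boxes = len(overlaps), len(overlaps[0])
    # Step 1: start every box at -1.
    gt_inds = [-1] * num_boxes
    for b in range(num_boxes):
        column = [overlaps[g][b] for g in range(num_gts)]
        best_iou = max(column)
        best_gt = column.index(best_iou)
        # Step 2: low IoU with every gt means background.
        if best_iou < neg_iou_thr:
            gt_inds[b] = 0
        # Step 3: high IoU with the nearest gt means a positive match.
        if best_iou >= pos_iou_thr:
            gt_inds[b] = best_gt + 1
    # Step 4: every gt claims the box(es) that overlap it most.
    for g in range(num_gts):
        gt_best = max(overlaps[g])
        if gt_best >= min_pos_iou:
            for b in range(num_boxes):
                if overlaps[g][b] == gt_best:
                    gt_inds[b] = g + 1
    return gt_inds


# IoU of the gt box [0, 0, 10, 9] with [0, 0, 10, 10] and [10, 10, 20, 20].
overlaps = [[0.9, 0.0]]
print(assign_wrt_overlaps_sketch(overlaps, 0.5, 0.5))  # [1, 0]
```

Step 4 runs last, so a gt can still claim its best box even when that box's IoU is below pos_iou_thr. This is why min_pos_iou exists as a separate lower bound.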

Parameters
  • bboxes (Tensor) – Bounding boxes to be assigned, shape (n, 4).

  • gt_bboxes (Tensor) – Groundtruth boxes, shape (k, 4).

  • gt_bboxes_ignore (Tensor, optional) – Ground truth bboxes that are labelled as ignored, e.g., crowd boxes in COCO.

  • gt_labels (Tensor, optional) – Label of gt_bboxes, shape (k, ).

Returns

The assign result.

Return type

AssignResult

Example

>>> self = MaxIoUAssigner(0.5, 0.5)
>>> bboxes = torch.Tensor([[0, 0, 10, 10], [10, 10, 20, 20]])
>>> gt_bboxes = torch.Tensor([[0, 0, 10, 9]])
>>> assign_result = self.assign(bboxes, gt_bboxes)
>>> expected_gt_inds = torch.LongTensor([1, 0])
>>> assert torch.all(assign_result.gt_inds == expected_gt_inds)

assign_wrt_overlaps(overlaps, gt_labels=None)[source]

Assign w.r.t. the overlaps of bboxes with gts.

Parameters
  • overlaps (Tensor) – Overlaps between k gt_bboxes and n bboxes, shape (k, n).

  • gt_labels (Tensor, optional) – Labels of k gt_bboxes, shape (k, ).

Returns

The assign result.

Return type

AssignResult

class mmdet3d.core.bbox.PseudoSampler(**kwargs)[source]

A pseudo sampler that does not actually do any sampling.

sample(assign_result, bboxes, gt_bboxes, **kwargs)[source]

Directly returns the positive and negative indices of samples.

Parameters
  • assign_result (AssignResult) – Assigned results

  • bboxes (torch.Tensor) – Bounding boxes

  • gt_bboxes (torch.Tensor) – Ground truth boxes

Returns

Sampler results.

Return type

SamplingResult

class mmdet3d.core.bbox.RandomSampler(num, pos_fraction, neg_pos_ub=-1, add_gt_as_proposals=True, **kwargs)[source]

Random sampler.

Parameters
  • num (int) – Number of samples

  • pos_fraction (float) – Fraction of positive samples

  • neg_pos_ub (int, optional) – Upper bound number of negative and positive samples. Defaults to -1.

  • add_gt_as_proposals (bool, optional) – Whether to add ground truth boxes as proposals. Defaults to True.

random_choice(gallery, num)[source]

Randomly select some elements from the gallery.

If gallery is a Tensor, the returned indices will be a Tensor. If gallery is an ndarray or list, the returned indices will be an ndarray.

Parameters
  • gallery (Tensor | ndarray | list) – indices pool.

  • num (int) – expected sample num.

Returns

sampled indices.

Return type

Tensor or ndarray

class mmdet3d.core.bbox.SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes, assign_result, gt_flags)[source]

Bbox sampling result.

Example

>>> # xdoctest: +IGNORE_WANT
>>> from mmdet.core.bbox.samplers.sampling_result import *  # NOQA
>>> self = SamplingResult.random(rng=10)
>>> print(f'self = {self}')
self = <SamplingResult({
    'neg_bboxes': torch.Size([12, 4]),
    'neg_inds': tensor([ 0,  1,  2,  4,  5,  6,  7,  8,  9, 10, 11, 12]),
    'num_gts': 4,
    'pos_assigned_gt_inds': tensor([], dtype=torch.int64),
    'pos_bboxes': torch.Size([0, 4]),
    'pos_inds': tensor([], dtype=torch.int64),
    'pos_is_gt': tensor([], dtype=torch.uint8)
})>

property bboxes

concatenated positive and negative boxes

Type

torch.Tensor

property info

Returns a dictionary of info about the object.

classmethod random(rng=None, **kwargs)[source]
Parameters
  • rng (None | int | numpy.random.RandomState) – seed or state.

  • kwargs (keyword arguments) –

    • num_preds: number of predicted boxes

    • num_gts: number of true boxes

    • p_ignore (float): probability of a predicted box assigned to an ignored truth.

    • p_assigned (float): probability of a predicted box being assigned.

    • p_use_label (float | bool): with labels or not.

Returns

Randomly generated sampling result.

Return type

SamplingResult

Example

>>> from mmdet.core.bbox.samplers.sampling_result import *  # NOQA
>>> self = SamplingResult.random()
>>> print(self.__dict__)

to(device)[source]

Change the device of the data in place.

Example

>>> self = SamplingResult.random()
>>> print(f'self = {self.to(None)}')
>>> # xdoctest: +REQUIRES(--gpu)
>>> print(f'self = {self.to(0)}')

mmdet3d.core.bbox.axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', is_aligned=False, eps=1e-06)[source]

Calculate overlap between two sets of axis aligned 3D bboxes. If is_aligned is False, then calculate the overlaps between each bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned pair of bboxes1 and bboxes2.

Parameters
  • bboxes1 (Tensor) – shape (B, m, 6) in <x1, y1, z1, x2, y2, z2> format or empty.

  • bboxes2 (Tensor) – shape (B, n, 6) in <x1, y1, z1, x2, y2, z2> format or empty. B indicates the batch dim, in shape (B1, B2, …, Bn). If is_aligned is True, then m and n must be equal.

  • mode (str) – “iou” (intersection over union) or “giou” (generalized intersection over union).

  • is_aligned (bool, optional) – If True, then m and n must be equal. Default False.

  • eps (float, optional) – A value added to the denominator for numerical stability. Default 1e-6.

Returns

shape (m, n) if is_aligned is False else shape (m,)

Return type

Tensor
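For a single pair of boxes, the "iou" mode reduces to a per-axis overlap product. A minimal plain-Python sketch, with a hypothetical helper name. It assumes eps only guards against a zero denominator, which may differ from how the library applies it.

```python
def aligned_iou_3d_sketch(box_a, box_b, eps=1e-6):
    """IoU of two axis-aligned boxes given as (x1, y1, z1, x2, y2, z2)."""
    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    # Overlap length along each axis, clamped at zero when the boxes are apart.
    inter = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        inter *= max(high - low, 0.0)
    union = volume(box_a) + volume(box_b) - inter
    return inter / max(union, eps)


a = [0, 0, 0, 10, 10, 10]
b = [0, 0, 0, 10, 20, 20]
print(aligned_iou_3d_sketch(a, b))  # 0.25
```

Here the first box lies entirely inside the second, so the intersection is the first box's volume (1000) and the union is the second's (4000).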

Example

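>>> import torch
>>> from mmdet3d.core.bbox import axis_aligned_bbox_overlaps_3d
>>> bboxes1 = torch.FloatTensor([
...     [0, 0, 0, 10, 10, 10],
...     [10, 10, 10, 20, 20, 20],
...     [32, 32, 32, 38, 40, 42],
... ])
>>> bboxes2 = torch.FloatTensor([
...     [0, 0, 0, 10, 20, 20],
...     [0, 10, 10, 10, 19, 20],
...     [10, 10, 10, 20, 20, 20],
... ])
>>> # Pairwise mode: every box in bboxes1 against every box in bboxes2.
>>> overlaps = axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2)
>>> assert overlaps.shape == (3, 3)
>>> # Aligned mode: box i in bboxes1 against box i in bboxes2.
>>> overlaps = axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2, is_aligned=True)
>>> assert overlaps.shape == (3, )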

Example

>>> empty = torch.empty(0, 6)
>>> nonempty = torch.FloatTensor([[0, 0, 0, 10, 9, 10]])
>>> assert tuple(axis_aligned_bbox_overlaps_3d(empty, nonempty).shape) == (0, 1)
>>> assert tuple(axis_aligned_bbox_overlaps_3d(nonempty, empty).shape) == (1, 0)
>>> assert tuple(axis_aligned_bbox_overlaps_3d(empty, empty).shape) == (0, 0)

mmdet3d.core.bbox.bbox3d2result(bboxes, scores, labels, attrs=None)[source]

Convert detection results to a list of numpy arrays.

Parameters
  • bboxes (torch.Tensor) – Bounding boxes with shape of (n, 5).

  • labels (torch.Tensor) – Labels with shape of (n, ).

  • scores (torch.Tensor) – Scores with shape of (n, ).

  • attrs (torch.Tensor, optional) – Attributes with shape of (n, ). Defaults to None.

Returns

Bounding box results in cpu mode.

  • boxes_3d (torch.Tensor): 3D boxes.

  • scores (torch.Tensor): Prediction scores.

  • labels_3d (torch.Tensor): Box labels.

  • attrs_3d (torch.Tensor, optional): Box attributes.

Return type

dict[str, torch.Tensor]

mmdet3d.core.bbox.bbox3d2roi(bbox_list)[source]

Convert a list of bounding boxes to roi format.

Parameters

bbox_list (list[torch.Tensor]) – A list of bounding boxes corresponding to a batch of images.

Returns

Regions of interest in shape (n, c), where the channels are in order of [batch_ind, x, y …].

Return type

torch.Tensor

mmdet3d.core.bbox.bbox3d_mapping_back(bboxes, scale_factor, flip_horizontal, flip_vertical)[source]

Map bboxes from testing scale to original image scale.

Parameters
  • bboxes (BaseInstance3DBoxes) – Boxes to be mapped back.

  • scale_factor (float) – Scale factor.

  • flip_horizontal (bool) – Whether to flip horizontally.

  • flip_vertical (bool) – Whether to flip vertically.

Returns

Boxes mapped back.

Return type

BaseInstance3DBoxes

mmdet3d.core.bbox.bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', coordinate='camera')[source]

Calculate 3D IoU using cuda implementation.

Note

This function calculates the IoU of 3D boxes based on their volumes. IoU calculator BboxOverlaps3D uses this function to calculate the actual IoUs of boxes.

Parameters
  • bboxes1 (torch.Tensor) – shape (N, 7+C) [x, y, z, h, w, l, ry].

  • bboxes2 (torch.Tensor) – shape (M, 7+C) [x, y, z, h, w, l, ry].

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

  • coordinate (str) – ‘camera’ or ‘lidar’ coordinate system.

Returns

Bbox overlaps results of bboxes1 and bboxes2 with shape (M, N) (aligned mode is not supported currently).

Return type

torch.Tensor

mmdet3d.core.bbox.bbox_overlaps_nearest_3d(bboxes1, bboxes2, mode='iou', is_aligned=False, coordinate='lidar')[source]

Calculate nearest 3D IoU.

Note

This function first finds the nearest 2D boxes in bird's eye view (BEV), and then calculates the 2D IoU using bbox_overlaps(). The IoU calculator BboxOverlapsNearest3D uses this function to calculate IoUs of boxes.

If is_aligned is False, then it calculates the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Parameters
  • bboxes1 (torch.Tensor) – shape (N, 7+C) [x, y, z, h, w, l, ry, v].

  • bboxes2 (torch.Tensor) – shape (M, 7+C) [x, y, z, h, w, l, ry, v].

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

  • is_aligned (bool) – Whether the calculation is aligned.

Returns

If is_aligned is False, return ious between each bbox of bboxes1 and bboxes2 with shape (M, N). If is_aligned is True, return the ious of each aligned pair with shape M.

Return type

torch.Tensor

mmdet3d.core.bbox.get_box_type(box_type)[source]

Get the type and mode of box structure.

Parameters

box_type (str) – The type of box structure. The valid values are “LiDAR”, “Camera”, or “Depth”.

Returns

Box type and box mode.

Return type

tuple

mmdet3d.core.bbox.limit_period(val, offset=0.5, period=3.141592653589793)[source]

Limit the value into a period for periodic function.

Parameters
  • val (torch.Tensor) – The value to be converted.

  • offset (float, optional) – Offset to set the value range. Defaults to 0.5.

  • period (float, optional) – Period of the value. Defaults to np.pi.

Returns

Value in the range of [-offset * period, (1-offset) * period]

Return type

torch.Tensor
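The documented output range follows from subtracting a whole number of periods. A scalar sketch using only the standard library, under the assumption that the wrap is computed with a floor; the helper name is hypothetical and the real function operates on tensors:

```python
import math


def limit_period_sketch(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


print(limit_period_sketch(0.25 * math.pi))          # unchanged, already in range
print(limit_period_sketch(1.25 * math.pi))          # wraps to about 0.25 * pi
print(limit_period_sketch(3.0, 0.5, 2 * math.pi))   # stays 3.0, range is [-pi, pi)
```

With offset=0.5 the range is centered on zero. With offset=0 it becomes [0, period).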

mmdet3d.core.bbox.mono_cam_box2vis(cam_box)[source]

This is a post-processing function on the bboxes from Mono-3D task. If we want to perform projection visualization, we need to:

  1. rotate the box along x-axis for np.pi / 2 (roll)

  2. change orientation from local yaw to global yaw

  3. convert yaw by (np.pi / 2 - yaw)

After applying this function, we can project and draw it on 2D images.

Parameters

cam_box (CameraInstance3DBoxes) – 3D bbox in camera coordinate system before conversion. Could be gt bbox loaded from dataset or network prediction output.

Returns

Box after conversion.

Return type

CameraInstance3DBoxes

mmdet3d.core.bbox.points_cam2img(points_3d, proj_mat, with_depth=False)[source]

Project points from camera coordinates to image coordinates.

Parameters
  • points_3d (torch.Tensor) – Points in shape (N, 3).

  • proj_mat (torch.Tensor) – Transformation matrix between coordinates.

  • with_depth (bool, optional) – Whether to keep depth in the output. Defaults to False.

Returns

Points in image coordinates with shape [N, 2].

Return type

torch.Tensor
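The projection amounts to a homogeneous multiply followed by a perspective divide. A minimal plain-Python sketch, assuming proj_mat is a 3x4 or 4x4 matrix whose first three rows produce (u, v, depth); the helper name and the intrinsics below are made up for illustration:

```python
def points_cam2img_sketch(points_3d, proj_mat, with_depth=False):
    """Project (x, y, z) camera points with a 3x4 or 4x4 matrix.

    points_3d: list of [x, y, z]; proj_mat: nested list, rows of length 4.
    """
    projected = []
    for x, y, z in points_3d:
        homo = (x, y, z, 1.0)  # homogeneous coordinates
        u, v, depth = (sum(m * h for m, h in zip(row, homo))
                       for row in proj_mat[:3])
        # Perspective divide: pixel position is (u / depth, v / depth).
        pixel = [u / depth, v / depth]
        if with_depth:
            pixel.append(depth)
        projected.append(pixel)
    return projected


# Hypothetical pinhole intrinsics: focal length 700 px, principal point (600, 180).
P = [[700.0, 0.0, 600.0, 0.0],
     [0.0, 700.0, 180.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
print(points_cam2img_sketch([[1.0, 0.5, 10.0]], P))  # [[670.0, 215.0]]
```

Points with zero depth would divide by zero in this sketch, so callers would need to filter them first.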

mmdet3d.core.bbox.xywhr2xyxyr(boxes_xywhr)[source]

Convert rotated boxes in XYWHR format to XYXYR format.

Parameters

boxes_xywhr (torch.Tensor) – Rotated boxes in XYWHR format.

Returns

Converted boxes in XYXYR format.

Return type

torch.Tensor
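A sketch of the conversion, assuming x and y in XYWHR are the box center and that the corners are taken before rotation, so r passes through unchanged. The helper name is hypothetical and the real function operates on tensors.

```python
def xywhr2xyxyr_sketch(boxes_xywhr):
    """Turn rows of (x_center, y_center, w, h, r) into (x1, y1, x2, y2, r)."""
    converted = []
    for x, y, w, h, r in boxes_xywhr:
        converted.append([x - w / 2, y - h / 2, x + w / 2, y + h / 2, r])
    return converted


print(xywhr2xyxyr_sketch([[10.0, 20.0, 4.0, 2.0, 0.3]]))
# [[8.0, 19.0, 12.0, 21.0, 0.3]]
```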

evaluation

mmdet3d.core.evaluation.indoor_eval(gt_annos, dt_annos, metric, label2cat, logger=None, box_type_3d=None, box_mode_3d=None)[source]

Indoor Evaluation.

Evaluate the result of the detection.

Parameters
  • gt_annos (list[dict]) – Ground truth annotations.

  • dt_annos (list[dict]) –

    Detection annotations. The dict includes the following keys:

    • labels_3d (torch.Tensor): Labels of boxes.

    • boxes_3d (BaseInstance3DBoxes): 3D bounding boxes in Depth coordinate.

    • scores_3d (torch.Tensor): Scores of boxes.

  • metric (list[float]) – IoU thresholds for computing average precisions.

  • label2cat (dict) – Map from label to category.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.

Returns

Dict of results.

Return type

dict[str, float]

mmdet3d.core.evaluation.kitti_eval(gt_annos, dt_annos, current_classes, eval_types=['bbox', 'bev', '3d'])[source]

KITTI evaluation.

Parameters
  • gt_annos (list[dict]) – Contain gt information of each sample.

  • dt_annos (list[dict]) – Contain detected information of each sample.

  • current_classes (list[str]) – Classes to evaluate.

  • eval_types (list[str], optional) – Types to eval. Defaults to [‘bbox’, ‘bev’, ‘3d’].

Returns

String and dict of evaluation results.

Return type

tuple

mmdet3d.core.evaluation.kitti_eval_coco_style(gt_annos, dt_annos, current_classes)[source]

COCO-style evaluation of KITTI.

Parameters
  • gt_annos (list[dict]) – Contain gt information of each sample.

  • dt_annos (list[dict]) – Contain detected information of each sample.

  • current_classes (list[str]) – Classes to evaluate.

Returns

Evaluation results.

Return type

string

mmdet3d.core.evaluation.lyft_eval(lyft, data_root, res_path, eval_set, output_dir, logger=None)[source]

Evaluation API for Lyft dataset.

Parameters
  • lyft (LyftDataset) – Lyft class in the sdk.

  • data_root (str) – Root of data for reading splits.

  • res_path (str) – Path of result json file recording detections.

  • eval_set (str) – Name of the split for evaluation.

  • output_dir (str) – Output directory for output json files.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

The evaluation results.

Return type

dict[str, float]

mmdet3d.core.evaluation.seg_eval(gt_labels, seg_preds, label2cat, ignore_index, logger=None)[source]

Semantic Segmentation Evaluation.

Evaluate the result of the Semantic Segmentation.

Parameters
  • gt_labels (list[torch.Tensor]) – Ground truth labels.

  • seg_preds (list[torch.Tensor]) – Predictions.

  • label2cat (dict) – Map from label to category name.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.

Returns

Dict of results.

Return type

dict[str, float]

visualizer

mmdet3d.core.visualizer.show_multi_modality_result(img, gt_bboxes, pred_bboxes, proj_mat, out_dir, filename, box_mode='lidar', img_metas=None, show=True, gt_bbox_color=(61, 102, 255), pred_bbox_color=(241, 101, 72))[source]

Convert multi-modality detection results into 2D results.

Project the predicted 3D bbox to 2D image plane and visualize them.

Parameters
  • img (np.ndarray) – The numpy array of image in cv2 fashion.

  • gt_bboxes (BaseInstance3DBoxes) – Ground truth boxes.

  • pred_bboxes (BaseInstance3DBoxes) – Predicted boxes.

  • proj_mat (numpy.array, shape=[4, 4]) – The projection matrix according to the camera intrinsic parameters.

  • out_dir (str) – Path of output directory.

  • filename (str) – Filename of the current frame.

  • box_mode (str) – Coordinate system the boxes are in. Should be one of ‘depth’, ‘lidar’ and ‘camera’. Defaults to ‘lidar’.

  • img_metas (dict) – Used in projecting depth bbox.

  • show (bool) – Visualize the results online. Defaults to True.

  • gt_bbox_color (str or tuple(int)) – Color of bbox lines. The tuple of color should be in BGR order. Default: (61, 102, 255)

  • pred_bbox_color (str or tuple(int)) – Color of bbox lines. The tuple of color should be in BGR order. Default: (241, 101, 72)

mmdet3d.core.visualizer.show_result(points, gt_bboxes, pred_bboxes, out_dir, filename, show=True, snapshot=False)[source]

Convert results into a format that is directly readable by MeshLab.

Parameters
  • points (np.ndarray) – Points.

  • gt_bboxes (np.ndarray) – Ground truth boxes.

  • pred_bboxes (np.ndarray) – Predicted boxes.

  • out_dir (str) – Path of output directory.

  • filename (str) – Filename of the current frame.

  • show (bool) – Visualize the results online. Defaults to True.

  • snapshot (bool) – Whether to save the online results. Defaults to False.

mmdet3d.core.visualizer.show_seg_result(points, gt_seg, pred_seg, out_dir, filename, palette, ignore_index=None, show=True, snapshot=False)[source]

Convert results into a format that is directly readable by MeshLab.

Parameters
  • points (np.ndarray) – Points.

  • gt_seg (np.ndarray) – Ground truth segmentation mask.

  • pred_seg (np.ndarray) – Predicted segmentation mask.

  • out_dir (str) – Path of output directory.

  • filename (str) – Filename of the current frame.

  • palette (np.ndarray) – Mapping between class labels and colors.

  • ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. Defaults to None.

  • show (bool, optional) – Visualize the results online. Defaults to True.

  • snapshot (bool, optional) – Whether to save the online results. Defaults to False.

voxel

class mmdet3d.core.voxel.VoxelGenerator(voxel_size, point_cloud_range, max_num_points, max_voxels=20000)[source]

Voxel generator in numpy implementation.

Parameters
  • voxel_size (list[float]) – Size of a single voxel.

  • point_cloud_range (list[float]) – Range of points.

  • max_num_points (int) – Maximum number of points in a single voxel.

  • max_voxels (int, optional) – Maximum number of voxels. Defaults to 20000.

generate(points)[source]

Generate voxels given points.

property grid_size

The size of grids.

Type

np.ndarray
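The grid size follows from the two constructor arguments. A minimal sketch, assuming point_cloud_range is ordered (x_min, y_min, z_min, x_max, y_max, z_max) and that the voxel count along each axis is rounded to the nearest integer; the helper name and the sample values are illustrative:

```python
def grid_size_sketch(voxel_size, point_cloud_range):
    """Number of voxels along x, y, z for the given range and voxel size.

    point_cloud_range is assumed to be
    (x_min, y_min, z_min, x_max, y_max, z_max).
    """
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [round((hi - lo) / size)
            for lo, hi, size in zip(mins, maxs, voxel_size)]


# KITTI-like values, used here only as an illustration.
print(grid_size_sketch([0.05, 0.05, 0.1], [0, -40, -3, 70.4, 40, 1]))
# [1408, 1600, 40]
```

Rounding matters because a quotient such as 70.4 / 0.05 is not exactly representable in floating point.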

property max_num_points_per_voxel

Maximum number of points per voxel.

Type

int

property point_cloud_range

Range of point cloud.

Type

list[float]

property voxel_size

Size of a single voxel.

Type

list[float]

mmdet3d.core.voxel.build_voxel_generator(cfg, **kwargs)[source]

Builder of voxel generator.

post_processing

mmdet3d.core.post_processing.aligned_3d_nms(boxes, scores, classes, thresh)[source]

3D NMS for aligned boxes.

Parameters
  • boxes (torch.Tensor) – Aligned box with shape [n, 6].

  • scores (torch.Tensor) – Scores of each box.

  • classes (torch.Tensor) – Class of each box.

  • thresh (float) – Iou threshold for nms.

Returns

Indices of selected boxes.

Return type

torch.Tensor
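A greedy version of this suppression can be sketched in plain Python. Two points are assumptions and not taken from the text above: boxes are (x1, y1, z1, x2, y2, z2), and a box only suppresses lower-scored boxes of the same class. The helper name is hypothetical.

```python
def aligned_3d_nms_sketch(boxes, scores, classes, thresh):
    """Return indices of kept boxes, highest score first."""
    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    def iou(a, b):
        inter = 1.0
        for k in range(3):
            inter *= max(min(a[k + 3], b[k + 3]) - max(a[k], b[k]), 0.0)
        return inter / (volume(a) + volume(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop same-class boxes that overlap the kept box too much.
        order = [i for i in order
                 if classes[i] != classes[best]
                 or iou(boxes[best], boxes[i]) <= thresh]
    return keep


boxes = [[0, 0, 0, 2, 2, 2], [0, 0, 0, 2, 2, 1.8],
         [5, 5, 5, 6, 6, 6], [0, 0, 0, 2, 2, 2]]
kept = aligned_3d_nms_sketch(boxes, [0.9, 0.8, 0.7, 0.6], [0, 0, 0, 1], 0.5)
print(kept)  # [0, 2, 3]
```

Box 1 is dropped because it overlaps box 0 with an IoU of 0.9 and shares its class. Box 3 has the same extent as box 0 but survives because its class differs.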

mmdet3d.core.post_processing.box3d_multiclass_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_scores, score_thr, max_num, cfg, mlvl_dir_scores=None, mlvl_attr_scores=None, mlvl_bboxes2d=None)[source]

Multi-class nms for 3D boxes.

Parameters
  • mlvl_bboxes (torch.Tensor) – Multi-level boxes with shape (N, M). M is the dimensions of boxes.

  • mlvl_bboxes_for_nms (torch.Tensor) – Multi-level boxes with shape (N, 5) ([x1, y1, x2, y2, ry]). N is the number of boxes.

  • mlvl_scores (torch.Tensor) – Multi-level scores with shape (N, C + 1). N is the number of boxes. C is the number of classes.

  • score_thr (float) – Score threshold to filter boxes with low confidence.

  • max_num (int) – Maximum number of boxes that will be kept.

  • cfg (dict) – Configuration dict of NMS.

  • mlvl_dir_scores (torch.Tensor, optional) – Multi-level scores of direction classifier. Defaults to None.

  • mlvl_attr_scores (torch.Tensor, optional) – Multi-level scores of attribute classifier. Defaults to None.

  • mlvl_bboxes2d (torch.Tensor, optional) – Multi-level 2D bounding boxes. Defaults to None.

Returns

Return results after nms, including 3D bounding boxes, scores, labels, direction scores, attribute scores (optional) and 2D bounding boxes (optional).

Return type

tuple[torch.Tensor]

mmdet3d.core.post_processing.circle_nms(dets, thresh, post_max_size=83)[source]

Circular NMS.

An object is only counted as positive if no other center with a higher confidence exists within a radius r using a bird-eye view distance metric.

Parameters
  • dets (torch.Tensor) – Detection results with the shape of [N, 3].

  • thresh (float) – Value of threshold.

  • post_max_size (int) – Max number of predictions to be kept. Defaults to 83.

Returns

Indexes of the detections to be kept.

Return type

torch.Tensor
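The rule described above (keep a detection only if no higher-scored center lies within the radius) can be sketched as a greedy loop. Two assumptions are made here: the columns of dets are (x, y, score), and thresh is compared with the squared BEV distance between centers. The helper name is hypothetical.

```python
def circle_nms_sketch(dets, thresh, post_max_size=83):
    """Greedy center-distance suppression over rows of (x, y, score).

    Assumption: thresh is compared with the squared BEV distance
    between two centers.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][2], reverse=True)
    suppressed = set()
    keep = []
    for pos, i in enumerate(order):
        if i in suppressed:
            continue
        keep.append(i)
        # Suppress every lower-scored center that is too close to box i.
        for j in order[pos + 1:]:
            dist_sq = ((dets[i][0] - dets[j][0]) ** 2
                       + (dets[i][1] - dets[j][1]) ** 2)
            if dist_sq <= thresh:
                suppressed.add(j)
    return keep[:post_max_size]


dets = [[0.0, 0.0, 0.9], [0.5, 0.0, 0.8], [5.0, 5.0, 0.7]]
print(circle_nms_sketch(dets, thresh=1.0))  # [0, 2]
```

Unlike IoU-based NMS, this test ignores box size and orientation, so it only needs the centers and scores.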

mmdet3d.core.post_processing.merge_aug_bboxes(aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)[source]

Merge augmented detection bboxes and scores.

Parameters
  • aug_bboxes (list[Tensor]) – shape (n, 4*#class)

  • aug_scores (list[Tensor] or None) – shape (n, #class)

  • img_shapes (list[Tensor]) – shape (3, ).

  • rcnn_test_cfg (dict) – rcnn test config.

Returns

(bboxes, scores)

Return type

tuple

mmdet3d.core.post_processing.merge_aug_bboxes_3d(aug_results, img_metas, test_cfg)[source]

Merge augmented detection 3D bboxes and scores.

Parameters
  • aug_results (list[dict]) –

    The dict of detection results. The dict contains the following keys

    • boxes_3d (BaseInstance3DBoxes): Detection bbox.

    • scores_3d (torch.Tensor): Detection scores.

    • labels_3d (torch.Tensor): Predicted box labels.

  • img_metas (list[dict]) – Meta information of each sample.

  • test_cfg (dict) – Test config.

Returns

Bounding boxes results in cpu mode, containing merged results.

  • boxes_3d (BaseInstance3DBoxes): Merged detection bbox.

  • scores_3d (torch.Tensor): Merged detection scores.

  • labels_3d (torch.Tensor): Merged predicted box labels.

Return type

dict

mmdet3d.core.post_processing.merge_aug_masks(aug_masks, img_metas, rcnn_test_cfg, weights=None)[source]

Merge augmented mask prediction.

Parameters
  • aug_masks (list[ndarray]) – shape (n, #class, h, w)

  • img_shapes (list[ndarray]) – shape (3, ).

  • rcnn_test_cfg (dict) – rcnn test config.

Returns

(bboxes, scores)

Return type

tuple

mmdet3d.core.post_processing.merge_aug_proposals(aug_proposals, img_metas, cfg)[source]

Merge augmented proposals (multiscale, flip, etc.)

Parameters
  • aug_proposals (list[Tensor]) – proposals from different testing schemes, shape (n, 5). Note that they are not rescaled to the original image size.

  • img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmdet/datasets/pipelines/formatting.py:Collect.

  • cfg (dict) – rpn test config.

Returns

shape (n, 4), proposals corresponding to original image scale.

Return type

Tensor

mmdet3d.core.post_processing.merge_aug_scores(aug_scores)[source]

Merge augmented bbox scores.

mmdet3d.core.post_processing.multiclass_nms(multi_bboxes, multi_scores, score_thr, nms_cfg, max_num=-1, score_factors=None, return_inds=False)[source]

NMS for multi-class bboxes.

Parameters
  • multi_bboxes (Tensor) – shape (n, #class*4) or (n, 4)

  • multi_scores (Tensor) – shape (n, #class), where the last column contains scores of the background class, but this will be ignored.

  • score_thr (float) – bbox threshold, bboxes with scores lower than it will not be considered.

  • nms_thr (float) – NMS IoU threshold

  • max_num (int, optional) – if there are more than max_num bboxes after NMS, only top max_num will be kept. Defaults to -1.

  • score_factors (Tensor, optional) – The factors multiplied to scores before applying NMS. Defaults to None.

  • return_inds (bool, optional) – Whether to return the indices of kept bboxes. Defaults to False.

Returns

(dets, labels, indices (optional)), tensors of shape (k, 5), (k), and (k). Dets are boxes with scores. Labels are 0-based.

Return type

tuple

mmdet3d.datasets

class mmdet3d.datasets.BackgroundPointsFilter(bbox_enlarge_range)[source]

Filter background points near the bounding box.

Parameters

bbox_enlarge_range (tuple[float], float) – Bbox enlarge range.

class mmdet3d.datasets.Custom3DDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False)[source]

Customized 3D dataset.

This is the base dataset of SUNRGB-D, ScanNet, nuScenes, and KITTI dataset.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

evaluate(results, metric=None, iou_thr=(0.25, 0.5), logger=None, show=False, out_dir=None, pipeline=None)[source]

Evaluate.

Evaluation in indoor protocol.

Parameters
  • results (list[dict]) – List of results.

  • metric (str | list[str]) – Metrics to be evaluated.

  • iou_thr (list[float]) – AP IoU thresholds.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – Raw data loading pipeline used for visualization. Default: None.

Returns

Evaluation results.

Return type

dict

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving pkl files when pklfile_prefix is not specified.

Return type

tuple
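The prefix handling described above can be sketched as follows. This is an illustration under assumptions, not the library's code: `dump_results` is a hypothetical helper, the file name `results` inside the temporary directory is made up, and the parent directory of a user-supplied prefix is assumed to exist.

```python
import os.path as osp
import pickle
import tempfile


def dump_results(outputs, pklfile_prefix=None):
    """Hypothetical helper mirroring the documented (outputs, tmp_dir) contract."""
    tmp_dir = None
    if pklfile_prefix is None:
        # No prefix given: fall back to a temporary directory.
        tmp_dir = tempfile.TemporaryDirectory()
        pklfile_prefix = osp.join(tmp_dir.name, 'results')
    with open(f'{pklfile_prefix}.pkl', 'wb') as f:
        pickle.dump(outputs, f)
    return outputs, tmp_dir


outputs, tmp_dir = dump_results([{'scores_3d': [0.9]}])
# The caller owns the temporary directory and should clean it up.
if tmp_dir is not None:
    tmp_dir.cleanup()
```

Returning the directory object rather than deleting it inside the helper lets the caller read the dumped file before cleanup.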

classmethod get_classes(classes=None)[source]

Get class names of current dataset.

Parameters

classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

Returns

A list of class names.

Return type

list[str]
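The three accepted forms of classes can be illustrated with a small stand-alone sketch. `resolve_classes` and its `default_classes` argument are hypothetical stand-ins for the classmethod and the dataset's CLASSES attribute; skipping blank lines in the file is also an assumption.

```python
import os
import tempfile


def resolve_classes(classes=None, default_classes=('car', 'pedestrian')):
    """Hypothetical stand-in for the class name resolution described above."""
    if classes is None:
        # Fall back to the dataset's built-in class names.
        return list(default_classes)
    if isinstance(classes, str):
        # A string is treated as a file with one class name per line.
        with open(classes) as f:
            return [line.strip() for line in f if line.strip()]
    if isinstance(classes, (tuple, list)):
        # A sequence overrides the built-in class names.
        return list(classes)
    raise ValueError(f'Unsupported type {type(classes)} of classes.')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'classes.txt')
    with open(path, 'w') as f:
        f.write('chair\ntable\n')
    from_file = resolve_classes(path)

from_default = resolve_classes()
from_tuple = resolve_classes(('bed', 'sofa'))
```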

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • file_name (str): Filename of point clouds.

  • ann_info (dict): Annotation info.

Return type

dict

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations.

Return type

list[dict]

pre_pipeline(results)[source]

Initialization before data preparation.

Parameters

results (dict) –

Dict before data preprocessing.

  • img_fields (list): Image fields.

  • bbox3d_fields (list): 3D bounding boxes fields.

  • pts_mask_fields (list): Mask fields of points.

  • pts_seg_fields (list): Mask fields of point segments.

  • bbox_fields (list): Fields of bounding boxes.

  • mask_fields (list): Fields of masks.

  • seg_fields (list): Segment fields.

  • box_type_3d (str): 3D box type.

  • box_mode_3d (str): 3D box mode.

prepare_test_data(index)[source]

Prepare data for testing.

Parameters

index (int) – Index for accessing the target data.

Returns

Testing data dict of the corresponding index.

Return type

dict

prepare_train_data(index)[source]

Training data preparation.

Parameters

index (int) – Index for accessing the target data.

Returns

Training data dict of the corresponding index.

Return type

dict

class mmdet3d.datasets.Custom3DSegDataset(data_root, ann_file, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None)[source]

Customized 3D dataset for semantic segmentation task.

This is the base dataset of the ScanNet and S3DIS datasets.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES) to be consistent with PointSegClassMapping function in pipeline. Defaults to None.

  • scene_idxs (np.ndarray | str, optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.

evaluate(results, metric=None, logger=None, show=False, out_dir=None, pipeline=None)[source]

Evaluate.

Evaluation in semantic segmentation protocol.

Parameters
  • results (list[dict]) – List of results.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | None | str) – Logger used for printing related information during evaluation. Defaults to None.

  • show (bool, optional) – Whether to visualize. Defaults to False.

  • out_dir (str, optional) – Path to save the visualization results. Defaults to None.

  • pipeline (list[dict], optional) – Raw data loading pipeline used for visualization. Default: None.

Returns

Evaluation results.

Return type

dict

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving pkl files when pklfile_prefix is not specified.

Return type

tuple

get_classes_and_palette(classes=None, palette=None)[source]

Get class names of current dataset.

This function is taken from MMSegmentation.

Parameters
  • classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset. Defaults to None.

  • palette (Sequence[Sequence[int]] | np.ndarray | None) – The palette of segmentation map. If None is given, a random palette will be generated. Defaults to None.

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • file_name (str): Filename of point clouds.

  • ann_info (dict): Annotation info.

Return type

dict

get_scene_idxs(scene_idxs)[source]

Compute scene_idxs for data sampling.

We sample more times for scenes with more points.
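One way to realize "more samples for scenes with more points" is to repeat each scene index in proportion to its point count. The weighting below is a hypothetical scheme chosen for illustration and is not taken from the library; `scene_idxs_by_point_count` and `points_per_sample` are made-up names.

```python
import math


def scene_idxs_by_point_count(num_points_per_scene, points_per_sample):
    """Hypothetical scheme: repeat a scene index once per sample-sized chunk."""
    idxs = []
    for scene_id, num_points in enumerate(num_points_per_scene):
        # Every scene appears at least once; larger scenes appear more often.
        repeats = max(1, math.ceil(num_points / points_per_sample))
        idxs.extend([scene_id] * repeats)
    return idxs


scene_idxs = scene_idxs_by_point_count([1000, 4000, 9000], 4096)
```

Here the third scene holds a little more than two samples' worth of points, so its index is repeated three times while the smaller scenes appear once.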

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations.

Return type

list[dict]

pre_pipeline(results)[source]

Initialization before data preparation.

Parameters

results (dict) –

Dict before data preprocessing.

  • img_fields (list): Image fields.

  • pts_mask_fields (list): Mask fields of points.

  • pts_seg_fields (list): Mask fields of point segments.

  • mask_fields (list): Fields of masks.

  • seg_fields (list): Segment fields.

prepare_test_data(index)[source]

Prepare data for testing.

Parameters

index (int) – Index for accessing the target data.

Returns

Testing data dict of the corresponding index.

Return type

dict

prepare_train_data(index)[source]

Training data preparation.

Parameters

index (int) – Index for accessing the target data.

Returns

Training data dict of the corresponding index.

Return type

dict

class mmdet3d.datasets.GlobalAlignment(rotation_axis)[source]

Apply global alignment to 3D scene points by rotation and translation.

Parameters

rotation_axis (int) – Rotation axis for points and bboxes rotation.

Note

We do not record the applied rotation and translation as in GlobalRotScaleTrans, because we usually do not need to reverse the alignment step. For example, the ScanNet 3D detection task uses aligned ground-truth bounding boxes for evaluation.

class mmdet3d.datasets.GlobalRotScaleTrans(rot_range=[-0.78539816, 0.78539816], scale_ratio_range=[0.95, 1.05], translation_std=[0, 0, 0], shift_height=False)[source]

Apply global rotation, scaling and translation to a 3D scene.

Parameters
  • rot_range (list[float]) – Range of rotation angle. Defaults to [-0.78539816, 0.78539816] (close to [-pi/4, pi/4]).

  • scale_ratio_range (list[float]) – Range of scale ratio. Defaults to [0.95, 1.05].

  • translation_std (list[float]) – The standard deviation of translation noise. This applies a random translation to a scene using noise sampled from a Gaussian distribution whose standard deviation is set by translation_std. Defaults to [0, 0, 0].

  • shift_height (bool) – Whether to shift height (the fourth dimension of indoor points) when scaling. Defaults to False.
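How the three parameters interact can be sketched on a plain list of (x, y, z) points. Several details here are assumptions rather than documented behavior: the angle and scale are drawn uniformly from their ranges, the rotation is about the z axis, and the operations are applied in the order rotate, scale, translate. `global_rot_scale_trans` is a hypothetical function name.

```python
import math
import random


def global_rot_scale_trans(points,
                           rot_range=(-0.78539816, 0.78539816),
                           scale_ratio_range=(0.95, 1.05),
                           translation_std=(0, 0, 0),
                           rng=random):
    """Hypothetical sketch: one random rigid-plus-scale transform per scene."""
    angle = rng.uniform(*rot_range)          # assumed uniform sampling
    scale = rng.uniform(*scale_ratio_range)  # assumed uniform sampling
    # Gaussian translation noise, one standard deviation per axis.
    shift = [rng.gauss(0.0, std) for std in translation_std]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    transformed = []
    for x, y, z in points:
        # Rotate about the z axis (assumption), then scale, then translate.
        x_rot = cos_a * x - sin_a * y
        y_rot = sin_a * x + cos_a * y
        transformed.append((x_rot * scale + shift[0],
                            y_rot * scale + shift[1],
                            z * scale + shift[2]))
    return transformed


augmented = global_rot_scale_trans([(1.0, 2.0, 3.0)], rng=random.Random(0))
```

The same angle, scale, and shift are shared by every point in the scene, which is what makes the augmentation global.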

class mmdet3d.datasets.IndoorPatchPointSample(num_points, block_size=1.5, sample_rate=None, ignore_index=None, use_normalized_coord=False, num_try=10, enlarge_size=0.2, min_unique_num=None, eps=0.01)[source]

Indoor point sample within a patch. Modified from PointNet++.

Sampling data to a certain number for semantic segmentation.

Parameters
  • num_points (int) – Number of points to be sampled.

  • block_size (float, optional) – Size of a block to sample points from. Defaults to 1.5.

  • sample_rate (float, optional) – Stride used in sliding patch generation. This parameter is unused in IndoorPatchPointSample and thus has been deprecated. We plan to remove it in the future. Defaults to None.

  • ignore_index (int, optional) – Label index that won’t be used for the segmentation task. This is set in PointSegClassMapping as neg_cls. If not None, will be used as a patch selection criterion. Defaults to None.

  • use_normalized_coord (bool, optional) – Whether to use normalized xyz as additional features. Defaults to False.

  • num_try (int, optional) – Number of times to try if the patch selected is invalid. Defaults to 10.

  • enlarge_size (float | None, optional) – Enlarge the sampled patch to [-block_size / 2 - enlarge_size, block_size / 2 + enlarge_size] as an augmentation. If None, set it as 0. Defaults to 0.2.

  • min_unique_num (int | None, optional) – Minimum number of unique points the sampled patch should contain. If None, use PointNet++’s method to judge uniqueness. Defaults to None.

  • eps (float, optional) – A value added to patch boundary to guarantee points coverage. Defaults to 1e-2.

Note

This transform should only be used in the training process of point cloud segmentation tasks. For the sliding patch generation and inference process in testing, please refer to the slide_inference function of EncoderDecoder3D class.

class mmdet3d.datasets.IndoorPointSample(*args, **kwargs)[source]

Indoor point sample.

Sampling data to a certain number. NOTE: IndoorPointSample is deprecated in favor of PointSample.

Parameters

num_points (int) – Number of points to be sampled.

class mmdet3d.datasets.KittiDataset(data_root, ann_file, split, pts_prefix='velodyne', pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, pcd_limit_range=[0, -40, -3, 70.4, 40, 0.0])[source]

KITTI Dataset.

This class serves as the API for experiments on the KITTI Dataset.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • split (str) – Split of input data.

  • pts_prefix (str, optional) – Prefix of points files. Defaults to ‘velodyne’.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • pcd_limit_range (list) – The range of point cloud used to filter invalid predicted boxes. Default: [0, -40, -3, 70.4, 40, 0.0].

bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 3D detection results to KITTI format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

bbox2result_kitti2d(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 2D detection results to KITTI format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

convert_valid_bboxes(box_dict, info)[source]

Convert the predicted boxes into valid ones.

Parameters
  • box_dict (dict) –

    Box dictionaries to be converted.

    • boxes_3d (LiDARInstance3DBoxes): 3D bounding boxes.

    • scores_3d (torch.Tensor): Scores of boxes.

    • labels_3d (torch.Tensor): Class labels of boxes.

  • info (dict) – Data info.

Returns

Valid predicted boxes.

  • bbox (np.ndarray): 2D bounding boxes.

  • box3d_camera (np.ndarray): 3D bounding boxes in camera coordinate.

  • box3d_lidar (np.ndarray): 3D bounding boxes in LiDAR coordinate.

  • scores (np.ndarray): Scores of boxes.

  • label_preds (np.ndarray): Class label predictions.

  • sample_idx (int): Sample index.

Return type

dict
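The range check implied by pcd_limit_range can be sketched as a mask over box centers. This is an assumption about how "valid" is decided: the sketch keeps a box only when its center lies strictly inside the (x_min, y_min, z_min, x_max, y_max, z_max) range, and it ignores any image-based checks. `in_range_mask` is a hypothetical name.

```python
def in_range_mask(centers, pcd_limit_range=(0, -40, -3, 70.4, 40, 0.0)):
    """Hypothetical helper: True where a box center lies inside the range."""
    x_min, y_min, z_min, x_max, y_max, z_max = pcd_limit_range
    return [
        x_min < x < x_max and y_min < y < y_max and z_min < z < z_max
        for x, y, z in centers
    ]


centers = [(10.0, 0.0, -1.0), (80.0, 0.0, -1.0), (5.0, -50.0, -1.0)]
mask = in_range_mask(centers)
valid_centers = [c for c, keep in zip(centers, mask) if keep]
```

The same mask would then be applied to the scores and labels so that all returned arrays stay aligned.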

drop_arrays_by_name(gt_names, used_classes)[source]

Drop irrelevant ground truths by name.

Parameters
  • gt_names (list[str]) – Names of ground truths.

  • used_classes (list[str]) – Classes of interest.

Returns

Indices of ground truths that will be dropped.

Return type

np.ndarray

evaluate(results, metric=None, logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None, pipeline=None)[source]

Evaluation in KITTI protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submission data. If not specified, the submission data will not be generated.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – Raw data loading pipeline used for visualization. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_bboxes (np.ndarray): 2D ground truth bboxes.

  • gt_labels (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • img_prefix (str | None): Prefix of image files.

  • img_info (dict): Image info.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

keep_arrays_by_name(gt_names, used_classes)[source]

Keep useful ground truths by name.

Parameters
  • gt_names (list[str]) – Names of ground truths.

  • used_classes (list[str]) – Classes of interest.

Returns

Indices of ground truths that will be kept.

Return type

np.ndarray
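keep_arrays_by_name and drop_arrays_by_name return complementary index sets. A minimal sketch with plain lists instead of NumPy arrays (`keep_indices` and `drop_indices` are hypothetical names):

```python
def keep_indices(gt_names, used_classes):
    """Indices of ground truths whose class is of interest."""
    return [i for i, name in enumerate(gt_names) if name in used_classes]


def drop_indices(gt_names, used_classes):
    """Indices of ground truths whose class is not of interest."""
    return [i for i, name in enumerate(gt_names) if name not in used_classes]


gt_names = ['Car', 'Van', 'Pedestrian', 'DontCare', 'Car']
used_classes = ['Car', 'Pedestrian']
kept = keep_indices(gt_names, used_classes)
dropped = drop_indices(gt_names, used_classes)
```

Together the two lists cover every index exactly once.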

remove_dontcare(ann_info)[source]

Remove annotations that should be ignored (‘DontCare’).

Parameters

ann_info (dict) – Dict of annotation infos. The 'DontCare' annotations will be removed according to ann_file[‘name’].

Returns

Annotations after filtering.

Return type

dict
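The filtering can be sketched on a dict of parallel lists. The sketch assumes every value in ann_info is a per-object sequence of the same length as the 'name' entry; `remove_dontcare_sketch` and the example keys other than 'name' are hypothetical.

```python
def remove_dontcare_sketch(ann_info):
    """Hypothetical helper: drop every entry whose name is 'DontCare'."""
    keep = [i for i, name in enumerate(ann_info['name']) if name != 'DontCare']
    # Apply the same index selection to every per-object field.
    return {key: [values[i] for i in keep] for key, values in ann_info.items()}


ann_info = {
    'name': ['Car', 'DontCare', 'Cyclist'],
    'truncated': [0.0, -1.0, 0.2],
}
filtered = remove_dontcare_sketch(ann_info)
```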

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – Raw data loading pipeline used for visualization. Default: None.

class mmdet3d.datasets.KittiMonoDataset(data_root, info_file, load_interval=1, with_velocity=False, eval_version=None, version=None, **kwargs)[source]

Monocular 3D detection on KITTI Dataset.

Parameters
  • data_root (str) – Path of dataset root.

  • info_file (str) – Path of info file.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to False.

  • eval_version (str, optional) – Configuration version of evaluation. Defaults to None.

  • version (str, optional) – Dataset version. Defaults to None.

  • kwargs (dict) – Other arguments are the same as those of NuScenesMonoDataset.

bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 3D detection results to KITTI format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

bbox2result_kitti2d(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 2D detection results to KITTI format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

convert_valid_bboxes(box_dict, info)[source]

Convert the predicted boxes into valid ones.

Parameters
  • box_dict (dict) –

    Box dictionaries to be converted.

    • boxes_3d (CameraInstance3DBoxes): 3D bounding boxes.

    • scores_3d (torch.Tensor): Scores of boxes.

    • labels_3d (torch.Tensor): Class labels of boxes.

  • info (dict) – Data info.

Returns

Valid predicted boxes.

  • bbox (np.ndarray): 2D bounding boxes.

  • box3d_camera (np.ndarray): 3D bounding boxes in camera coordinate.

  • scores (np.ndarray): Scores of boxes.

  • label_preds (np.ndarray): Class label predictions.

  • sample_idx (int): Sample index.

Return type

dict

evaluate(results, metric=None, logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None)[source]

Evaluation in KITTI protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submission data. If not specified, the submission data will not be generated.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

class mmdet3d.datasets.LoadAnnotations3D(with_bbox_3d=True, with_label_3d=True, with_attr_label=False, with_mask_3d=False, with_seg_3d=False, with_bbox=False, with_label=False, with_mask=False, with_seg=False, with_bbox_depth=False, poly2mask=True, seg_3d_dtype='int', file_client_args={'backend': 'disk'})[source]

Load Annotations3D.

Load instance mask and semantic mask of points and encapsulate the items into related fields.

Parameters
  • with_bbox_3d (bool, optional) – Whether to load 3D boxes. Defaults to True.

  • with_label_3d (bool, optional) – Whether to load 3D labels. Defaults to True.

  • with_attr_label (bool, optional) – Whether to load attribute label. Defaults to False.

  • with_mask_3d (bool, optional) – Whether to load 3D instance masks for points. Defaults to False.

  • with_seg_3d (bool, optional) – Whether to load 3D semantic masks for points. Defaults to False.

  • with_bbox (bool, optional) – Whether to load 2D boxes. Defaults to False.

  • with_label (bool, optional) – Whether to load 2D labels. Defaults to False.

  • with_mask (bool, optional) – Whether to load 2D instance masks. Defaults to False.

  • with_seg (bool, optional) – Whether to load 2D semantic masks. Defaults to False.

  • with_bbox_depth (bool, optional) – Whether to load 2.5D boxes. Defaults to False.

  • poly2mask (bool, optional) – Whether to convert polygon annotations to bitmasks. Defaults to True.

  • seg_3d_dtype (dtype, optional) – Dtype of 3D semantic masks. Defaults to int64.

  • file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details.

class mmdet3d.datasets.LoadPointsFromFile(coord_type, load_dim=6, use_dim=[0, 1, 2], shift_height=False, use_color=False, file_client_args={'backend': 'disk'})[source]

Load Points From File.

Load SUNRGB-D and ScanNet points from file.

Parameters
  • coord_type (str) –

    The type of coordinates of the point cloud. Available options include:

    • ‘LIDAR’: Points in LiDAR coordinates.

    • ‘DEPTH’: Points in depth coordinates, usually for indoor datasets.

    • ‘CAMERA’: Points in camera coordinates.

  • load_dim (int) – The dimension of the loaded points. Defaults to 6.

  • use_dim (list[int]) – Which dimensions of the points to be used. Defaults to [0, 1, 2]. For KITTI dataset, set use_dim=4 or use_dim=[0, 1, 2, 3] to use the intensity dimension.

  • shift_height (bool) – Whether to use shifted height. Defaults to False.

  • use_color (bool) – Whether to use color features. Defaults to False.

  • file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).
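The relationship between load_dim and use_dim can be sketched without any third-party dependency. The sketch assumes the file content is a flat buffer of 32-bit floats, as in KITTI-style .bin files, and that an integer use_dim means "the first use_dim dimensions"; `select_point_dims` is a hypothetical helper, not the transform itself.

```python
from array import array


def select_point_dims(raw_bytes, load_dim=6, use_dim=(0, 1, 2)):
    """Hypothetical helper: reshape to (-1, load_dim) and keep use_dim columns."""
    flat = array('f')  # 'f' is a C float, assumed to be 32 bits wide
    flat.frombytes(raw_bytes)
    if len(flat) % load_dim != 0:
        raise ValueError('buffer length is not a multiple of load_dim')
    if isinstance(use_dim, int):
        # An integer selects the leading dimensions.
        use_dim = range(use_dim)
    rows = (flat[i:i + load_dim] for i in range(0, len(flat), load_dim))
    return [[row[d] for d in use_dim] for row in rows]


# Two points with x, y, z and intensity (load_dim=4).
raw = array('f', [1, 2, 3, 0.5, 4, 5, 6, 0.25]).tobytes()
xyz_only = select_point_dims(raw, load_dim=4, use_dim=3)
with_intensity = select_point_dims(raw, load_dim=4, use_dim=[0, 1, 2, 3])
```

A load_dim that does not match the file layout shifts every column, which is why the sketch rejects buffers whose length is not a multiple of load_dim.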

class mmdet3d.datasets.LoadPointsFromMultiSweeps(sweeps_num=10, load_dim=5, use_dim=[0, 1, 2, 4], file_client_args={'backend': 'disk'}, pad_empty_sweeps=False, remove_close=False, test_mode=False)[source]

Load points from multiple sweeps.

This is usually used for nuScenes dataset to utilize previous sweeps.

Parameters
  • sweeps_num (int) – Number of sweeps. Defaults to 10.

  • load_dim (int) – Dimension number of the loaded points. Defaults to 5.

  • use_dim (list[int]) – Which dimension to use. Defaults to [0, 1, 2, 4].

  • file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).

  • pad_empty_sweeps (bool) – Whether to repeat keyframe when sweeps is empty. Defaults to False.

  • remove_close (bool) – Whether to remove close points. Defaults to False.

  • test_mode (bool) – If test_mode=True (i.e., used for testing), it will not randomly sample sweeps but select the nearest N frames. Defaults to False.
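The difference between training and testing in sweep selection can be sketched as an index choice. The sketch assumes the sweeps list is ordered from nearest to farthest in time, that N equals sweeps_num, and that all sweeps are used when fewer than sweeps_num are available; `choose_sweep_indices` is a hypothetical name.

```python
import random


def choose_sweep_indices(num_available, sweeps_num=10, test_mode=False,
                         rng=random):
    """Hypothetical helper: which sweeps to load for one keyframe."""
    if num_available <= sweeps_num:
        # Not enough sweeps to choose from: use all of them.
        return list(range(num_available))
    if test_mode:
        # Deterministic: the nearest sweeps_num frames.
        return list(range(sweeps_num))
    # Training: a random subset without replacement.
    return sorted(rng.sample(range(num_available), sweeps_num))


test_choice = choose_sweep_indices(6, sweeps_num=3, test_mode=True)
train_choice = choose_sweep_indices(6, sweeps_num=3, rng=random.Random(0))
```

Deterministic selection at test time keeps evaluation results reproducible, while random selection during training acts as a mild augmentation.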

class mmdet3d.datasets.LyftDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False)[source]

Lyft Dataset.

This class serves as the API for experiments on the Lyft Dataset.

Please refer to https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data for data downloading.

Parameters
  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • data_root (str) – Path of dataset root.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, csv_savepath=None, result_names=['pts_bbox'], show=False, out_dir=None, pipeline=None)[source]

Evaluation in Lyft protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • csv_savepath (str | None) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – Raw data loading pipeline used for visualization. Default: None.

Returns

Evaluation results.

Return type

dict[str, float]

format_results(results, jsonfile_prefix=None, csv_savepath=None)[source]

Format the results to json (standard format for COCO evaluation).

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • csv_savepath (str | None) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.

Returns

(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • sweeps (list[dict]): Infos of sweeps.

  • timestamp (float): Sample timestamp.

  • img_filename (str, optional): Image filename.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

json2csv(json_path, csv_savepath)[source]

Convert the json file to csv format for submission.

Parameters
  • json_path (str) – Path of the result json file.

  • csv_savepath (str) – Path to save the csv file.

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations sorted by timestamps.

Return type

list[dict]

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.NormalizePointsColor(color_mean)[source]

Normalize color of points.

Parameters

color_mean (list[float]) – Mean color of the point cloud.

class mmdet3d.datasets.NuScenesDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, with_velocity=True, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, eval_version='detection_cvpr_2019', use_valid_flag=False)[source]

NuScenes Dataset.

This class serves as the API for experiments on the NuScenes Dataset.

Please refer to NuScenes Dataset for data downloading.

Parameters
  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • data_root (str) – Path of dataset root.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to True.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • eval_version (str, optional) – Configuration version of evaluation. Defaults to ‘detection_cvpr_2019’.

  • use_valid_flag (bool) – Whether to use use_valid_flag key in the info file as mask to filter gt_boxes and gt_names. Defaults to False.
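
As a rough illustration, the parameters above can be collected into a config-style dict such as the one sketched here. The paths, the annotation file name, the class subset, and the empty pipeline are placeholders assumed for this example, not values taken from this reference.

```python
# Hypothetical dataset root; adjust to where the data actually lives.
data_root = 'data/nuscenes/'

train_dataset_cfg = dict(
    type='NuScenesDataset',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_train.pkl',  # assumed file name
    pipeline=[],                      # placeholder: list the transforms here
    classes=('car', 'pedestrian'),    # illustrative subset of class names
    load_interval=1,                  # keep every sample
    with_velocity=True,
    modality=dict(use_lidar=True, use_camera=False),  # assumed modality keys
    box_type_3d='LiDAR',
    filter_empty_gt=True,
    test_mode=False,
    eval_version='detection_cvpr_2019',
    use_valid_flag=False,
)
```

Every key other than type mirrors a constructor argument listed above, so unused options can simply be omitted to fall back to their defaults.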

evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, result_names=['pts_bbox'], show=False, out_dir=None, pipeline=None)[source]

Evaluation in nuScenes protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(results, jsonfile_prefix=None)[source]

Format the results to json (standard format for COCO evaluation).

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

A tuple (result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple
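
The handling of jsonfile_prefix and tmp_dir described above can be sketched with the standard library alone. The helper name resolve_jsonfile_prefix and the 'results' file stem are assumptions made for this illustration; the sketch only shows the prefix logic, not the actual conversion of results to json.

```python
import os.path as osp
import tempfile


def resolve_jsonfile_prefix(jsonfile_prefix=None):
    """Return (jsonfile_prefix, tmp_dir); tmp_dir is None if a prefix was given."""
    if jsonfile_prefix is None:
        # No prefix supplied: create a temporary directory to hold the files.
        tmp_dir = tempfile.TemporaryDirectory()
        jsonfile_prefix = osp.join(tmp_dir.name, 'results')  # assumed stem
    else:
        tmp_dir = None
    return jsonfile_prefix, tmp_dir


prefix, tmp_dir = resolve_jsonfile_prefix()
# ... write and consume the json files under `prefix` here ...
if tmp_dir is not None:
    tmp_dir.cleanup()  # remove the temporary directory once it is no longer needed
```

Returning tmp_dir to the caller lets the caller decide when the temporary files can be deleted.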

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_cat_ids(idx)[source]

Get category distribution of single scene.

Parameters

idx (int) – Index of the data_info.

Returns

For each category, if the current scene contains such boxes, store a list containing idx; otherwise, store an empty list.

Return type

dict[list]
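
A minimal sketch of the returned structure, assuming the ground-truth class names of the scene are already available as a list. The function name and the use of class names as dict keys are assumptions of this example; the real method may key the dict differently.

```python
def cat_ids_for_scene(idx, gt_names, classes):
    """Map each category to [idx] if the scene contains it, else to []."""
    present = set(gt_names)
    return {name: ([idx] if name in present else []) for name in classes}


classes = ('car', 'pedestrian', 'bicycle')
print(cat_ids_for_scene(7, ['car', 'car', 'bicycle'], classes))
# {'car': [7], 'pedestrian': [], 'bicycle': [7]}
```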

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • sweeps (list[dict]): Infos of sweeps.

  • timestamp (float): Sample timestamp.

  • img_filename (str, optional): Image filename.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations sorted by timestamps.

Return type

list[dict]
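
The sorting by timestamp, combined with the uniform sampling that load_interval describes, might look like the sketch below. It assumes each info dict carries a 'timestamp' key; the 'token' key and the function name are made up for the example.

```python
def sort_and_subsample(infos, load_interval=1):
    """Sort info dicts by timestamp, then keep every load_interval-th one."""
    ordered = sorted(infos, key=lambda info: info['timestamp'])
    return ordered[::load_interval]


infos = [
    {'token': 'c', 'timestamp': 3.0},
    {'token': 'a', 'timestamp': 1.0},
    {'token': 'd', 'timestamp': 4.0},
    {'token': 'b', 'timestamp': 2.0},
]
print([info['token'] for info in sort_and_subsample(infos, load_interval=2)])
# ['a', 'c']
```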

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.NuScenesMonoDataset(data_root, load_interval=1, with_velocity=True, modality=None, box_type_3d='Camera', eval_version='detection_cvpr_2019', use_valid_flag=False, version='v1.0-trainval', **kwargs)[source]

Monocular 3D detection on NuScenes Dataset.

This class serves as the API for experiments on the NuScenes Dataset.

Please refer to NuScenes Dataset for data downloading.

Parameters
  • ann_file (str) – Path of annotation file.

  • data_root (str) – Path of dataset root.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to True.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Camera’ in this class. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • eval_version (str, optional) – Configuration version of evaluation. Defaults to ‘detection_cvpr_2019’.

  • use_valid_flag (bool) – Whether to use use_valid_flag key in the info file as mask to filter gt_boxes and gt_names. Defaults to False.

  • version (str, optional) – Dataset version. Defaults to ‘v1.0-trainval’.

evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, result_names=['img_bbox'], show=False, out_dir=None, pipeline=None)[source]

Evaluation in nuScenes protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(results, jsonfile_prefix=None, **kwargs)[source]

Format the results to json (standard format for COCO evaluation).

Parameters
  • results (list[tuple | numpy.ndarray]) – Testing results of the dataset.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

A tuple (result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_attr_name(attr_idx, label_name)[source]

Get attribute from predicted index.

This is a workaround to predict attribute when the predicted velocity is not reliable. We map the predicted attribute index to the one in the attribute set. If it is consistent with the category, we will keep it. Otherwise, we will use the default attribute.

Parameters
  • attr_idx (int) – Attribute index.

  • label_name (str) – Predicted category name.

Returns

Predicted attribute name.

Return type

str
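
The fallback rule described above (keep the predicted attribute if it is consistent with the category, otherwise use a default) can be sketched as follows. The three tables are small illustrative stand-ins and the prefix-based consistency check is an assumption of this sketch; the class defines its own attribute set and defaults.

```python
# Illustrative tables only; not the class's actual attribute set.
ATTRIBUTES = [
    'cycle.with_rider', 'cycle.without_rider',
    'pedestrian.moving', 'vehicle.moving', 'vehicle.parked',
]
VALID_PREFIX = {'car': 'vehicle.', 'pedestrian': 'pedestrian.', 'bicycle': 'cycle.'}
DEFAULT_ATTRIBUTE = {
    'car': 'vehicle.parked',
    'pedestrian': 'pedestrian.moving',
    'bicycle': 'cycle.without_rider',
}


def pick_attr_name(attr_idx, label_name):
    """Keep the predicted attribute if it fits the category, else use the default."""
    candidate = ATTRIBUTES[attr_idx]
    if candidate.startswith(VALID_PREFIX[label_name]):
        return candidate
    return DEFAULT_ATTRIBUTE[label_name]


print(pick_attr_name(3, 'car'))  # consistent: 'vehicle.moving'
print(pick_attr_name(2, 'car'))  # inconsistent: falls back to 'vehicle.parked'
```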

pre_pipeline(results)[source]

Initialization before data preparation.

Parameters

results (dict) –

Dict before data preprocessing.

  • img_fields (list): Image fields.

  • bbox3d_fields (list): 3D bounding boxes fields.

  • pts_mask_fields (list): Mask fields of points.

  • pts_seg_fields (list): Mask fields of point segments.

  • bbox_fields (list): Fields of bounding boxes.

  • mask_fields (list): Fields of masks.

  • seg_fields (list): Segment fields.

  • box_type_3d (str): 3D box type.

  • box_mode_3d (str): 3D box mode.

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.ObjectNameFilter(classes)[source]

Filter GT objects by their names.

Parameters

classes (list[str]) – List of class names to be kept for training.

class mmdet3d.datasets.ObjectNoise(translation_std=[0.25, 0.25, 0.25], global_rot_range=[0.0, 0.0], rot_range=[-0.15707963267, 0.15707963267], num_try=100)[source]

Apply noise to each GT object in the scene.

Parameters
  • translation_std (list[float], optional) – Standard deviation of the distribution from which translation noise is sampled. Defaults to [0.25, 0.25, 0.25].

  • global_rot_range (list[float], optional) – Global rotation to the scene. Defaults to [0.0, 0.0].

  • rot_range (list[float], optional) – Object rotation range. Defaults to [-0.15707963267, 0.15707963267].

  • num_try (int, optional) – Number of times to try if the noise applied is invalid. Defaults to 100.

class mmdet3d.datasets.ObjectRangeFilter(point_cloud_range)[source]

Filter objects by the range.

Parameters

point_cloud_range (list[float]) – Point cloud range.

class mmdet3d.datasets.ObjectSample(db_sampler, sample_2d=False)[source]

Sample GT objects to the data.

Parameters
  • db_sampler (dict) – Config dict of the database sampler.

  • sample_2d (bool) – Whether to also paste 2D image patches to the images. This should be True when applying multi-modality cut-and-paste. Defaults to False.

static remove_points_in_boxes(points, boxes)[source]

Remove the points in the sampled bounding boxes.

Parameters
  • points (BasePoints) – Input point cloud array.

  • boxes (np.ndarray) – Sampled ground truth boxes.

Returns

Points with those in the boxes removed.

Return type

np.ndarray

class mmdet3d.datasets.PointSample(num_points, sample_range=None, replace=False)[source]

Point sample.

Sampling data to a certain number.

Parameters
  • num_points (int) – Number of points to be sampled.

  • sample_range (float, optional) – The range where to sample points. If not None, points with depth larger than sample_range are given priority when sampling. Defaults to None.

  • replace (bool, optional) – Whether the sampling is with or without replacement. Defaults to False.
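
The effect of num_points and replace can be illustrated with a small index-sampling sketch. It ignores sample_range entirely, and the fallback to sampling with replacement when there are fewer points than requested is an assumption of this example. The function name is hypothetical.

```python
import random


def sample_point_indices(num_total, num_points, replace=False, rng=random):
    """Pick num_points indices out of num_total available points."""
    if replace or num_total < num_points:
        # With replacement: the same index may be drawn more than once.
        return [rng.randrange(num_total) for _ in range(num_points)]
    # Without replacement: all returned indices are distinct.
    return rng.sample(range(num_total), num_points)


print(len(sample_point_indices(num_total=1000, num_points=16)))  # 16
```

Sampling indices rather than points keeps the sketch independent of the point container, so the same indices could be used to select matching labels or masks.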

class mmdet3d.datasets.PointShuffle[source]

Shuffle input points.

class mmdet3d.datasets.PointsRangeFilter(point_cloud_range)[source]

Filter points by the range.

Parameters

point_cloud_range (list[float]) – Point cloud range.
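
A minimal sketch of range filtering on plain Python lists. It assumes point_cloud_range follows the (x_min, y_min, z_min, x_max, y_max, z_max) order used for anchor ranges elsewhere in this reference, and that both bounds are inclusive; neither detail is stated here. The function name is hypothetical.

```python
def filter_points_by_range(points, point_cloud_range):
    """Keep points whose x, y, z all lie inside the given range."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    return [
        p for p in points
        if x_min <= p[0] <= x_max
        and y_min <= p[1] <= y_max
        and z_min <= p[2] <= z_max
    ]


points = [(1.0, 2.0, 0.5), (60.0, 0.0, 0.0), (-3.0, -4.0, -6.0)]
print(filter_points_by_range(points, [-50, -50, -5, 50, 50, 3]))
# [(1.0, 2.0, 0.5)]
```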

class mmdet3d.datasets.RandomDropPointsColor(drop_ratio=0.2)[source]

Randomly set the color of points to all zeros.

Once this transform is executed, all the points’ color will be dropped. Refer to PAConv for more details.

Parameters

drop_ratio (float) – The probability of dropping point colors. Defaults to 0.2.
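
The all-or-nothing behaviour can be sketched as below: a single random draw decides whether every point in the cloud loses its color. Representing colors as a list of RGB triples and the function name are assumptions of this example.

```python
import random


def maybe_drop_colors(colors, drop_ratio=0.2, rng=random):
    """With probability drop_ratio, zero out the color of every point."""
    if rng.random() < drop_ratio:
        return [[0.0, 0.0, 0.0] for _ in colors]
    return colors


colors = [[0.5, 0.2, 0.1], [0.9, 0.9, 0.9]]
print(maybe_drop_colors(colors, drop_ratio=1.0))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```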

class mmdet3d.datasets.RandomFlip3D(sync_2d=True, flip_ratio_bev_horizontal=0.0, flip_ratio_bev_vertical=0.0, **kwargs)[source]

Flip the points & bbox.

If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.

Parameters
  • sync_2d (bool, optional) – Whether to apply flip according to the 2D images. If True, it will apply the same flip as that to 2D images. If False, it will decide whether to flip randomly and independently to that of 2D images. Defaults to True.

  • flip_ratio_bev_horizontal (float, optional) – The flipping probability in horizontal direction. Defaults to 0.0.

  • flip_ratio_bev_vertical (float, optional) – The flipping probability in vertical direction. Defaults to 0.0.

random_flip_data_3d(input_dict, direction='horizontal')[source]

Flip 3D data randomly.

Parameters
  • input_dict (dict) – Result dict from loading pipeline.

  • direction (str) – Flip direction. Default: horizontal.

Returns

Flipped results, ‘points’, ‘bbox3d_fields’ keys are updated in the result dict.

Return type

dict

class mmdet3d.datasets.RandomJitterPoints(jitter_std=[0.01, 0.01, 0.01], clip_range=[-0.05, 0.05])[source]

Randomly jitter point coordinates.

Different from the global translation in GlobalRotScaleTrans, here we apply different noises to each point in a scene.

Parameters
  • jitter_std (list[float]) – The standard deviation of jittering noise. This applies random noise to all points in a 3D scene, which is sampled from a Gaussian distribution whose standard deviation is set by jitter_std. Defaults to [0.01, 0.01, 0.01].

  • clip_range (list[float] | None) – Clip the randomly generated jitter noise into this range. If None is given, don’t perform clipping. Defaults to [-0.05, 0.05].

Note

This transform should only be used in point cloud segmentation tasks because we don’t transform ground-truth bboxes accordingly. For similar transform in detection task, please refer to ObjectNoise.
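
Per-point jittering with clipping can be sketched as follows, using only the standard library. Each coordinate of each point gets its own Gaussian noise sample, which is clipped before being added. The function name and the list-of-lists point format are assumptions of this example.

```python
import random


def jitter_points(points, jitter_std=(0.01, 0.01, 0.01),
                  clip_range=(-0.05, 0.05), rng=random):
    """Add independent, optionally clipped Gaussian noise to every coordinate."""
    jittered = []
    for point in points:
        new_point = []
        for coord, std in zip(point, jitter_std):
            noise = rng.gauss(0.0, std)
            if clip_range is not None:
                # Clamp the noise itself, not the resulting coordinate.
                noise = max(clip_range[0], min(clip_range[1], noise))
            new_point.append(coord + noise)
        jittered.append(new_point)
    return jittered


print(jitter_points([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]))
```

Because the noise differs per point, the shape of objects is slightly perturbed, which is why box annotations would no longer match exactly.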

class mmdet3d.datasets.S3DISDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Depth', filter_empty_gt=True, test_mode=False)[source]

S3DIS Dataset for Detection Task.

This class is the inner dataset for S3DIS. Since S3DIS has 6 areas, we often train on 5 of them and test on the remaining one. The one used for testing is Area_5, as suggested in GSDN. To concatenate the 5 areas during training, mmdet.datasets.dataset_wrappers.ConcatDataset should be used.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
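
Training on five areas with ConcatDataset, as described in the class summary, might be configured roughly as follows. The dataset root, the annotation file naming, and the empty pipeline are assumptions made for this sketch.

```python
data_root = 'data/s3dis/'        # hypothetical dataset root
train_areas = [1, 2, 3, 4, 6]    # Area_5 is held out for testing

train_dataset_cfg = dict(
    type='ConcatDataset',
    datasets=[
        dict(
            type='S3DISDataset',
            data_root=data_root,
            ann_file=data_root + f's3dis_infos_Area_{area}.pkl',  # assumed naming
            pipeline=[],             # placeholder: list the transforms here
            box_type_3d='Depth',
            filter_empty_gt=True,
        )
        for area in train_areas
    ],
)
```

Each inner dict describes one area, so the wrapper sees five independent S3DISDataset instances.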

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • pts_instance_mask_path (str): Path of instance masks.

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • pts_filename (str): Filename of point clouds.

  • file_name (str): Filename of point clouds.

  • ann_info (dict): Annotation info.

Return type

dict

class mmdet3d.datasets.S3DISSegDataset(data_root, ann_files, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None)[source]

S3DIS Dataset for Semantic Segmentation Task.

This class serves as the API for experiments on the S3DIS Dataset. It wraps the provided datasets of different areas. We don’t use mmdet.datasets.dataset_wrappers.ConcatDataset because we need to concat the scene_idxs of different areas.

Please refer to the google form for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_files (list[str]) – Path of several annotation files.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES). Defaults to None.

  • scene_idxs (list[np.ndarray] | list[str], optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.

concat_data_infos(data_infos)[source]

Concat data_infos from several datasets to form self.data_infos.

Parameters

data_infos (list[list[dict]]) –

concat_scene_idxs(scene_idxs)[source]

Concat scene_idxs from several datasets to form self.scene_idxs.

Needs to manually add offset to scene_idxs[1, 2, …].

Parameters

scene_idxs (list[np.ndarray]) –
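
The offset mentioned above is needed because each area numbers its scenes from zero. A minimal sketch on plain lists, assuming the offset for each subsequent dataset is one more than the largest index merged so far:

```python
def concat_scene_idxs(scene_idxs):
    """Concatenate per-dataset scene indices, shifting later ones by an offset."""
    merged = []
    offset = 0
    for one_scene_idxs in scene_idxs:
        merged.extend(i + offset for i in one_scene_idxs)
        if merged:
            # The next dataset starts right after the largest index used so far.
            offset = max(merged) + 1
    return merged


print(concat_scene_idxs([[0, 0, 1], [0, 1, 1, 2]]))
# [0, 0, 1, 2, 3, 3, 4]
```

Repeated indices are preserved, since a scene with many points may be listed several times.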

class mmdet3d.datasets.SUNRGBDDataset(data_root, ann_file, pipeline=None, classes=None, modality={'use_camera': True, 'use_lidar': True}, box_type_3d='Depth', filter_empty_gt=True, test_mode=False)[source]

SUNRGBD Dataset.

This class serves as the API for experiments on the SUNRGBD Dataset.

See the download page for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to dict(use_camera=True, use_lidar=True).

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

evaluate(results, metric=None, iou_thr=(0.25, 0.5), iou_thr_2d=(0.5), logger=None, show=False, out_dir=None, pipeline=None)[source]

Evaluate.

Evaluation in indoor protocol.

Parameters
  • results (list[dict]) – List of results.

  • metric (str | list[str]) – Metrics to be evaluated.

  • iou_thr (list[float]) – AP IoU thresholds.

  • iou_thr_2d (list[float]) – AP IoU thresholds for 2d evaluation.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

Returns

Evaluation results.

Return type

dict

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • pts_instance_mask_path (str): Path of instance masks.

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str, optional): Filename of point clouds.

  • file_name (str, optional): Filename of point clouds.

  • img_prefix (str | None, optional): Prefix of image files.

  • img_info (dict, optional): Image info.

  • calib (dict, optional): Camera calibration info.

  • ann_info (dict): Annotation info.

Return type

dict

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.ScanNetDataset(data_root, ann_file, pipeline=None, classes=None, modality={'use_camera': False, 'use_depth': True}, box_type_3d='Depth', filter_empty_gt=True, test_mode=False)[source]

ScanNet Dataset for Detection Task.

This class serves as the API for experiments on the ScanNet Dataset.

Please refer to the github repo for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to dict(use_camera=False, use_depth=True).

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • pts_instance_mask_path (str): Path of instance masks.

  • pts_semantic_mask_path (str): Path of semantic masks.

  • axis_align_matrix (np.ndarray): Transformation matrix for global scene alignment.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • file_name (str): Filename of point clouds.

  • img_prefix (str | None, optional): Prefix of image files.

  • img_info (dict, optional): Image info.

  • ann_info (dict): Annotation info.

Return type

dict

prepare_test_data(index)[source]

Prepare data for testing.

We should take axis_align_matrix from self.data_infos since we need to align point clouds.

Parameters

index (int) – Index for accessing the target data.

Returns

Testing data dict of the corresponding index.

Return type

dict

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.ScanNetSegDataset(data_root, ann_file, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None)[source]

ScanNet Dataset for Semantic Segmentation Task.

This class serves as the API for experiments on the ScanNet Dataset.

Please refer to the github repo for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES). Defaults to None.

  • scene_idxs (np.ndarray | str, optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.

format_results(results, txtfile_prefix=None)[source]

Format the results to txt file. Refer to ScanNet documentation.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • txtfile_prefix (str | None) – The prefix of saved files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

A tuple (outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving submission files when txtfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

get_scene_idxs(scene_idxs)[source]

Compute scene_idxs for data sampling.

We sample more times for scenes with more points.

show(results, out_dir, show=True, pipeline=None)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

  • show (bool) – Visualize the results online.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

class mmdet3d.datasets.SemanticKITTIDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Lidar', filter_empty_gt=False, test_mode=False)[source]

SemanticKITTI Dataset.

This class serves as the API for experiments on the SemanticKITTI Dataset. Please refer to http://www.semantic-kitti.org/dataset.html for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    There are no 3D boxes in this dataset, so any type can be chosen. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to False.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

class mmdet3d.datasets.VoxelBasedPointSampler(cur_sweep_cfg, prev_sweep_cfg=None, time_dim=3)[source]

Voxel based point sampler.

Apply voxel sampling to multiple sweep points.

Parameters
  • cur_sweep_cfg (dict) – Config for sampling current points.

  • prev_sweep_cfg (dict) – Config for sampling previous points.

  • time_dim (int) – Index that indicates the time dimension for input points.

class mmdet3d.datasets.WaymoDataset(data_root, ann_file, split, pts_prefix='velodyne', pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, load_interval=1, pcd_limit_range=[-85, -85, -5, 85, 85, 5])[source]

Waymo Dataset.

This class serves as the API for experiments on the Waymo Dataset.

Please refer to https://waymo.com/open/download/ for data downloading. It is recommended to symlink the dataset root to $MMDETECTION3D/data and organize them as the doc shows.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • split (str) – Split of input data.

  • pts_prefix (str, optional) – Prefix of points files. Defaults to ‘velodyne’.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:

    • ‘LiDAR’: Box in LiDAR coordinates.

    • ‘Depth’: Box in depth coordinates, usually for indoor datasets.

    • ‘Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • pcd_limit_range (list) – The range of point cloud used to filter invalid predicted boxes. Default: [-85, -85, -5, 85, 85, 5].
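
As a rough illustration of how the constructor arguments above might be collected in a config dict, here is a minimal sketch. The paths, the annotation file name, the class names and the modality keys are assumptions made for the example and are not taken from this reference; only the argument names and defaults come from the signature above.

```python
# Hypothetical dataset location; adjust to the local directory layout.
data_root = 'data/waymo/kitti_format/'

waymo_train_cfg = dict(
    type='WaymoDataset',
    data_root=data_root,
    # Hypothetical annotation file name.
    ann_file=data_root + 'waymo_infos_train.pkl',
    split='training',
    pts_prefix='velodyne',
    pipeline=[],  # to be filled with loading and augmentation steps
    classes=('Car', 'Pedestrian', 'Cyclist'),  # assumed class names
    modality=dict(use_lidar=True, use_camera=False),  # assumed keys
    box_type_3d='LiDAR',
    filter_empty_gt=True,
    test_mode=False,
    load_interval=1,
    # (x_min, y_min, z_min, x_max, y_max, z_max)
    pcd_limit_range=[-85, -85, -5, 85, 85, 5],
)
```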

bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert results to kitti format for evaluation and test submission.

Parameters
  • net_outputs (List[np.ndarray]) – List of arrays storing the predicted boxes and scores.

  • class_names (List[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dicts in the KITTI 3D format.

Return type

List[dict]
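
For orientation, the KITTI object result format stores one detection per text line with 16 whitespace-separated fields. The helper below is a hypothetical sketch of that layout, not the code used by bbox2result_kitti; the function name and the placeholder values for truncation, occlusion and alpha are assumptions.

```python
def to_kitti_line(name, bbox, dims_hwl, location, rotation_y, score,
                  alpha=-10.0, truncated=-1, occluded=-1):
    """Format one detection as a KITTI-style result line.

    Field order: type, truncated, occluded, alpha, 2D bbox
    (left, top, right, bottom), dimensions (height, width, length),
    location (x, y, z) in camera coordinates, rotation_y, score.
    """
    values = [alpha, *bbox, *dims_hwl, *location, rotation_y, score]
    numbers = ' '.join(f'{v:.4f}' for v in values)
    return f'{name} {truncated} {occluded} {numbers}'


line = to_kitti_line(
    name='Car',
    bbox=(100.0, 120.0, 200.0, 220.0),
    dims_hwl=(1.5, 1.6, 3.9),
    location=(1.0, 1.5, 20.0),
    rotation_y=0.1,
    score=0.9,
)
```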

convert_valid_bboxes(box_dict, info)[source]

Convert the boxes into valid format.

Parameters
  • box_dict (dict) –

    Bounding boxes to be converted.

    • boxes_3d (LiDARInstance3DBoxes): 3D bounding boxes.

    • scores_3d (np.ndarray): Scores of predicted boxes.

    • labels_3d (np.ndarray): Class labels of predicted boxes.

  • info (dict) – Dataset information dictionary.

Returns

Valid boxes after conversion.

  • bbox (np.ndarray): 2D bounding boxes (in camera 0).

  • box3d_camera (np.ndarray): 3D boxes in camera coordinates.

  • box3d_lidar (np.ndarray): 3D boxes in lidar coordinates.

  • scores (np.ndarray): Scores of predicted boxes.

  • label_preds (np.ndarray): Class labels of predicted boxes.

  • sample_idx (np.ndarray): Sample index.

Return type

dict
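
One part of deciding which boxes are valid is the pcd_limit_range check described in the class parameters. The sketch below shows a plain-Python version of such a range test on box centers; it is an assumption about the general idea, not the library code, and it ignores any image-boundary checks the real method may also apply.

```python
def centers_in_range(centers, limit_range):
    """Return a boolean mask, True where a center lies inside the range.

    centers: iterable of (x, y, z) tuples.
    limit_range: (x_min, y_min, z_min, x_max, y_max, z_max).
    """
    x_min, y_min, z_min, x_max, y_max, z_max = limit_range
    return [
        x_min < x < x_max and y_min < y < y_max and z_min < z < z_max
        for x, y, z in centers
    ]


mask = centers_in_range(
    [(0.0, 0.0, 0.0), (90.0, 0.0, 0.0), (10.0, -20.0, 6.0)],
    [-85, -85, -5, 85, 85, 5],
)
```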

evaluate(results, metric='waymo', logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None, pipeline=None)[source]

Evaluation in KITTI protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Default: ‘waymo’. Another supported metric is ‘kitti’.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submission data. If not specified, the submission data will not be generated.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

  • pipeline (list[dict], optional) – raw data loading for showing. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(outputs, pklfile_prefix=None, submission_prefix=None, data_format='waymo')[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • data_format (str | None) – Output data format. Default: ‘waymo’. Another supported choice is ‘kitti’.

Returns

(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple
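
The prefix convention described above (use the given prefix, otherwise fall back to a temporary directory) can be sketched with the standard library as follows. The helper name resolve_prefix and the file stem 'results' are hypothetical; this is an illustration of the pattern, not the method's actual body.

```python
import os.path as osp
import tempfile


def resolve_prefix(pklfile_prefix=None):
    """Return (prefix, tmp_dir); tmp_dir is None when a prefix was given."""
    if pklfile_prefix is None:
        # No prefix supplied: write into a fresh temporary directory.
        tmp_dir = tempfile.TemporaryDirectory()
        pklfile_prefix = osp.join(tmp_dir.name, 'results')
    else:
        tmp_dir = None
    return pklfile_prefix, tmp_dir


prefix, tmp_dir = resolve_prefix()
# ... write result files under `prefix` here ...
if tmp_dir is not None:
    tmp_dir.cleanup()  # remove the temporary directory when done
```

Returning the temporary directory object lets the caller decide when to clean it up, which is why the tuple form is convenient.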

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Standard input_dict consists of the data information.

  • sample_idx (str): sample index

  • pts_filename (str): filename of point clouds

  • img_prefix (str | None): prefix of image files

  • img_info (dict): image info

  • lidar2img (list[np.ndarray], optional): transformations from lidar to different cameras

  • ann_info (dict): annotation info

Return type

dict

mmdet3d.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • kwargs – any keyword argument to be used to initialize DataLoader

Returns

A PyTorch dataloader.

Return type

DataLoader
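
The distinction between distributed and non-distributed training affects the per-loader batch size and worker count. The following sketch shows one way these could be derived from the arguments above, assuming that a single non-distributed loader serves all GPUs; the helper name loader_sizes is hypothetical.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for a single dataloader."""
    if dist:
        # One loader per GPU/process: sizes are per GPU.
        return samples_per_gpu, workers_per_gpu
    # One loader feeds every GPU: scale both values by the GPU count.
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu


dist_sizes = loader_sizes(2, 4, num_gpus=8, dist=True)
non_dist_sizes = loader_sizes(2, 4, num_gpus=8, dist=False)
```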

mmdet3d.datasets.get_loading_pipeline(pipeline)[source]

Keep only the configuration related to loading images, points and annotations.

Parameters

pipeline (list[dict] | list[Pipeline]) – Data pipeline configs or list of pipeline functions.

Returns

The new pipeline list, keeping only the configuration related to loading images, points and annotations.

Return type

list[dict] | list[Pipeline]

Examples

>>> pipelines = [
...    dict(type='LoadPointsFromFile',
...         coord_type='LIDAR', load_dim=4, use_dim=4),
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations3D',
...         with_bbox=True, with_label_3d=True),
...    dict(type='Resize',
...         img_scale=[(640, 192), (2560, 768)], keep_ratio=True),
...    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
...    dict(type='PointsRangeFilter',
...         point_cloud_range=point_cloud_range),
...    dict(type='ObjectRangeFilter',
...         point_cloud_range=point_cloud_range),
...    dict(type='PointShuffle'),
...    dict(type='Normalize', **img_norm_cfg),
...    dict(type='Pad', size_divisor=32),
...    dict(type='DefaultFormatBundle3D', class_names=class_names),
...    dict(type='Collect3D',
...         keys=['points', 'img', 'gt_bboxes_3d', 'gt_labels_3d'])
...    ]
>>> expected_pipelines = [
...    dict(type='LoadPointsFromFile',
...         coord_type='LIDAR', load_dim=4, use_dim=4),
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations3D',
...         with_bbox=True, with_label_3d=True),
...    dict(type='DefaultFormatBundle3D', class_names=class_names),
...    dict(type='Collect3D',
...         keys=['points', 'img', 'gt_bboxes_3d', 'gt_labels_3d'])
...    ]
>>> assert expected_pipelines == get_loading_pipeline(pipelines)

mmdet3d.models

detectors

class mmdet3d.models.detectors.Base3DDetector(init_cfg=None)[source]

Base class for detectors.

forward(return_loss=True, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss=True.

Note that this setting changes the expected inputs. When return_loss=True, img and img_metas are single-nested (i.e. torch.Tensor and list[dict]); when return_loss=False, img and img_metas should be double-nested (i.e. list[torch.Tensor], list[list[dict]]), with the outer list indicating test-time augmentations.
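
The dispatch on return_loss can be illustrated with a toy class. ToyDetector below is a hypothetical stand-in written for this example; it only mirrors the branching described above and does not reproduce the real Base3DDetector.

```python
class ToyDetector:
    """Illustrative stand-in; not the real Base3DDetector."""

    def forward_train(self, **kwargs):
        return {'mode': 'train', 'keys': sorted(kwargs)}

    def forward_test(self, **kwargs):
        return {'mode': 'test', 'keys': sorted(kwargs)}

    def forward(self, return_loss=True, **kwargs):
        # return_loss selects the training or the testing path.
        if return_loss:
            return self.forward_train(**kwargs)
        return self.forward_test(**kwargs)


detector = ToyDetector()
# Training: single-nested inputs (one entry per sample).
train_out = detector.forward(points=['pts0'], img_metas=[{}])
# Testing: double-nested inputs (outer list = test-time augmentations).
test_out = detector.forward(
    return_loss=False, points=[['pts0']], img_metas=[[{}]])
```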

forward_test(points, img_metas, img=None, **kwargs)[source]
Parameters
  • points (list[torch.Tensor]) – the outer list indicates test-time augmentations and inner torch.Tensor should have a shape NxC, which contains all points in the batch.

  • img_metas (list[list[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch

  • img (list[torch.Tensor], optional) – the outer list indicates test-time augmentations and inner torch.Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.

show_results(data, result, out_dir)[source]

Results visualization.

Parameters
  • data (list[dict]) – Input points and the information of the sample.

  • result (list[dict]) – Prediction results.

  • out_dir (str) – Output directory of visualization result.

class mmdet3d.models.detectors.CenterPoint(pts_voxel_layer=None, pts_voxel_encoder=None, pts_middle_encoder=None, pts_fusion_layer=None, img_backbone=None, pts_backbone=None, img_neck=None, pts_neck=None, pts_bbox_head=None, img_roi_head=None, img_rpn_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Base class of Multi-modality VoxelNet.

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test function with augmentation.

aug_test_pts(feats, img_metas, rescale=False)[source]

Test function of point cloud branch with augmentation.

The function implementation process is as follows:

  • step 1: map features back for double-flip augmentation.

  • step 2: merge all features and generate boxes.

  • step 3: map boxes back for scale augmentation.

  • step 4: merge results.

Parameters
  • feats (list[torch.Tensor]) – Feature of point cloud.

  • img_metas (list[dict]) – Meta information of samples.

  • rescale (bool) – Whether to rescale bboxes. Default: False.

Returns

Returned bboxes consist of the following keys:

  • boxes_3d (LiDARInstance3DBoxes): Predicted bboxes.

  • scores_3d (torch.Tensor): Scores of predicted boxes.

  • labels_3d (torch.Tensor): Labels of predicted boxes.

Return type

dict
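
Step 1 above (mapping features back for double-flip augmentation) can be pictured on small 2D grids. The sketch below un-flips four variants of a feature map and averages them. It is a simplified assumption about the idea: it uses nested lists instead of tensors, picks an arbitrary axis convention, and leaves out any per-channel sign corrections a real detection head would need.

```python
def hflip(grid):
    """Flip a 2D grid left-right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Flip a 2D grid top-bottom."""
    return grid[::-1]


def merge_double_flip(original, h_flipped, v_flipped, hv_flipped):
    """Undo each flip, then average the four aligned grids."""
    restored = [
        original,
        hflip(h_flipped),
        vflip(v_flipped),
        vflip(hflip(hv_flipped)),
    ]
    rows, cols = len(original), len(original[0])
    return [
        [sum(g[r][c] for g in restored) / 4 for c in range(cols)]
        for r in range(rows)
    ]


base = [[1.0, 2.0], [3.0, 4.0]]
merged = merge_double_flip(
    base, hflip(base), vflip(base), vflip(hflip(base)))
```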

extract_pts_feat(pts, img_feats, img_metas)[source]

Extract features of points.

forward_pts_train(pts_feats, gt_bboxes_3d, gt_labels_3d, img_metas, gt_bboxes_ignore=None)[source]

Forward function for point cloud branch.

Parameters
  • pts_feats (list[torch.Tensor]) – Features of point cloud branch

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.

  • gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.

  • img_metas (list[dict]) – Meta information of samples.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.

Returns

Losses of each branch.

Return type

dict

simple_test_pts(x, img_metas, rescale=False)[source]

Test function of point cloud branch.

class mmdet3d.models.detectors.DynamicMVXFasterRCNN(**kwargs)[source]

Multi-modality VoxelNet using Faster R-CNN and dynamic voxelization.

extract_pts_feat(points, img_feats, img_metas)[source]

Extract point features.

voxelize(points)[source]

Apply dynamic voxelization to points.

Parameters

points (list[torch.Tensor]) – Points of each sample.

Returns

Concatenated points and coordinates.

Return type

tuple[torch.Tensor]
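
Dynamic voxelization keeps every point and assigns each one a voxel coordinate instead of packing points into fixed-size voxels. The sketch below illustrates this in plain Python under several assumptions: the voxel size and range origin are passed in explicitly, coordinates are stored as (batch_idx, z, y, x), and range checks for out-of-bounds points are omitted. The function name is hypothetical.

```python
import math


def dynamic_voxelize(points_per_sample, voxel_size, range_min):
    """Return all points and one (batch_idx, z, y, x) coordinate per point."""
    all_points, all_coords = [], []
    for batch_idx, points in enumerate(points_per_sample):
        for point in points:
            x, y, z = point[:3]
            ix = math.floor((x - range_min[0]) / voxel_size[0])
            iy = math.floor((y - range_min[1]) / voxel_size[1])
            iz = math.floor((z - range_min[2]) / voxel_size[2])
            all_points.append(point)
            # Prepend the sample index so samples can be concatenated.
            all_coords.append((batch_idx, iz, iy, ix))
    return all_points, all_coords


points_out, coords_out = dynamic_voxelize(
    [[(0.5, 0.5, 0.5, 1.0)], [(1.5, 0.2, 0.1, 0.3)]],
    voxel_size=(1.0, 1.0, 1.0),
    range_min=(0.0, 0.0, 0.0),
)
```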

class mmdet3d.models.detectors.DynamicVoxelNet(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

VoxelNet using dynamic voxelization.

extract_feat(points, img_metas)[source]

Extract features from points.

voxelize(points)[source]

Apply dynamic voxelization to points.

Parameters

points (list[torch.Tensor]) – Points of each sample.

Returns

Concatenated points and coordinates.

Return type

tuple[torch.Tensor]

class mmdet3d.models.detectors.FCOSMono3D(backbone, neck, bbox_head, train_cfg=None, test_cfg=None, pretrained=None)[source]

FCOS3D for monocular 3D object detection.

Currently please refer to our entry on the leaderboard.

class mmdet3d.models.detectors.GroupFree3DNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Group-Free 3D.

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test with augmentation.

forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]

Forward of training.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • img_metas (list) – Image metas.

  • gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – point-wise instance label of each batch.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Losses.

Return type

dict[str, torch.Tensor]

simple_test(points, img_metas, imgs=None, rescale=False)[source]

Forward of testing.

Parameters
  • points (list[torch.Tensor]) – Points of each sample.

  • img_metas (list) – Image metas.

  • rescale (bool) – Whether to rescale results.

Returns

Predicted 3d boxes.

Return type

list

class mmdet3d.models.detectors.H3DNet(backbone, neck=None, rpn_head=None, roi_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

H3DNet model.

Please refer to the paper

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test with augmentation.

extract_feats(points, img_metas)[source]

Extract features of multiple samples.

forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]

Forward of training.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • img_metas (list) – Image metas.

  • gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – point-wise instance label of each batch.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Losses.

Return type

dict

simple_test(points, img_metas, imgs=None, rescale=False)[source]

Forward of testing.

Parameters
  • points (list[torch.Tensor]) – Points of each sample.

  • img_metas (list) – Image metas.

  • rescale (bool) – Whether to rescale results.

Returns

Predicted 3d boxes.

Return type

list

class mmdet3d.models.detectors.ImVoteNet(pts_backbone=None, pts_bbox_heads=None, pts_neck=None, img_backbone=None, img_neck=None, img_roi_head=None, img_rpn_head=None, img_mlp=None, freeze_img_branch=False, fusion_layer=None, num_sampled_seed=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

ImVoteNet for 3D detection.

aug_test(points=None, img_metas=None, imgs=None, bboxes_2d=None, rescale=False, **kwargs)[source]

Test function with augmentation, stage 2.

Parameters
  • points (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and the inner list contains all points in the batch, where each Tensor should have a shape NxC. Defaults to None.

  • img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.

  • imgs (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.

  • bboxes_2d (list[list[torch.Tensor]], optional) – Provided 2d bboxes, not supported yet. Defaults to None.

  • rescale (bool, optional) – Whether to rescale bboxes. Defaults to False.

Returns

Predicted 3d boxes.

Return type

list[dict]

aug_test_img_only(img, img_metas, rescale=False)[source]

Test function with augmentation, image network pretrain. May refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/two_stage.py.

Parameters
  • img (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.

  • img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.

  • rescale (bool, optional) – Whether to rescale bboxes to the original shape of the input image. If rescale is False, the returned bboxes and masks will fit the scale of imgs[0]. Defaults to False.

Returns

Predicted 2d boxes.

Return type

list[list[torch.Tensor]]

extract_bboxes_2d(img, img_metas, train=True, bboxes_2d=None, **kwargs)[source]

Extract bounding boxes from 2d detector.

Parameters
  • img (torch.Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – Image meta info.

  • train (bool) – train-time or not.

  • bboxes_2d (list[torch.Tensor]) – provided 2d bboxes, not supported yet.

Returns

a list of processed 2d bounding boxes.

Return type

list[torch.Tensor]

extract_feat(imgs)[source]

Just to inherit from abstract method.

extract_img_feat(img)[source]

Directly extract features from the img backbone+neck.

extract_img_feats(imgs)[source]

Extract features from multiple images.

Parameters

imgs (list[torch.Tensor]) – A list of images. The images are augmented from the same image but in different ways.

Returns

Features of different images

Return type

list[torch.Tensor]

extract_pts_feat(pts)[source]

Extract features of points.

extract_pts_feats(pts)[source]

Extract features of points from multiple samples.

forward_test(points=None, img_metas=None, img=None, bboxes_2d=None, **kwargs)[source]

Forwarding of test for image branch pretrain or stage 2 train.

Parameters
  • points (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and the inner list contains all points in the batch, where each Tensor should have a shape NxC. Defaults to None.

  • img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.

  • img (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.

  • bboxes_2d (list[list[torch.Tensor]], optional) – Provided 2d bboxes, not supported yet. Defaults to None.

Returns

Predicted 2d or 3d boxes.

Return type

list[list[torch.Tensor]]|list[dict]

forward_train(points=None, img=None, img_metas=None, gt_bboxes=None, gt_labels=None, gt_bboxes_ignore=None, gt_masks=None, proposals=None, bboxes_2d=None, gt_bboxes_3d=None, gt_labels_3d=None, pts_semantic_mask=None, pts_instance_mask=None, **kwargs)[source]

Forwarding of train for image branch pretrain or stage 2 train.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • img (torch.Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – list of image and point cloud meta info dict. For example, keys include ‘ori_shape’, ‘img_norm_cfg’, and ‘transformation_3d_flow’. For details on the values of the keys see mmdet/datasets/pipelines/formatting.py:Collect.

  • gt_bboxes (list[torch.Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[torch.Tensor]) – class indices for each 2d bounding box.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – specify which 2d bounding boxes can be ignored when computing the loss.

  • gt_masks (None | torch.Tensor) – true segmentation masks for each 2d bbox, used if the architecture supports a segmentation task.

  • proposals – override rpn proposals (2d) with custom proposals. Use when with_rpn is False.

  • bboxes_2d (list[torch.Tensor]) – provided 2d bboxes, not supported yet.

  • gt_bboxes_3d (BaseInstance3DBoxes) – 3d gt bboxes.

  • gt_labels_3d (list[torch.Tensor]) – gt class labels for 3d bboxes.

  • pts_semantic_mask (None | list[torch.Tensor]) – point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – point-wise instance label of each batch.

Returns

a dictionary of loss components.

Return type

dict[str, torch.Tensor]

freeze_img_branch_params()[source]

Freeze all image branch parameters.

simple_test(points=None, img_metas=None, img=None, bboxes_2d=None, rescale=False, **kwargs)[source]

Test without augmentation, stage 2.

Parameters
  • points (list[torch.Tensor], optional) – Elements in the list should have a shape NxC, the list indicates all point-clouds in the batch. Defaults to None.

  • img_metas (list[dict], optional) – List indicates images in a batch. Defaults to None.

  • img (torch.Tensor, optional) – Should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.

  • bboxes_2d (list[torch.Tensor], optional) – Provided 2d bboxes, not supported yet. Defaults to None.

  • rescale (bool, optional) – Whether to rescale bboxes. Defaults to False.

Returns

Predicted 3d boxes.

Return type

list[dict]

simple_test_img_only(img, img_metas, proposals=None, rescale=False)[source]

Test without augmentation, image network pretrain. May refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/two_stage.py.

Parameters
  • img (torch.Tensor) – Should have a shape NxCxHxW, which contains all images in the batch.

  • img_metas (list[dict]) –

  • proposals (list[Tensor], optional) – override rpn proposals with custom proposals. Defaults to None.

  • rescale (bool, optional) – Whether to rescale bboxes to the original shape of the input image. Defaults to False.

Returns

Predicted 2d boxes.

Return type

list[list[torch.Tensor]]

train(mode=True)[source]

Overload in order to keep image branch modules in eval mode.

property with_img_backbone

Whether the detector has a 2D image backbone.

Type

bool

property with_img_bbox

Whether the detector has a 2D image box head.

Type

bool

property with_img_bbox_head

Whether the detector has a 2D image box head (not roi).

Type

bool

property with_img_neck

Whether the detector has a neck in image branch.

Type

bool

property with_img_roi_head

Whether the detector has a RoI Head in image branch.

Type

bool

property with_img_rpn

Whether the detector has a 2D RPN in image detector branch.

Type

bool

property with_pts_backbone

Whether the detector has a 3D backbone.

Type

bool

property with_pts_bbox

Whether the detector has a 3D box head.

Type

bool

property with_pts_neck

Whether the detector has a neck in 3D detector branch.

Type

bool

class mmdet3d.models.detectors.ImVoxelNet(backbone, neck, neck_3d, bbox_head, n_voxels, anchor_generator, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

ImVoxelNet.

aug_test(imgs, img_metas, **kwargs)[source]

Test with augmentations.

Parameters
  • imgs (list[torch.Tensor]) – Input images of shape (N, C_in, H, W).

  • img_metas (list) – Image metas.

Returns

Predicted 3d boxes.

Return type

list[dict]

extract_feat(img, img_metas)[source]

Extract 3d features from the backbone -> fpn -> 3d projection.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C_in, H, W).

  • img_metas (list) – Image metas.

Returns

of shape (N, C_out, N_x, N_y, N_z)

Return type

torch.Tensor

forward_test(img, img_metas, **kwargs)[source]

Forward of testing.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C_in, H, W).

  • img_metas (list) – Image metas.

Returns

Predicted 3d boxes.

Return type

list[dict]

forward_train(img, img_metas, gt_bboxes_3d, gt_labels_3d, **kwargs)[source]

Forward of training.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C_in, H, W).

  • img_metas (list) – Image metas.

  • gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.

Returns

A dictionary of loss components.

Return type

dict[str, torch.Tensor]

simple_test(img, img_metas)[source]

Test without augmentations.

Parameters
  • img (torch.Tensor) – Input images of shape (N, C_in, H, W).

  • img_metas (list) – Image metas.

Returns

Predicted 3d boxes.

Return type

list[dict]

class mmdet3d.models.detectors.MVXFasterRCNN(**kwargs)[source]

Multi-modality VoxelNet using Faster R-CNN.

class mmdet3d.models.detectors.MVXTwoStageDetector(pts_voxel_layer=None, pts_voxel_encoder=None, pts_middle_encoder=None, pts_fusion_layer=None, img_backbone=None, pts_backbone=None, img_neck=None, pts_neck=None, pts_bbox_head=None, img_roi_head=None, img_rpn_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Base class of Multi-modality VoxelNet.

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test function with augmentation.

aug_test_pts(feats, img_metas, rescale=False)[source]

Test function of point cloud branch with augmentation.

extract_feat(points, img, img_metas)[source]

Extract features from images and points.

extract_feats(points, img_metas, imgs=None)[source]

Extract point and image features of multiple samples.

extract_img_feat(img, img_metas)[source]

Extract features of images.

extract_pts_feat(pts, img_feats, img_metas)[source]

Extract features of points.

forward_img_train(x, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore=None, proposals=None, **kwargs)[source]

Forward function for image branch.

This function works similar to the forward function of Faster R-CNN.

Parameters
  • x (list[torch.Tensor]) – Image features of shape (B, C, H, W) of multiple levels.

  • img_metas (list[dict]) – Meta information of images.

  • gt_bboxes (list[torch.Tensor]) – Ground truth boxes of each image sample.

  • gt_labels (list[torch.Tensor]) – Ground truth labels of boxes.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.

  • proposals (list[torch.Tensor], optional) – Proposals of each sample. Defaults to None.

Returns

Losses of each branch.

Return type

dict

forward_pts_train(pts_feats, gt_bboxes_3d, gt_labels_3d, img_metas, gt_bboxes_ignore=None)[source]

Forward function for point cloud branch.

Parameters
  • pts_feats (list[torch.Tensor]) – Features of point cloud branch

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.

  • gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.

  • img_metas (list[dict]) – Meta information of samples.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.

Returns

Losses of each branch.

Return type

dict

forward_train(points=None, img_metas=None, gt_bboxes_3d=None, gt_labels_3d=None, gt_labels=None, gt_bboxes=None, img=None, proposals=None, gt_bboxes_ignore=None)[source]

Forward training function.

Parameters
  • points (list[torch.Tensor], optional) – Points of each sample. Defaults to None.

  • img_metas (list[dict], optional) – Meta information of each sample. Defaults to None.

  • gt_bboxes_3d (list[BaseInstance3DBoxes], optional) – Ground truth 3D boxes. Defaults to None.

  • gt_labels_3d (list[torch.Tensor], optional) – Ground truth labels of 3D boxes. Defaults to None.

  • gt_labels (list[torch.Tensor], optional) – Ground truth labels of 2D boxes in images. Defaults to None.

  • gt_bboxes (list[torch.Tensor], optional) – Ground truth 2D boxes in images. Defaults to None.

  • img (torch.Tensor, optional) – Images of each sample with shape (N, C, H, W). Defaults to None.

  • proposals (list[torch.Tensor], optional) – Predicted proposals used for training Fast RCNN. Defaults to None.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth 2D boxes in images to be ignored. Defaults to None.

Returns

Losses of different branches.

Return type

dict

show_results(data, result, out_dir)[source]

Results visualization.

Parameters
  • data (dict) – Input points and the information of the sample.

  • result (dict) – Prediction results.

  • out_dir (str) – Output directory of visualization result.

simple_test(points, img_metas, img=None, rescale=False)[source]

Test function without augmentation.

simple_test_img(x, img_metas, proposals=None, rescale=False)[source]

Test without augmentation.

simple_test_pts(x, img_metas, rescale=False)[source]

Test function of point cloud branch.

simple_test_rpn(x, img_metas, rpn_test_cfg)[source]

RPN test function.

voxelize(points)[source]

Apply dynamic voxelization to points.

Parameters

points (list[torch.Tensor]) – Points of each sample.

Returns

Concatenated points, number of points per voxel, and coordinates.

Return type

tuple[torch.Tensor]
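
The three return values (points grouped by voxel, number of points per voxel, and voxel coordinates) correspond to what is usually called hard voxelization, where both the number of voxels and the number of points per voxel are capped. The plain-Python sketch below illustrates that idea under stated assumptions: the function name, the max_points and max_voxels arguments, and the (z, y, x) coordinate order are choices made for the example, and the zero-padding to a fixed voxel size that a tensor implementation would do is left out.

```python
import math


def hard_voxelize(points, voxel_size, range_min, max_points, max_voxels):
    """Group points into voxels with caps on voxel count and voxel size."""
    voxel_index = {}  # voxel coordinate -> position in the output lists
    voxels, num_points, coords = [], [], []
    for point in points:
        # Integer voxel coordinate in (z, y, x) order.
        coord = tuple(
            math.floor((point[axis] - range_min[axis]) / voxel_size[axis])
            for axis in (2, 1, 0)
        )
        idx = voxel_index.get(coord)
        if idx is None:
            if len(voxels) >= max_voxels:
                continue  # voxel budget exhausted: drop the point
            idx = len(voxels)
            voxel_index[coord] = idx
            voxels.append([])
            num_points.append(0)
            coords.append(coord)
        if num_points[idx] < max_points:
            voxels[idx].append(point)
            num_points[idx] += 1
    return voxels, num_points, coords


pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.3, 0.3), (1.5, 0.1, 0.1)]
voxels, num_points, coords = hard_voxelize(
    pts, voxel_size=(1.0, 1.0, 1.0), range_min=(0.0, 0.0, 0.0),
    max_points=2, max_voxels=10)
```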

property with_fusion

Whether the detector has a fusion layer.

Type

bool

property with_img_backbone

Whether the detector has a 2D image backbone.

Type

bool

property with_img_bbox

Whether the detector has a 2D image box head.

Type

bool

property with_img_neck

Whether the detector has a neck in image branch.

Type

bool

property with_img_roi_head

Whether the detector has a RoI Head in image branch.

Type

bool

property with_img_rpn

Whether the detector has a 2D RPN in image detector branch.

Type

bool

property with_img_shared_head

Whether the detector has a shared head in image branch.

Type

bool

property with_middle_encoder

Whether the detector has a middle encoder.

Type

bool

property with_pts_backbone

Whether the detector has a 3D backbone.

Type

bool

property with_pts_bbox

Whether the detector has a 3D box head.

Type

bool

property with_pts_neck

Whether the detector has a neck in 3D detector branch.

Type

bool

property with_voxel_encoder

Whether the detector has a voxel encoder.

Type

bool
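
Boolean properties like the with_* family above are commonly written as a check that the corresponding component exists and is not None. The toy class below sketches that pattern; it is a hypothetical stand-in and makes no claim about how MVXTwoStageDetector implements its properties.

```python
class ToyMultiModalDetector:
    """Illustrative stand-in; not the real MVXTwoStageDetector."""

    def __init__(self, pts_backbone=None, img_backbone=None):
        # Only attach a component when one was configured.
        if pts_backbone is not None:
            self.pts_backbone = pts_backbone
        if img_backbone is not None:
            self.img_backbone = img_backbone

    @property
    def with_pts_backbone(self):
        """bool: whether a 3D backbone is present."""
        return getattr(self, 'pts_backbone', None) is not None

    @property
    def with_img_backbone(self):
        """bool: whether a 2D image backbone is present."""
        return getattr(self, 'img_backbone', None) is not None


lidar_only = ToyMultiModalDetector(pts_backbone=object())
```

Callers can then branch on lidar_only.with_img_backbone instead of probing attributes directly.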

class mmdet3d.models.detectors.PartA2(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, rpn_head=None, roi_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Part-A2 detector.

Please refer to the paper

extract_feat(points, img_metas)[source]

Extract features from points.

forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, gt_bboxes_ignore=None, proposals=None)[source]

Training forward function.

Parameters
  • points (list[torch.Tensor]) – Point cloud of each sample.

  • img_metas (list[dict]) – Meta information of each sample

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.

  • gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.

Returns

Losses of each branch.

Return type

dict

simple_test(points, img_metas, proposals=None, rescale=False)[source]

Test function without augmentation.

voxelize(points)[source]

Apply hard voxelization to points.

class mmdet3d.models.detectors.SSD3DNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]

3DSSDNet model.

https://arxiv.org/abs/2002.10187.pdf

class mmdet3d.models.detectors.SingleStageMono3DDetector(backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Base class for monocular 3D single-stage detectors.

Single-stage detectors directly and densely predict bounding boxes on the output features of the backbone+neck.

aug_test(imgs, img_metas, rescale=False)[source]

Test function with test time augmentation.

extract_feats(imgs)[source]

Directly extract features from the backbone+neck.

forward_train(img, img_metas, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels=None, gt_bboxes_ignore=None)[source]
Parameters
  • img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • img_metas (list[dict]) – A List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmdet.datasets.pipelines.Collect.

  • gt_bboxes (list[Tensor]) – Each item holds the ground truth boxes for one image in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – Class indices corresponding to each box

  • gt_bboxes_3d (list[Tensor]) – Each item holds the 3D ground truth boxes for one image in [x, y, z, w, l, h, theta, vx, vy] format.

  • gt_labels_3d (list[Tensor]) – 3D class indices corresponding to each box.

  • centers2d (list[Tensor]) – Projected 3D centers onto 2D images.

  • depths (list[Tensor]) – Depth of projected centers on 2D images.

  • attr_labels (list[Tensor], optional) – Attribute indices corresponding to each box

  • gt_bboxes_ignore (None | list[Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

show_results(data, result, out_dir)[source]

Results visualization.

Parameters
  • data (list[dict]) – Input images and the information of the sample.

  • result (list[dict]) – Prediction results.

  • out_dir (str) – Output directory of visualization result.

simple_test(img, img_metas, rescale=False)[source]

Test function without test time augmentation.

Parameters
  • imgs (list[torch.Tensor]) – List of multiple images

  • img_metas (list[dict]) – List of image information.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns

BBox results of each image and classes.

The outer list corresponds to each image. The inner list corresponds to each class.

Return type

list[list[np.ndarray]]
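
The nested return value can be walked as in the minimal sketch below. Plain Python lists stand in for the np.ndarray entries, and the helper name count_detections is hypothetical (it is not part of mmdet3d).

```python
def count_detections(results):
    """Count boxes per image and per class in a nested result list.

    results[i][c] is assumed to hold the boxes of class c in image i.
    Plain lists of rows stand in for np.ndarray here.
    """
    return [[len(class_boxes) for class_boxes in image_result]
            for image_result in results]


# Two images, three classes; each box row is a placeholder.
results = [
    [[[0.0] * 5], [], [[0.0] * 5, [0.0] * 5]],  # image 0
    [[], [[0.0] * 5], []],                      # image 1
]
counts = count_detections(results)
print(counts)
```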

class mmdet3d.models.detectors.VoteNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]

VoteNet for 3D detection.

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test with augmentation.

forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]

Forward of training.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • img_metas (list) – Image metas.

  • gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – point-wise instance label of each batch.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Losses.

Return type

dict

simple_test(points, img_metas, imgs=None, rescale=False)[source]

Forward of testing.

Parameters
  • points (list[torch.Tensor]) – Points of each sample.

  • img_metas (list) – Image metas.

  • rescale (bool) – Whether to rescale results.

Returns

Predicted 3d boxes.

Return type

list

class mmdet3d.models.detectors.VoxelNet(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]

VoxelNet for 3D detection.

aug_test(points, img_metas, imgs=None, rescale=False)[source]

Test function with augmentation.

extract_feat(points, img_metas=None)[source]

Extract features from points.

forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, gt_bboxes_ignore=None)[source]

Training forward function.

Parameters
  • points (list[torch.Tensor]) – Point cloud of each sample.

  • img_metas (list[dict]) – Meta information of each sample

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.

  • gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.

Returns

Losses of each branch.

Return type

dict

simple_test(points, img_metas, imgs=None, rescale=False)[source]

Test function without augmentation.

voxelize(points)[source]

Apply hard voxelization to points.
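
A minimal pure-Python sketch of what hard voxelization might look like, assuming "hard" means a fixed cap on the number of points per voxel and on the number of voxels. The function name hard_voxelize and all of its parameters are hypothetical; the library's own voxel layer operates on tensors and may differ in detail.

```python
def hard_voxelize(points, voxel_size, pc_range, max_points, max_voxels):
    """Group points into voxels with fixed capacity limits.

    points: list of (x, y, z) tuples.
    voxel_size: (vx, vy, vz).
    pc_range: (x_min, y_min, z_min, x_max, y_max, z_max).
    Returns (voxels, coords): voxels[i] is the list of points kept in the
    voxel whose integer grid index is coords[i].
    """
    voxels, coords, index_of = [], [], {}
    for p in points:
        # Skip points that fall outside the point cloud range.
        if not all(pc_range[d] <= p[d] < pc_range[d + 3] for d in range(3)):
            continue
        coord = tuple(
            int((p[d] - pc_range[d]) // voxel_size[d]) for d in range(3))
        if coord not in index_of:
            if len(voxels) >= max_voxels:
                continue  # voxel budget exhausted: the point is dropped
            index_of[coord] = len(voxels)
            voxels.append([])
            coords.append(coord)
        bucket = voxels[index_of[coord]]
        if len(bucket) < max_points:
            bucket.append(p)  # points beyond the cap are dropped
    return voxels, coords


pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.3, 0.3),
       (1.5, 0.1, 0.1), (5.0, 0.0, 0.0)]
voxels, coords = hard_voxelize(
    pts, voxel_size=(1, 1, 1), pc_range=(0, 0, 0, 2, 2, 2),
    max_points=2, max_voxels=10)
```

In this toy input the third point is dropped because its voxel is already full, and the last point is dropped because it lies outside the range.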

backbones

class mmdet3d.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:

    • num_modules(int): The number of HRModule in this stage.

    • num_branches(int): The number of branches in the HRModule.

    • block(str): The type of convolution block.

    • num_blocks(tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels(tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
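
The constraints on extra described above can be checked before building the model. The sketch below is only an illustration: check_hrnet_extra is a hypothetical helper, and the stage key names stage1 to stage4 are taken from the example in this section.

```python
REQUIRED_KEYS = ('num_modules', 'num_branches', 'block',
                 'num_blocks', 'num_channels')


def check_hrnet_extra(extra):
    """Raise ValueError if extra violates the documented constraints."""
    for name in ('stage1', 'stage2', 'stage3', 'stage4'):
        if name not in extra:
            raise ValueError(f'missing config for {name}')
        cfg = extra[name]
        missing = [key for key in REQUIRED_KEYS if key not in cfg]
        if missing:
            raise ValueError(f'{name} is missing keys: {missing}')
        # Per-branch tuples must have one entry per branch.
        for key in ('num_blocks', 'num_channels'):
            if len(cfg[key]) != cfg['num_branches']:
                raise ValueError(
                    f'{name}.{key} must have length num_branches')
    return True


extra = dict(
    stage1=dict(num_modules=1, num_branches=1, block='BOTTLENECK',
                num_blocks=(4, ), num_channels=(64, )),
    stage2=dict(num_modules=1, num_branches=2, block='BASIC',
                num_blocks=(4, 4), num_channels=(32, 64)),
    stage3=dict(num_modules=4, num_branches=3, block='BASIC',
                num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
    stage4=dict(num_modules=3, num_branches=4, block='BASIC',
                num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)))
ok = check_hrnet_extra(extra)
```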

Example

>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet3d.models.backbones.MultiBackbone(num_streams, backbones, aggregation_mlp_channels=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 1e-05, 'momentum': 0.01, 'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, suffixes=('net0', 'net1'), init_cfg=None, pretrained=None, **kwargs)[source]

MultiBackbone with different configs.

Parameters
  • num_streams (int) – The number of backbones.

  • backbones (list or dict) – A list of backbone configs.

  • aggregation_mlp_channels (list[int]) – Specify the mlp layers for feature aggregation.

  • conv_cfg (dict) – Config dict of convolutional layers.

  • norm_cfg (dict) – Config dict of normalization layers.

  • act_cfg (dict) – Config dict of activation layers.

  • suffixes (list) – A list of suffixes to rename the return dict for each backbone.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs from multiple backbones.

  • fp_xyz[suffix] (list[torch.Tensor]): The coordinates of each FP feature.

  • fp_features[suffix] (list[torch.Tensor]): The features from each feature propagation layer.

  • fp_indices[suffix] (list[torch.Tensor]): Indices of the input points.

  • hd_feature (torch.Tensor): The aggregation feature from multiple backbones.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.backbones.NoStemRegNet(arch, init_cfg=None, **kwargs)[source]

RegNet backbone without Stem for 3D detection.

More details can be found in the RegNet paper.

Parameters
  • arch (dict) –

    The parameters of RegNets:

    • w0 (int): Initial width.

    • wa (float): Slope of width.

    • wm (float): Quantization parameter to quantize the width.

    • depth (int): Depth of the backbone.

    • group_w (int): Width of group.

    • bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Normally 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmdet3d.models import NoStemRegNet
>>> import torch
>>> self = NoStemRegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 64, 16, 16)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
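
The channel counts printed above can be derived from the arch parameters. The sketch below follows the usual RegNet width parameterization (a linear width ramp quantized to powers of wm, then rounded to a multiple of the group width). The function name regnet_stage_widths and the divisor of 8 are assumptions made for illustration, not the library's API.

```python
import math


def regnet_stage_widths(w0, wa, wm, depth, group_w, bot_mul=1.0, divisor=8):
    """Return (stage_widths, stage_depths) for a RegNet-style arch dict."""
    # Continuous per-block widths: a linear ramp starting at w0 with slope wa.
    widths_cont = [w0 + wa * i for i in range(depth)]
    # Quantize each width to w0 * wm**k for an integer exponent k.
    ks = [round(math.log(w / w0) / math.log(wm)) for w in widths_cont]
    widths = [int(round(w0 * wm ** k / divisor) * divisor) for k in ks]
    # Blocks sharing a width form one stage.
    stage_widths = sorted(set(widths))
    stage_depths = [widths.count(w) for w in stage_widths]
    # Make each bottleneck width a multiple of its group width.
    adjusted = []
    for w in stage_widths:
        w_bot = int(w * bot_mul)
        groups = min(group_w, w_bot)
        w_bot = int(round(w_bot / groups) * groups)
        adjusted.append(int(w_bot / bot_mul))
    return adjusted, stage_depths


stage_widths, stage_depths = regnet_stage_widths(
    w0=88, wa=26.31, wm=2.25, depth=25, group_w=48, bot_mul=1.0)
print(stage_widths, stage_depths)
```

With the arch used in the example, the stage widths come out as 96, 192, 432 and 1008, matching the channel dimension of the four printed outputs.
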
forward(x)[source]

Forward function of backbone.

Parameters

x (torch.Tensor) – Features in shape (N, C, H, W).

Returns

Multi-scale features.

Return type

tuple[torch.Tensor]

class mmdet3d.models.backbones.PointNet2SAMSG(in_channels, num_points=(2048, 1024, 512, 256), radii=((0.2, 0.4, 0.8), (0.4, 0.8, 1.6), (1.6, 3.2, 4.8)), num_samples=((32, 32, 64), (32, 32, 64), (32, 32, 32)), sa_channels=(((16, 16, 32), (16, 16, 32), (32, 32, 64)), ((64, 64, 128), (64, 64, 128), (64, 96, 128)), ((128, 128, 256), (128, 192, 256), (128, 256, 256))), aggregation_channels=(64, 128, 256), fps_mods=('D-FPS', 'FS', ('F-FPS', 'D-FPS')), fps_sample_range_lists=(- 1, - 1, (512, - 1)), dilated_group=(True, True, True), out_indices=(2), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': False, 'pool_mod': 'max', 'type': 'PointSAModuleMSG', 'use_xyz': True}, init_cfg=None)[source]

PointNet2 with Multi-scale grouping.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_points (tuple[int]) – The number of points which each SA module samples.

  • radii (tuple[float]) – Sampling radii of each SA module.

  • num_samples (tuple[int]) – The number of samples for ball query in each SA module.

  • sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.

  • aggregation_channels (tuple[int]) – Out channels of aggregation multi-scale grouping features.

  • fps_mods (tuple[int]) – Mod of FPS for each SA module.

  • fps_sample_range_lists (tuple[tuple[int]]) – The number of sampling points which each SA module samples.

  • dilated_group (tuple[bool]) – Whether to use dilated ball query for each SA module.

  • out_indices (Sequence[int]) – Output from which stages.

  • norm_cfg (dict) – Config of normalization layer.

  • sa_cfg (dict) –

    Config of set abstraction module, which may contain the following keys and values:

    • pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.

    • use_xyz (bool): Whether to use xyz as a part of features.

    • normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.
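
For orientation, the sketch below shows farthest point sampling on Euclidean distance, assuming that is what the 'D-FPS' entry in fps_mods refers to. The function farthest_point_sample is a hypothetical pure-Python illustration; the sampling used by the SA modules runs on batched tensors and is not reproduced here.

```python
def farthest_point_sample(points, num_samples):
    """Pick indices so that each new point is farthest from those chosen."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [0]  # start from the first point (an arbitrary choice)
    # min_dist[i]: squared distance from point i to its nearest selected point
    min_dist = [sq_dist(p, points[0]) for p in points]
    while len(selected) < min(num_samples, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, sq_dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return selected


pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (5, 0, 0)]
picked = farthest_point_sample(pts, 3)
print(picked)
```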

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs of the last SA module.

  • sa_xyz (torch.Tensor): The coordinates of sa features.

  • sa_features (torch.Tensor): The features from the last Set Aggregation Layers.

  • sa_indices (torch.Tensor): Indices of the input points.

Return type

dict[str, torch.Tensor]

class mmdet3d.models.backbones.PointNet2SASSG(in_channels, num_points=(2048, 1024, 512, 256), radius=(0.2, 0.4, 0.8, 1.2), num_samples=(64, 32, 16, 16), sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), (128, 128, 256)), fp_channels=((256, 256), (256, 256)), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': True, 'pool_mod': 'max', 'type': 'PointSAModule', 'use_xyz': True}, init_cfg=None)[source]

PointNet2 with Single-scale grouping.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_points (tuple[int]) – The number of points which each SA module samples.

  • radius (tuple[float]) – Sampling radii of each SA module.

  • num_samples (tuple[int]) – The number of samples for ball query in each SA module.

  • sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.

  • fp_channels (tuple[tuple[int]]) – Out channels of each mlp in FP module.

  • norm_cfg (dict) – Config of normalization layer.

  • sa_cfg (dict) –

    Config of set abstraction module, which may contain the following keys and values:

    • pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.

    • use_xyz (bool): Whether to use xyz as a part of features.

    • normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.
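
The radius and num_samples parameters both control the ball query that groups neighbours around each sampled point. The sketch below is a hypothetical pure-Python illustration (ball_query here is not the library's operator). Padding short groups by repeating the first hit is an assumption borrowed from common PointNet++ implementations.

```python
def ball_query(center, points, radius, num_samples):
    """Return indices of up to num_samples points within radius of center."""
    r2 = radius ** 2
    idx = [i for i, p in enumerate(points)
           if sum((a - b) ** 2 for a, b in zip(p, center)) < r2]
    idx = idx[:num_samples]
    # Pad by repeating the first hit so that every group has the same size.
    if idx:
        idx += [idx[0]] * (num_samples - len(idx))
    return idx


pts = [(0.1, 0, 0), (0.5, 0, 0), (0.15, 0, 0), (0.05, 0, 0)]
group_small = ball_query((0, 0, 0), pts, radius=0.2, num_samples=2)
group_padded = ball_query((0, 0, 0), pts, radius=0.2, num_samples=4)
```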

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs after SA and FP modules.

  • fp_xyz (list[torch.Tensor]): The coordinates of each FP feature.

  • fp_features (list[torch.Tensor]): The features from each feature propagation layer.

  • fp_indices (list[torch.Tensor]): Indices of the input points.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer

class mmdet3d.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
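
The printed shapes follow from the default strides. The sketch below recomputes them with the standard convolution output-size formula, assuming the usual ResNet-18 layout: a 7x7 stride-2 stem conv, a 3x3 stride-2 max pool, and basic blocks whose channel count doubles per stage. Both helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_output_shapes(batch, height, width,
                           base_channels=64, strides=(1, 2, 2, 2)):
    # Stem: 7x7 conv (stride 2, pad 3), then 3x3 max pool (stride 2, pad 1).
    h = conv_out(conv_out(height, 7, 2, 3), 3, 2, 1)
    w = conv_out(conv_out(width, 7, 2, 3), 3, 2, 1)
    shapes = []
    for i, stride in enumerate(strides):
        # The first block of each stage applies a 3x3 conv with this stride.
        h = conv_out(h, 3, stride, 1)
        w = conv_out(w, 3, stride, 1)
        shapes.append((batch, base_channels * 2 ** i, h, w))
    return shapes


shapes = resnet18_output_shapes(1, 32, 32)
for shape in shapes:
    print(shape)
```
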
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for ResNet stage_idx th stage.

Currently, context_block, empirical_attention_block, and nonlocal_block can be inserted into backbones such as ResNet/ResNeXt. They can be inserted after conv1/conv2/conv3 of Bottleneck.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1->conv2->conv3->yyy->zzz1->zzz2

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1->conv2->xxx->conv3->yyy->zzz1->zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

Returns

Plugins for current stage

Return type

list[dict]
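
The per-stage selection described above can be sketched as follows. This is a simplified stand-in, not the library's implementation: select_stage_plugins is a hypothetical name, and the only rules it encodes are the ones stated in this section (a plugin applies when its stages flag for that stage is true, or when stages is missing).

```python
def select_stage_plugins(plugins, stage_idx):
    """Return the plugin configs that apply to stage stage_idx."""
    stage_plugins = []
    for plugin in plugins:
        plugin = dict(plugin)  # shallow copy so the input is left untouched
        stages = plugin.pop('stages', None)
        # A missing stages entry means the plugin applies to every stage.
        if stages is None or stages[stage_idx]:
            stage_plugins.append(plugin)
    return stage_plugins


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'),
         stages=(False, True, True, True),
         position='after_conv2'),
    dict(cfg=dict(type='yyy'),
         stages=(True, True, True, True),
         position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='1'),
         stages=(True, True, True, True),
         position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='2'),
         stages=(True, True, True, True),
         position='after_conv3'),
]
stage0 = select_stage_plugins(plugins, 0)
stage1 = select_stage_plugins(plugins, 1)
```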

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet3d.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmdet3d.models.backbones.SECOND(in_channels=128, out_channels=[128, 128, 256], layer_nums=[3, 5, 5], layer_strides=[2, 2, 2], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, init_cfg=None, pretrained=None)[source]

Backbone network for SECOND/PointPillars/PartA2/MVXNet.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (list[int]) – Output channels for multi-scale feature maps.

  • layer_nums (list[int]) – Number of layers in each stage.

  • layer_strides (list[int]) – Strides of each stage.

  • norm_cfg (dict) – Config dict of normalization layers.

  • conv_cfg (dict) – Config dict of convolutional layers.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – Input with shape (N, C, H, W).

Returns

Multi-scale features.

Return type

tuple[torch.Tensor]

class mmdet3d.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]

VGG Backbone network for single-shot-detection.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_last_pool (bool) – Whether to add a pooling layer at the last of the model

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • out_indices (Sequence[int]) – Output from which stages.

  • out_feature_indices (Sequence[int]) – Output from which feature map.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

  • input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.

  • l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.

Example

>>> from mmdet3d.models.backbones import SSDVGG
>>> import torch
>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights.

necks

class mmdet3d.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=- 1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'}, init_cfg={'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]

Feature Pyramid Network.

This is an implementation of paper Feature Pyramid Networks for Object Detection.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • num_outs (int) – Number of output scales.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • add_extra_convs (bool | str) –

    If bool, it decides whether to add conv layers on top of the original feature maps. Defaults to False. If True, it is equivalent to add_extra_convs=’on_input’. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:

    • ’on_input’: Last feat map of neck inputs (i.e. backbone feature).

    • ’on_lateral’: Last feature map after lateral convs.

    • ’on_output’: The last output feature map after fpn convs.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

  • upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’)

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

Example

>>> import torch
>>> from mmdet3d.models.necks import FPN
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
forward(inputs)[source]

Forward function.

class mmdet3d.models.necks.OutdoorImVoxelNeck(in_channels, out_channels)[source]

Neck for ImVoxelNet outdoor scenario.

Parameters
  • in_channels (int) – Input channels of multi-scale feature map.

  • out_channels (int) – Output channels of multi-scale feature map.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – Input tensor of shape (N, C_in, N_x, N_y, N_z).

Returns

Output feature maps of shape (N, C_out, N_y, N_x).

Return type

list[torch.Tensor]

init_weights()[source]

Initialize weights of neck.

class mmdet3d.models.necks.SECONDFPN(in_channels=[128, 128, 256], out_channels=[256, 256, 256], upsample_strides=[1, 2, 4], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, upsample_cfg={'bias': False, 'type': 'deconv'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, use_conv_for_no_stride=False, init_cfg=None)[source]

FPN used in SECOND/PointPillars/PartA2/MVXNet.

Parameters
  • in_channels (list[int]) – Input channels of multi-scale feature maps.

  • out_channels (list[int]) – Output channels of feature maps.

  • upsample_strides (list[int]) – Strides used to upsample the feature maps.

  • norm_cfg (dict) – Config dict of normalization layers.

  • upsample_cfg (dict) – Config dict of upsample layers.

  • conv_cfg (dict) – Config dict of conv layers.

  • use_conv_for_no_stride (bool) – Whether to use conv when stride is 1.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – 4D Tensor in (N, C, H, W) shape.

Returns

Multi-level feature maps.

Return type

list[torch.Tensor]
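
A shape-only sketch of how the multi-level inputs could combine. It assumes each level is upsampled by its entry in upsample_strides and that the results are concatenated along the channel dimension. This reading is an assumption based on the default arguments rather than something this section states, and second_fpn_output_shape is a hypothetical helper.

```python
def second_fpn_output_shape(input_shapes,
                            out_channels=(256, 256, 256),
                            upsample_strides=(1, 2, 4)):
    """Predict the fused output shape from per-level (N, C, H, W) shapes."""
    upsampled = []
    for (n, _, h, w), channels, stride in zip(
            input_shapes, out_channels, upsample_strides):
        # Assumed: upsampling multiplies H and W by the stride.
        upsampled.append((n, channels, h * stride, w * stride))
    spatial = {(n, h, w) for n, _, h, w in upsampled}
    if len(spatial) != 1:
        raise ValueError('levels do not align after upsampling')
    n, h, w = spatial.pop()
    # Assumed: levels are concatenated along the channel dimension.
    return (n, sum(c for _, c, _, _ in upsampled), h, w)


fused = second_fpn_output_shape(
    [(1, 128, 32, 32), (1, 128, 16, 16), (1, 256, 8, 8)])
print(fused)
```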

dense_heads

class mmdet3d.models.dense_heads.Anchor3DHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, - 39.68, - 1.78, 69.12, 39.68, - 1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[1.6, 3.9, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=0, dir_limit_offset=1, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'}, init_cfg=None)[source]

Anchor head for SECOND/PointPillars/MVXNet/PartA2.

Parameters
  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of channels in the input feature map.

  • train_cfg (dict) – Train configs.

  • test_cfg (dict) – Test configs.

  • feat_channels (int) – Number of channels of the feature map.

  • use_direction_classifier (bool) – Whether to add a direction classifier.

  • anchor_generator (dict) – Config dict of anchor generator.

  • assigner_per_size (bool) – Whether to do assignment for each separate anchor size.

  • assign_per_class (bool) – Whether to do assignment for each class.

  • diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.

  • dir_offset (float | int) – The offset of BEV rotation angles. (TODO: may be moved into box coder)

  • dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)

  • bbox_coder (dict) – Config dict of box coders.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classifier loss.

static add_sin_difference(boxes1, boxes2)[source]

Convert the rotation difference to difference in sine function.

Parameters
  • boxes1 (torch.Tensor) – Original Boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

  • boxes2 (torch.Tensor) – Target boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

Returns

boxes1 and boxes2 whose 7th dimensions are changed.

Return type

tuple[torch.Tensor]
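
The idea rests on the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b): if the two rotation entries are replaced by the two product terms, their difference equals the sine of the angle difference. The sketch below shows this for scalar angles with the math module. It is an illustration under that assumption, not the tensor implementation.

```python
import math


def add_sin_difference(rot1, rot2):
    """Encode two angles so that enc1 - enc2 == sin(rot1 - rot2)."""
    enc1 = math.sin(rot1) * math.cos(rot2)
    enc2 = math.cos(rot1) * math.sin(rot2)
    return enc1, enc2


a, b = 0.3, -1.2
enc1, enc2 = add_sin_difference(a, b)
# The regression loss then sees sin(a - b) instead of the raw difference.
residual = enc1 - enc2
```

Because sin(a - b) is also zero when the two angles differ by pi, this encoding alone cannot tell a box from its flipped counterpart, which is presumably why the head offers a separate direction classifier.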

forward(feats)[source]

Forward pass.

Parameters

feats (list[torch.Tensor]) – Multi-level features, e.g., features produced by FPN.

Returns

Multi-level class score, bbox and direction predictions.

Return type

tuple[list[torch.Tensor]]

forward_single(x)[source]

Forward function on a single-scale feature map.

Parameters

x (torch.Tensor) – Input features.

Returns

Contain score of each class, bbox regression and direction classification predictions.

Return type

tuple[torch.Tensor]

get_anchors(featmap_sizes, input_metas, device='cuda')[source]

Get anchors according to feature map sizes.

Parameters
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • input_metas (list[dict]) – contain pcd and img’s meta info.

  • device (str) – device of current module.

Returns

Anchors of each image, valid flags of each image.

Return type

list[list[torch.Tensor]]

get_bboxes(cls_scores, bbox_preds, dir_cls_preds, input_metas, cfg=None, rescale=False)[source]

Get bboxes of anchor head.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (bool) – Whether to rescale bbox.

Returns

Prediction results of batches.

Return type

list[tuple]

get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg=None, rescale=False)[source]

Get bboxes of single branch.

Parameters
  • cls_scores (torch.Tensor) – Class score in single batch.

  • bbox_preds (torch.Tensor) – Bbox prediction in single batch.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.

  • mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.

  • input_meta (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (bool) – Whether to rescale bbox.

Returns

Contain predictions of single batch.

  • bboxes (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores (torch.Tensor): Class score of each bbox.

  • labels (torch.Tensor): Label of each bbox.

Return type

tuple

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate losses.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Gt bboxes of each sample.

  • gt_labels (list[torch.Tensor]) – Gt labels of each sample.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Classification, bbox, and direction losses of each level.

  • loss_cls (list[torch.Tensor]): Classification losses.

  • loss_bbox (list[torch.Tensor]): Box regression losses.

  • loss_dir (list[torch.Tensor]): Direction classification losses.

Return type

dict[str, list[torch.Tensor]]

loss_single(cls_score, bbox_pred, dir_cls_preds, labels, label_weights, bbox_targets, bbox_weights, dir_targets, dir_weights, num_total_samples)[source]

Calculate loss of single-level results.

Parameters
  • cls_score (torch.Tensor) – Class score in single-level.

  • bbox_pred (torch.Tensor) – Bbox prediction in single-level.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single-level.

  • labels (torch.Tensor) – Labels of class.

  • label_weights (torch.Tensor) – Weights of class loss.

  • bbox_targets (torch.Tensor) – Targets of bbox predictions.

  • bbox_weights (torch.Tensor) – Weights of bbox loss.

  • dir_targets (torch.Tensor) – Targets of direction predictions.

  • dir_weights (torch.Tensor) – Weights of direction loss.

  • num_total_samples (int) – The number of valid samples.

Returns

Losses of class, bbox and direction, respectively.

Return type

tuple[torch.Tensor]

class mmdet3d.models.dense_heads.AnchorFreeMono3DHead(num_classes, in_channels, feat_channels=256, stacked_convs=4, strides=(4, 8, 16, 32, 64), dcn_on_last_conv=False, conv_bias='auto', background_label=None, use_direction_classifier=True, diff_rad_by_sin=True, dir_offset=0, loss_cls={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_attr={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, bbox_code_size=9, pred_attrs=False, num_attrs=9, pred_velo=False, pred_bbox2d=False, group_reg_dims=(2, 1, 3, 1, 2), cls_branch=(128, 64), reg_branch=((128, 64), (128, 64), (64), (64), ()), dir_branch=(64), attr_branch=(64), conv_cfg=None, norm_cfg=None, train_cfg=None, test_cfg=None, init_cfg=None)[source]

Anchor-free head for monocular 3D object detection.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of hidden channels. Used in child classes.

  • stacked_convs (int) – Number of stacking convs of the head.

  • strides (tuple) – Downsample factor of each feature map.

  • dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Default: False.

  • conv_bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Default: “auto”.

  • background_label (int | None) – Label ID of background, set as 0 for RPN and num_classes for other heads. It will automatically set as num_classes if None is given.

  • use_direction_classifier (bool) – Whether to add a direction classifier.

  • diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classifier loss.

  • loss_attr (dict) – Config of attribute classifier loss, which is only active when pred_attrs=True.

  • bbox_code_size (int) – Dimensions of predicted bounding boxes.

  • pred_attrs (bool) – Whether to predict attributes. Default to False.

  • num_attrs (int) – The number of attributes to be predicted. Default: 9.

  • pred_velo (bool) – Whether to predict velocity. Default to False.

  • pred_bbox2d (bool) – Whether to predict 2D boxes. Default to False.

  • group_reg_dims (tuple[int]) – The dimension of each regression target group. Default: (2, 1, 3, 1, 2).

  • cls_branch (tuple[int]) – Channels for classification branch. Default: (128, 64).

  • reg_branch (tuple[tuple]) – Channels for regression branch, one tuple per regression group in the order offset, depth, size, rot, velo. Default: ((128, 64), (128, 64), (64, ), (64, ), ()).

  • dir_branch (tuple[int]) – Channels for direction classification branch. Default: (64, ).

  • attr_branch (tuple[int]) – Channels for attribute classification branch. Default: (64, ).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • train_cfg (dict) – Training config of anchor head.

  • test_cfg (dict) – Testing config of anchor head.
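The defaults above fit together: group_reg_dims sums to 2 + 1 + 3 + 1 + 2 = 9, which matches the default bbox_code_size, and reg_branch has one entry per group. The sketch below is only an illustration of that grouping on a plain Python list. The helper name split_bbox_code is hypothetical, and the group order (offset, depth, size, rot, velo) is assumed from the comments on the reg_branch default.

```python
from itertools import accumulate

# Default copied from the class signature.
GROUP_REG_DIMS = (2, 1, 3, 1, 2)
# Assumed group order, taken from the comments on the reg_branch default.
GROUP_NAMES = ("offset", "depth", "size", "rot", "velo")


def split_bbox_code(code, group_dims=GROUP_REG_DIMS, names=GROUP_NAMES):
    """Split one flat regression vector into named groups (hypothetical helper)."""
    if len(code) != sum(group_dims):
        raise ValueError("code length must equal sum(group_dims)")
    ends = list(accumulate(group_dims))   # running end index of each group
    starts = [0] + ends[:-1]              # each group starts where the previous one ends
    return {n: list(code[s:e]) for n, s, e in zip(names, starts, ends)}


groups = split_bbox_code([0.1, -0.2, 12.5, 1.6, 3.9, 1.5, 0.3, 0.0, 0.0])
```

With pred_velo=False the velo group would presumably be dropped; that case is not covered by this sketch.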

forward(feats)[source]

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually contain classification scores, bbox predictions, and direction class predictions.

  • cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.

  • dir_cls_preds (list[Tensor]): Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]): Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.

Return type

tuple

forward_single(x)[source]

Forward features of a single scale level.

Parameters

x (Tensor) – FPN feature maps of the specified stride.

Returns

Scores for each class, bbox predictions, direction class predictions, and attribute predictions, plus the features after the classification and regression conv layers, which some models such as FCOS need.

Return type

tuple

abstract get_bboxes(cls_scores, bbox_preds, dir_cls_preds, attr_preds, img_metas, cfg=None, rescale=None)[source]

Transform network output for a batch into bbox predictions.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_points * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_points * bbox_code_size, H, W)

  • dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]) – Attribute scores for each scale level, with shape (N, num_points * num_attrs, H, W).

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used

  • rescale (bool) – If True, return boxes in original image space

get_points(featmap_sizes, dtype, device, flatten=False)[source]

Get points according to feature map sizes.

Parameters
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • dtype (torch.dtype) – Type of points.

  • device (torch.device) – Device of points.

Returns

Points of each image.

Return type

tuple
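As a rough illustration of what "points according to feature map sizes" can look like, the sketch below places one point at the centre of each stride cell, which is the usual FCOS convention. That convention is an assumption here, the function names are hypothetical, and plain tuples stand in for tensors (so dtype, device and flatten are not modelled).

```python
def level_points(featmap_size, stride):
    """Return (x, y) image coordinates for every cell of one feature map.

    Assumes each point sits at the centre of its stride-sized cell.
    """
    h, w = featmap_size
    half = stride // 2
    # Row-major order: all x positions of row 0, then row 1, and so on.
    return [(x * stride + half, y * stride + half)
            for y in range(h) for x in range(w)]


def all_level_points(featmap_sizes, strides):
    """Build one list of points per feature-map level."""
    return [level_points(size, s) for size, s in zip(featmap_sizes, strides)]


points = all_level_points([(2, 3), (1, 2)], strides=(4, 8))
```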

abstract get_targets(points, gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, attr_labels_list)[source]

Compute regression, classification and centerness targets for points in multiple images.

Parameters
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).

  • gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).

  • gt_bboxes_3d_list (list[Tensor]) – 3D Ground truth bboxes of each image, each has shape (num_gt, bbox_code_size).

  • gt_labels_3d_list (list[Tensor]) – 3D Ground truth labels of each box, each has shape (num_gt,).

  • centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, each has shape (num_gt, 2).

  • depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).

  • attr_labels_list (list[Tensor]) – Attribute labels of each box, each has shape (num_gt,).

init_weights()[source]

Initialize the weights.

abstract loss(cls_scores, bbox_preds, dir_cls_preds, attr_preds, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]

Compute loss of the head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.

  • dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – Class indices corresponding to each box.

  • gt_bboxes_3d (list[Tensor]) – 3D Ground truth bboxes for each image with shape (num_gts, bbox_code_size).

  • gt_labels_3d (list[Tensor]) – 3D class indices of each box.

  • centers2d (list[Tensor]) – Projected 3D centers onto 2D images.

  • depths (list[Tensor]) – Depth of projected centers on 2D images.

  • attr_labels (list[Tensor], optional) – Attribute indices corresponding to each box

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

class mmdet3d.models.dense_heads.BaseConvBboxHead(in_channels=0, shared_conv_channels=(), cls_conv_channels=(), num_cls_out_channels=0, reg_conv_channels=(), num_reg_out_channels=0, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, bias='auto', init_cfg=None, *args, **kwargs)[source]

More general bbox head, with shared conv layers and two optional separated branches.

             /-> cls convs -> cls_score
shared convs
             \-> reg convs -> bbox_pred
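
The diagram maps onto the constructor arguments fairly directly: shared_conv_channels describes the shared trunk, while cls_conv_channels and reg_conv_channels describe the two branches. The config fragment below is a sketch only. The keys and the conv_cfg, norm_cfg, act_cfg and bias values come from the signature above; every channel count and output size is an illustrative assumption.

```python
# Hypothetical config for a head with one shared layer and one layer per branch.
bbox_head_cfg = dict(
    type="BaseConvBboxHead",
    in_channels=256,               # illustrative input width
    shared_conv_channels=(256,),   # "shared convs" in the diagram
    cls_conv_channels=(128,),      # "cls convs" branch
    num_cls_out_channels=10,       # illustrative size of cls_score
    reg_conv_channels=(128,),      # "reg convs" branch
    num_reg_out_channels=7,        # illustrative size of bbox_pred
    conv_cfg=dict(type="Conv1d"),  # defaults from the signature
    norm_cfg=dict(type="BN1d"),
    act_cfg=dict(type="ReLU"),
    bias="auto",
)
```
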
forward(feats)[source]

Forward.

Parameters

feats (Tensor) – Input features

Returns

Class score predictions and regression predictions, in that order.

Return type

tuple[Tensor]

class mmdet3d.models.dense_heads.BaseMono3DDenseHead(init_cfg=None)[source]

Base class for Monocular 3D DenseHeads.

forward_train(x, img_metas, gt_bboxes, gt_labels=None, gt_bboxes_3d=None, gt_labels_3d=None, centers2d=None, depths=None, attr_labels=None, gt_bboxes_ignore=None, proposal_cfg=None, **kwargs)[source]
Parameters
  • x (list[Tensor]) – Features from FPN.

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes of the image, shape (num_gts, 4).

  • gt_labels (list[Tensor]) – Ground truth labels of each box, shape (num_gts,).

  • gt_bboxes_3d (list[Tensor]) – 3D ground truth bboxes of the image, shape (num_gts, self.bbox_code_size).

  • gt_labels_3d (list[Tensor]) – 3D ground truth labels of each box, shape (num_gts,).

  • centers2d (list[Tensor]) – Projected 3D center of each box, shape (num_gts, 2).

  • depths (list[Tensor]) – Depth of projected 3D center of each box, shape (num_gts,).

  • attr_labels (list[Tensor]) – Attribute labels of each box, shape (num_gts,).

  • gt_bboxes_ignore (list[Tensor]) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).

  • proposal_cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used

Returns

  • losses (dict[str, Tensor]): A dictionary of loss components.

  • proposal_list (list[Tensor]): Proposals of each image.

Return type

tuple

abstract get_bboxes(**kwargs)[source]

Transform network output for a batch into bbox predictions.

abstract loss(**kwargs)[source]

Compute losses of the head.

class mmdet3d.models.dense_heads.CenterHead(in_channels=[128], tasks=None, train_cfg=None, test_cfg=None, bbox_coder=None, common_heads={}, loss_cls={'reduction': 'mean', 'type': 'GaussianFocalLoss'}, loss_bbox={'loss_weight': 0.25, 'reduction': 'none', 'type': 'L1Loss'}, separate_head={'final_kernel': 3, 'init_bias': - 2.19, 'type': 'SeparateHead'}, share_conv_channel=64, num_heatmap_convs=2, conv_cfg={'type': 'Conv2d'}, norm_cfg={'type': 'BN2d'}, bias='auto', norm_bbox=True, init_cfg=None)[source]

CenterHead for CenterPoint.

Parameters
  • mode (str) – Mode of the head. Default: ‘3d’.

  • in_channels (list[int] | int) – Channels of the input feature map. Default: [128].

  • tasks (list[dict]) – Task information including class number and class names. Default: None.

  • dataset (str) – Name of the dataset. Default: ‘nuscenes’.

  • weight (float) – Weight for location loss. Default: 0.25.

  • code_weights (list[int]) – Code weights for location loss. Default: [].

  • common_heads (dict) – Conv information for common heads. Default: dict().

  • loss_cls (dict) – Config of classification loss function. Default: dict(type=’GaussianFocalLoss’, reduction=’mean’).

  • loss_bbox (dict) – Config of regression loss function. Default: dict(type=’L1Loss’, reduction=’none’).

  • separate_head (dict) – Config of separate head. Default: dict( type=’SeparateHead’, init_bias=-2.19, final_kernel=3)

  • share_conv_channel (int) – Output channels for share_conv_layer. Default: 64.

  • num_heatmap_convs (int) – Number of conv layers for heatmap conv layer. Default: 2.

  • conv_cfg (dict) – Config of conv layer. Default: dict(type=’Conv2d’)

  • norm_cfg (dict) – Config of norm layer. Default: dict(type=’BN2d’).

  • bias (str) – Type of bias. Default: ‘auto’.

forward(feats)[source]

Forward pass.

Parameters

feats (list[torch.Tensor]) – Multi-level features, e.g., features produced by FPN.

Returns

Output results for tasks.

Return type

tuple(list[dict])

forward_single(x)[source]

Forward function for CenterPoint.

Parameters

x (torch.Tensor) – Input feature map with the shape of [B, 512, 128, 128].

Returns

Output results for tasks.

Return type

list[dict]

get_bboxes(preds_dicts, img_metas, img=None, rescale=False)[source]

Generate bboxes from bbox head predictions.

Parameters
  • preds_dicts (tuple[list[dict]]) – Prediction results.

  • img_metas (list[dict]) – Point cloud and image’s meta info.

Returns

Decoded bbox, scores and labels after nms.

Return type

list[dict]

get_targets(gt_bboxes_3d, gt_labels_3d)[source]

Generate targets.

How each output is transformed:

Each nested list is transposed so that all same-index elements in each sub-list (1, …, N) become the new sub-lists.

[ [a0, a1, a2, … ], [b0, b1, b2, … ], … ] ==> [ [a0, b0, … ], [a1, b1, … ], [a2, b2, … ] ]

The new transposed nested list is converted into a list of N tensors generated by concatenating tensors in the new sub-lists.

[ tensor0, tensor1, tensor2, … ]
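The two steps described above can be sketched with plain Python lists. This is only an illustration: the function name transpose_and_concat is hypothetical, and list concatenation stands in for the tensor concatenation the method performs.

```python
def transpose_and_concat(per_sample):
    """[[a0, a1, ...], [b0, b1, ...]] -> [a0 + b0, a1 + b1, ...]."""
    # Step 1: transpose, so that index i of every sub-list ends up together.
    transposed = [list(group) for group in zip(*per_sample)]
    # Step 2: merge each group into a single item
    # (list concatenation here, tensor concatenation in the real head).
    return [[x for part in group for x in part] for group in transposed]


per_sample = [
    [["a0"], ["a1"], ["a2"]],  # targets of the first sub-list
    [["b0"], ["b1"], ["b2"]],  # targets of the second sub-list
]
merged = transpose_and_concat(per_sample)
```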

Parameters
  • gt_bboxes_3d (list[LiDARInstance3DBoxes]) – Ground truth boxes.

  • gt_labels_3d (list[torch.Tensor]) – Labels of boxes.

Returns

Tuple of target including the following results in order.

  • list[torch.Tensor]: Heatmap scores.

  • list[torch.Tensor]: Ground truth boxes.

  • list[torch.Tensor]: Indexes indicating the position of the valid boxes.

  • list[torch.Tensor]: Masks indicating which boxes are valid.

Return type

tuple[list[torch.Tensor]]

get_targets_single(gt_bboxes_3d, gt_labels_3d)[source]

Generate training targets for a single sample.

Parameters
  • gt_bboxes_3d (LiDARInstance3DBoxes) – Ground truth boxes.

  • gt_labels_3d (torch.Tensor) – Labels of boxes.

Returns

Tuple of target including the following results in order.

  • list[torch.Tensor]: Heatmap scores.

  • list[torch.Tensor]: Ground truth boxes.

  • list[torch.Tensor]: Indexes indicating the position of the valid boxes.

  • list[torch.Tensor]: Masks indicating which boxes are valid.

Return type

tuple[list[torch.Tensor]]

get_task_detections(num_class_with_bg, batch_cls_preds, batch_reg_preds, batch_cls_labels, img_metas)[source]

Rotated NMS for each task.

Parameters
  • num_class_with_bg (int) – Number of classes for the current task.

  • batch_cls_preds (list[torch.Tensor]) – Prediction score with the shape of [N].

  • batch_reg_preds (list[torch.Tensor]) – Prediction bbox with the shape of [N, 9].

  • batch_cls_labels (list[torch.Tensor]) – Prediction label with the shape of [N].

  • img_metas (list[dict]) – Meta information of each sample.

Returns

Contains the following keys:

  • bboxes (torch.Tensor): Prediction bboxes after nms with the shape of [N, 9].

  • scores (torch.Tensor): Prediction scores after nms with the shape of [N].

  • labels (torch.Tensor): Prediction labels after nms with the shape of [N].

Return type

list[dict[str, torch.Tensor]]

loss(gt_bboxes_3d, gt_labels_3d, preds_dicts, **kwargs)[source]

Loss function for CenterHead.

Parameters
  • gt_bboxes_3d (list[LiDARInstance3DBoxes]) – Ground truth boxes.

  • gt_labels_3d (list[torch.Tensor]) – Labels of boxes.

  • preds_dicts (dict) – Output of forward function.

Returns

Loss of heatmap and bbox of each task.

Return type

dict[str, torch.Tensor]

class mmdet3d.models.dense_heads.FCOSMono3DHead(num_classes, in_channels, regress_ranges=((- 1, 48), (48, 96), (96, 192), (192, 384), (384, 100000000.0)), center_sampling=True, center_sample_radius=1.5, norm_on_bbox=True, centerness_on_reg=True, centerness_alpha=2.5, loss_cls={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_attr={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_centerness={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, norm_cfg={'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, centerness_branch=(64), init_cfg=None, **kwargs)[source]

Anchor-free head used in FCOS3D.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • regress_ranges (tuple[tuple[int, int]]) – Regress range of multiple level points.

  • center_sampling (bool) – If true, use center sampling. Default: True.

  • center_sample_radius (float) – Radius of center sampling. Default: 1.5.

  • norm_on_bbox (bool) – If true, normalize the regression targets with FPN strides. Default: True.

  • centerness_on_reg (bool) – If true, position centerness on the regress branch. Please refer to https://github.com/tianzhi0549/FCOS/issues/89#issuecomment-516877042. Default: True.

  • centerness_alpha – Parameter used to adjust the intensity attenuation from the center to the periphery. Default: 2.5.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classification loss.

  • loss_attr (dict) – Config of attribute classification loss.

  • loss_centerness (dict) – Config of centerness loss.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: norm_cfg=dict(type=’GN’, num_groups=32, requires_grad=True).

  • centerness_branch (tuple[int]) – Channels for centerness branch. Default: (64, ).

static add_sin_difference(boxes1, boxes2)[source]

Convert the rotation difference to difference in sine function.

Parameters
  • boxes1 (torch.Tensor) – Original Boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

  • boxes2 (torch.Tensor) – Target boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

Returns

boxes1 and boxes2 whose 7th dimensions are changed.

Return type

tuple[torch.Tensor]
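One common way to realise this conversion relies on the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b): the rotation of the first box is replaced by sin(a)cos(b) and that of the second by cos(a)sin(b), so that their difference is sin(a - b). The sketch below assumes that encoding and works on two scalar rotations rather than (NxC) box tensors; the function name is hypothetical.

```python
import math


def sin_difference_encoding(rot1, rot2):
    """Encode two rotations so that enc1 - enc2 == sin(rot1 - rot2)."""
    enc1 = math.sin(rot1) * math.cos(rot2)
    enc2 = math.cos(rot1) * math.sin(rot2)
    return enc1, enc2


enc1, enc2 = sin_difference_encoding(0.3, 0.1)
# Adding a full turn to one angle leaves the encoded difference unchanged.
wrapped1, wrapped2 = sin_difference_encoding(0.3 + 2 * math.pi, 0.1)
```

A regression loss applied to the encoded pair then penalises sin(rot1 - rot2), which is insensitive to whole turns.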

forward(feats)[source]

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

  • cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.

  • dir_cls_preds (list[Tensor]): Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]): Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.

  • centernesses (list[Tensor]): Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

Return type

tuple

forward_single(x, scale, stride)[source]

Forward features of a single scale level.

Parameters
  • x (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.

Returns

Scores for each class, bbox and direction class predictions, centerness predictions of input feature maps.

Return type

tuple

get_bboxes(cls_scores, bbox_preds, dir_cls_preds, attr_preds, centernesses, img_metas, cfg=None, rescale=None)[source]

Transform network output for a batch into bbox predictions.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_points * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_points * 4, H, W)

  • dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]) – Attribute scores for each scale level, with shape (N, num_points * num_attrs, H, W).

  • centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_points * 1, H, W)

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used

  • rescale (bool) – If True, return boxes in original image space

Returns

Each item in result_list is 2-tuple. The first item is an (n, 5) tensor, where the first 4 columns are bounding box positions (tl_x, tl_y, br_x, br_y) and the 5-th column is a score between 0 and 1. The second item is a (n,) tensor where each item is the predicted class label of the corresponding box.

Return type

list[tuple[Tensor, Tensor]]

static get_direction_target(reg_targets, dir_offset=0, num_bins=2, one_hot=True)[source]

Encode direction to 0 ~ num_bins-1.

Parameters
  • reg_targets (torch.Tensor) – Bbox regression targets.

  • dir_offset (int) – Direction offset.

  • num_bins (int) – Number of bins to divide 2*PI.

  • one_hot (bool) – Whether to encode as one hot.

Returns

Encoded direction targets.

Return type

torch.Tensor
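A minimal scalar sketch of this encoding: the rotation is shifted by dir_offset, wrapped into [0, 2*PI), and then mapped to one of num_bins equal-width bins. The helper names are hypothetical, and operating on a single float instead of a regression-target tensor is a simplification.

```python
import math


def limit_period(val, offset=0.0, period=2 * math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def direction_bin(rot, dir_offset=0.0, num_bins=2, one_hot=True):
    """Encode one rotation as a bin index in 0 ~ num_bins-1 (optionally one-hot)."""
    offset_rot = limit_period(rot - dir_offset, 0.0, 2 * math.pi)
    bin_idx = int(offset_rot // (2 * math.pi / num_bins))
    bin_idx = min(max(bin_idx, 0), num_bins - 1)  # guard against rounding at the edges
    if not one_hot:
        return bin_idx
    return [1.0 if i == bin_idx else 0.0 for i in range(num_bins)]


forward_bin = direction_bin(0.5, one_hot=False)    # rotation in [0, PI)
backward_bin = direction_bin(-0.5, one_hot=False)  # wraps to just below 2*PI
encoded = direction_bin(math.pi + 0.1)             # one-hot form
```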

get_targets(points, gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, attr_labels_list)[source]

Compute regression, classification and centerness targets for points in multiple images.

Parameters
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).

  • gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).

  • gt_bboxes_3d_list (list[Tensor]) – 3D Ground truth bboxes of each image, each has shape (num_gt, bbox_code_size).

  • gt_labels_3d_list (list[Tensor]) – 3D Ground truth labels of each box, each has shape (num_gt,).

  • centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, each has shape (num_gt, 2).

  • depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).

  • attr_labels_list (list[Tensor]) – Attribute labels of each box, each has shape (num_gt,).

Returns

  • concat_lvl_labels (list[Tensor]): Labels of each level.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.

Return type

tuple

loss(cls_scores, bbox_preds, dir_cls_preds, attr_preds, centernesses, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]

Compute loss of the head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.

  • dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)

  • attr_preds (list[Tensor]) – Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.

  • centernesses (list[Tensor]) – Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box

  • gt_bboxes_3d (list[Tensor]) – 3D boxes ground truth with shape of (num_gts, code_size).

  • gt_labels_3d (list[Tensor]) – same as gt_labels

  • centers2d (list[Tensor]) – 2D centers on the image with shape of (num_gts, 2).

  • depths (list[Tensor]) – Depth ground truth with shape of (num_gts, ).

  • attr_labels (list[Tensor]) – Attributes indices of each box.

  • img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • gt_bboxes_ignore (None | list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

static pts2Dto3D(points, view)[source]
Parameters
  • points (torch.Tensor) – Points in 2D images, [N, 3], where the 3 values are x, y in the image and depth.

  • view (np.ndarray) – Camera intrinsic, [3, 3].

Returns

Points in 3D space, [N, 3], where the 3 values are x, y, z in 3D space.

Return type

torch.Tensor
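For a standard pinhole camera, lifting an image point with known depth back to 3D amounts to inverting the projection u = fx * x / z + cx, v = fy * y / z + cy. The sketch below assumes a zero-skew intrinsic matrix of that form and uses plain Python sequences in place of tensors; the function name backproject is hypothetical, and the library method may handle more general matrices.

```python
def backproject(points, intrinsic):
    """Lift (u, v, depth) image points to (x, y, z) camera coordinates.

    Assumes intrinsic = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (no skew).
    """
    fx, fy = intrinsic[0][0], intrinsic[1][1]
    cx, cy = intrinsic[0][2], intrinsic[1][2]
    out = []
    for u, v, depth in points:
        x = (u - cx) * depth / fx  # invert u = fx * x / z + cx
        y = (v - cy) * depth / fy  # invert v = fy * y / z + cy
        out.append((x, y, depth))
    return out


K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
pts3d = backproject([(640.0, 360.0, 10.0), (740.0, 360.0, 10.0)], K)
```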

class mmdet3d.models.dense_heads.FreeAnchor3DHead(pre_anchor_topk=50, bbox_thr=0.6, gamma=2.0, alpha=0.5, init_cfg=None, **kwargs)[source]

FreeAnchor head for 3D detection.

Note

This implementation is directly modified from the mmdet implementation. We find it also works on 3D detection with minor modification, i.e., different hyper-parameters and an additional direction classifier.

Parameters
  • pre_anchor_topk (int) – Number of boxes that are taken in each bag.

  • bbox_thr (float) – The threshold of the saturated linear function. It is usually the same as the IoU threshold used in NMS.

  • gamma (float) – Gamma parameter in focal loss.

  • alpha (float) – Alpha parameter in focal loss.

  • kwargs (dict) – Other arguments are the same as those in Anchor3DHead.
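To make the role of bbox_thr concrete, the sketch below shows a saturated linear function: it is 0 below a lower threshold t1, 1 above an upper threshold t2, and linear in between. It assumes that t1 plays the role of bbox_thr and that t2 is the largest IoU any anchor reaches with the object, as in the FreeAnchor formulation; the function name and the scalar interface are hypothetical.

```python
def saturated_linear(iou, t1, t2):
    """Map an IoU to [0, 1]: 0 at or below t1, 1 at or above t2, linear between."""
    t2 = max(t2, t1 + 1e-12)  # keep the denominator positive
    return min(max((iou - t1) / (t2 - t1), 0.0), 1.0)


below = saturated_linear(0.50, t1=0.6, t2=0.9)   # under the threshold
middle = saturated_linear(0.75, t1=0.6, t2=0.9)  # halfway between t1 and t2
above = saturated_linear(0.95, t1=0.6, t2=0.9)   # saturates at 1
```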

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate loss of FreeAnchor head.

Parameters
  • cls_scores (list[torch.Tensor]) – Classification scores of different samples.

  • bbox_preds (list[torch.Tensor]) – Box predictions of different samples.

  • dir_cls_preds (list[torch.Tensor]) – Direction predictions of different samples.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes.

  • gt_labels (list[torch.Tensor]) – Ground truth labels.

  • input_metas (list[dict]) – List of input meta information.

  • gt_bboxes_ignore (list[BaseInstance3DBoxes], optional) – Ground truth boxes that should be ignored. Defaults to None.

Returns

Loss items.

  • positive_bag_loss (torch.Tensor): Loss of positive samples.

  • negative_bag_loss (torch.Tensor): Loss of negative samples.

Return type

dict[str, torch.Tensor]

negative_bag_loss(cls_prob, box_prob)[source]

Generate negative bag loss.

Parameters
  • cls_prob (torch.Tensor) – Classification probability of negative samples.

  • box_prob (torch.Tensor) – Bounding box probability of negative samples.

Returns

Loss of negative samples.

Return type

torch.Tensor

positive_bag_loss(matched_cls_prob, matched_box_prob)[source]

Generate positive bag loss.

Parameters
  • matched_cls_prob (torch.Tensor) – Classification probability of matched positive samples.

  • matched_box_prob (torch.Tensor) – Bounding box probability of matched positive samples.

Returns

Loss of positive samples.

Return type

torch.Tensor
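The two bag losses can be sketched for a single bag with scalar probabilities. The formulas below follow the FreeAnchor formulation as it is commonly implemented (a weighted "mean-max" over the bag for positives, a focal-style term for negatives); treating them as what this head computes is an assumption, and the tensor version also handles batching and reduction, which this sketch omits.

```python
import math

EPS = 1e-12


def positive_bag_loss(matched_cls_prob, matched_box_prob, alpha=0.5):
    """Loss of one positive bag; high when no anchor in the bag scores well."""
    matched_prob = [c * b for c, b in zip(matched_cls_prob, matched_box_prob)]
    # Anchors with higher probability get larger weights (soft maximum).
    weights = [1.0 / max(1.0 - p, EPS) for p in matched_prob]
    total = sum(weights)
    bag_prob = sum(w / total * p for w, p in zip(weights, matched_prob))
    # Binary cross-entropy against a target of 1.
    return alpha * (-math.log(max(bag_prob, EPS)))


def negative_bag_loss(cls_prob, box_prob, alpha=0.5, gamma=2.0):
    """Focal-style loss for one anchor that should be background."""
    prob = cls_prob * (1.0 - box_prob)
    prob = min(max(prob, EPS), 1.0 - EPS)
    # Binary cross-entropy against a target of 0, scaled by prob ** gamma.
    return (1.0 - alpha) * (prob ** gamma) * (-math.log(1.0 - prob))


good_bag = positive_bag_loss([0.9, 0.1], [0.9, 0.1])
bad_bag = positive_bag_loss([0.1, 0.1], [0.1, 0.1])
confident_fp = negative_bag_loss(0.9, 0.0)
```

A bag with one strong anchor (good_bag) is penalised far less than a bag with none (bad_bag), which is the behaviour the weighting is meant to produce.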

class mmdet3d.models.dense_heads.GroupFree3DHead(num_classes, in_channels, bbox_coder, num_decoder_layers, transformerlayers, decoder_self_posembeds={'input_channel': 6, 'num_pos_feats': 288, 'type': 'ConvBNPositionalEncoding'}, decoder_cross_posembeds={'input_channel': 3, 'num_pos_feats': 288, 'type': 'ConvBNPositionalEncoding'}, train_cfg=None, test_cfg=None, num_proposal=128, pred_layer_cfg=None, size_cls_agnostic=True, gt_per_seed=3, sampling_objectness_loss=None, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, size_reg_loss=None, semantic_loss=None, init_cfg=None)[source]

Bbox head of Group-Free 3D.

Parameters
  • num_classes (int) – The number of class.

  • in_channels (int) – The dims of input features from backbone.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.

  • num_decoder_layers (int) – The number of transformer decoder layers.

  • transformerlayers (dict) – Config for transformer decoder.

  • train_cfg (dict) – Config for training.

  • test_cfg (dict) – Config for testing.

  • num_proposal (int) – The number of initial sampling candidates.

  • pred_layer_cfg (dict) – Config of classification and regression prediction layers.

  • size_cls_agnostic (bool) – Whether the predicted size is class-agnostic.

  • gt_per_seed (int) – The number of candidate instances each point belongs to.

  • sampling_objectness_loss (dict) – Config of initial sampling objectness loss.

  • objectness_loss (dict) – Config of objectness loss.

  • center_loss (dict) – Config of center loss.

  • dir_class_loss (dict) – Config of direction classification loss.

  • dir_res_loss (dict) – Config of direction residual regression loss.

  • size_class_loss (dict) – Config of size classification loss.

  • size_res_loss (dict) – Config of size residual regression loss.

  • size_reg_loss (dict) – Config of class-agnostic size regression loss.

  • semantic_loss (dict) – Config of point-wise semantic segmentation loss.

forward(feat_dict, sample_mod)[source]

Forward pass.

Note

The forward of GroupFree3DHead is divided into 2 steps:

  1. Initial object candidates sampling.

  2. Iterative object box prediction by transformer decoder.

Parameters
  • feat_dict (dict) – Feature dict from backbone.

  • sample_mod (str) – Sample mode for initial candidates sampling.

Returns

Predictions of GroupFree3D head.

Return type

results (dict)

get_bboxes(points, bbox_preds, input_metas, rescale=False, use_nms=True)[source]

Generate bboxes from GroupFree3D head predictions.

Parameters
  • points (torch.Tensor) – Input points.

  • bbox_preds (dict) – Predictions from GroupFree3D head.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • rescale (bool) – Whether to rescale bboxes.

  • use_nms (bool) – Whether to apply NMS. Skip NMS post-processing when the GroupFree3D head is used in the RPN stage.

Returns

Bounding boxes, scores and labels.

Return type

list[tuple[torch.Tensor]]

get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None, max_gt_num=64)[source]

Generate targets of GroupFree3D head.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance label of each batch.

  • bbox_preds (torch.Tensor) – Bounding box predictions of vote head.

  • max_gt_num (int) – Max number of GTs for single batch.

Returns

Targets of GroupFree3D head.

Return type

tuple[torch.Tensor]
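
As a rough illustration of what a per-sample cap such as max_gt_num implies, the sketch below pads variable-length label lists to a fixed length and records a validity mask. The helper pad_to_max_gt and the pad value are hypothetical, and plain Python lists stand in for tensors; this is not the mmdet3d implementation.

```python
def pad_to_max_gt(labels_per_sample, max_gt_num=64, pad_value=0):
    """Pad each sample's labels to max_gt_num entries and build a validity mask.

    Hypothetical helper: real targets would be tensors, not lists.
    """
    padded, masks = [], []
    for labels in labels_per_sample:
        if len(labels) > max_gt_num:
            raise ValueError("sample has more ground truths than max_gt_num")
        n_pad = max_gt_num - len(labels)
        padded.append(list(labels) + [pad_value] * n_pad)
        # 1 marks a real ground truth, 0 marks padding.
        masks.append([1] * len(labels) + [0] * n_pad)
    return padded, masks


# Two samples with 2 and 1 ground truths, padded to 4 slots each.
padded, masks = pad_to_max_gt([[3, 1], [5]], max_gt_num=4)
```

Downstream losses can then multiply by the mask so that padded slots contribute nothing.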

get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, max_gt_nums=None, seed_points=None, seed_indices=None, candidate_indices=None, seed_points_obj_topk=4)[source]

Generate targets of GroupFree3D head for single batch.

Parameters
  • points (torch.Tensor) – Points of each batch.

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.

  • gt_labels_3d (torch.Tensor) – Labels of each batch.

  • pts_semantic_mask (None | torch.Tensor) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | torch.Tensor) – Point-wise instance label of each batch.

  • max_gt_nums (int) – Max number of GTs for single batch.

  • seed_points (torch.Tensor) – Coordinates of seed points.

  • seed_indices (torch.Tensor) – Indices of seed points.

  • candidate_indices (torch.Tensor) – Indices of object candidates.

  • seed_points_obj_topk (int) – k value of k-Closest Points Sampling.

Returns

Targets of GroupFree3D head.

Return type

tuple[torch.Tensor]
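
The seed_points_obj_topk parameter is described as the k of k-Closest Points Sampling. A minimal sketch of that idea, assuming it means "pick the k seed points nearest to an object centre", is shown below. The function name k_closest_points and the toy coordinates are illustrative only.

```python
import math


def k_closest_points(seed_points, center, k=4):
    """Return indices of the k seed points closest to center (Euclidean)."""
    dists = [math.dist(p, center) for p in seed_points]
    order = sorted(range(len(seed_points)), key=lambda i: dists[i])
    return order[:k]


seeds = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.2, 0.1, 0.0), (5.0, 5.0, 5.0)]
# The two seeds nearest to the origin are index 0 and index 2.
idx = k_closest_points(seeds, center=(0.0, 0.0, 0.0), k=2)
```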

init_weights()[source]

Initialize weights of transformer decoder in GroupFree3DHead.

loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None, ret_target=False)[source]

Compute loss.

Parameters
  • bbox_preds (dict) – Predictions from forward of vote head.

  • points (list[torch.Tensor]) – Input points.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each sample.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic mask.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance mask.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

  • ret_target (bool) – Whether to return targets.

Returns

Losses of GroupFree3D.

Return type

dict

multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]

Multi-class nms in single batch.

Parameters
  • obj_scores (torch.Tensor) – Objectness score of bounding boxes.

  • sem_scores (torch.Tensor) – Semantic class score of bounding boxes.

  • bbox (torch.Tensor) – Predicted bounding boxes.

  • points (torch.Tensor) – Input points.

  • input_meta (dict) – Point cloud and image’s meta info.

Returns

Bounding boxes, scores and labels.

Return type

tuple[torch.Tensor]

class mmdet3d.models.dense_heads.PartA2RPNHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, - 39.68, - 1.78, 69.12, 39.68, - 1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[1.6, 3.9, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=0, dir_limit_offset=1, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'}, init_cfg=None)[source]

RPN head for PartA2.

Note

The main difference between the PartA2 RPN head and the Anchor3DHead lies in their output during inference. The PartA2 RPN head additionally returns the original classification score for the second stage, since the bbox head in the RoI head does not perform classification.

Unlike RPN heads in 2D detectors, this RPN head performs multi-class classification and uses FocalLoss, as SECOND and PointPillars do. However, this head uses class-agnostic NMS rather than multi-class NMS.
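
To make the distinction between class-agnostic and multi-class NMS concrete, here is a small sketch. It assumes axis-aligned bird's-eye-view boxes given as (x1, y1, x2, y2) and a greedy suppression loop; the real head works on rotated 3D boxes, and all function names below are hypothetical.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned BEV boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr):
    """Keep boxes in descending score order unless they overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(bev_iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def class_agnostic_nms_sketch(boxes, scores, iou_thr):
    """Suppress overlapping boxes regardless of their class label."""
    return sorted(greedy_nms(boxes, scores, iou_thr))


def multi_class_nms_sketch(boxes, scores, labels, iou_thr):
    """Run NMS separately per class; boxes of different classes never compete."""
    keep = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        kept = greedy_nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_thr)
        keep.extend(idx[k] for k in kept)
    return sorted(keep)


boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 6.0, 6.0)]
scores = [0.9, 0.8, 0.7]
labels = [0, 1, 0]

agnostic = class_agnostic_nms_sketch(boxes, scores, iou_thr=0.5)
per_class = multi_class_nms_sketch(boxes, scores, labels, iou_thr=0.5)
```

In this toy input, boxes 0 and 1 overlap heavily but carry different labels, so the class-agnostic variant drops box 1 while the per-class variant keeps all three.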

Parameters
  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of channels in the input feature map.

  • train_cfg (dict) – Train configs.

  • test_cfg (dict) – Test configs.

  • feat_channels (int) – Number of channels of the feature map.

  • use_direction_classifier (bool) – Whether to add a direction classifier.

  • anchor_generator (dict) – Config dict of anchor generator.

  • assigner_per_size (bool) – Whether to do assignment for each separate anchor size.

  • assign_per_class (bool) – Whether to do assignment for each class.

  • diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.

  • dir_offset (float | int) – The offset of BEV rotation angles. (TODO: may be moved into box coder)

  • dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)

  • bbox_coder (dict) – Config dict of box coders.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classifier loss.
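
The diff_rad_by_sin option can be read through the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b): if the prediction is encoded as sin(a)cos(b) and the target as cos(a)sin(b), their difference equals the sine of the angle difference. The sketch below assumes that encoding; the function name is illustrative and scalar floats stand in for tensors.

```python
import math


def sin_difference_encoding(yaw_pred, yaw_target):
    """Encode two yaw angles so that pred_enc - target_enc == sin(pred - target)."""
    pred_enc = math.sin(yaw_pred) * math.cos(yaw_target)
    target_enc = math.cos(yaw_pred) * math.sin(yaw_target)
    return pred_enc, target_enc


# Small angular error: the regression residual is sin(0.2).
p, t = sin_difference_encoding(0.3, 0.1)
residual = p - t

# Headings that differ by pi give a near-zero residual, so the sine
# difference alone cannot tell a box from its flipped counterpart.
p_flip, t_flip = sin_difference_encoding(0.3 + math.pi, 0.3)
flip_residual = p_flip - t_flip
```

This ambiguity under a half-turn is one reason a separate direction classifier (use_direction_classifier) is useful alongside the sine-based regression target.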

class_agnostic_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_max_scores, mlvl_label_pred, mlvl_cls_score, mlvl_dir_scores, score_thr, max_num, cfg, input_meta)[source]

Class agnostic nms for single batch.

Parameters
  • mlvl_bboxes (torch.Tensor) – Bboxes from Multi-level.

  • mlvl_bboxes_for_nms (torch.Tensor) – Bboxes for nms (bev or minmax boxes) from Multi-level.

  • mlvl_max_scores (torch.Tensor) – Max scores of Multi-level bbox.

  • mlvl_label_pred (torch.Tensor) – Class predictions of Multi-level bbox.

  • mlvl_cls_score (torch.Tensor) – Class scores of Multi-level bbox.

  • mlvl_dir_scores (torch.Tensor) – Direction scores of Multi-level bbox.

  • score_thr (int) – Score threshold.

  • max_num (int) – Max number of bboxes after nms.

  • cfg (None | ConfigDict) – Training or testing config.

  • input_meta (dict) – Contain pcd and img’s meta info.

Returns

Predictions of single batch. Contain the keys:

  • boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores_3d (torch.Tensor): Score of each bbox.

  • labels_3d (torch.Tensor): Label of each bbox.

  • cls_preds (torch.Tensor): Class score of each bbox.

Return type

dict

get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg, rescale=False)[source]

Get bboxes of single branch.

Parameters
  • cls_scores (torch.Tensor) – Class score in single batch.

  • bbox_preds (torch.Tensor) – Bbox prediction in single batch.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.

  • mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.

  • input_meta (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (list[torch.Tensor]) – Whether to rescale bbox.

Returns

Predictions of single batch containing the following keys:

  • boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores_3d (torch.Tensor): Score of each bbox.

  • labels_3d (torch.Tensor): Label of each bbox.

  • cls_preds (torch.Tensor): Class score of each bbox.

Return type

dict
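
The way dir_offset, dir_limit_offset and the direction class prediction might combine into a final yaw can be sketched as follows. The wrap formula and the helper names limit_period_sketch and decode_yaw are assumptions made for illustration, with scalars in place of tensors; consult the source for the exact behaviour.

```python
import math


def limit_period_sketch(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def decode_yaw(yaw, dir_label, dir_offset=0.0, dir_limit_offset=1.0):
    """Combine a regressed yaw with a binary direction label (0 or 1).

    The regressed yaw is wrapped into a half-turn window, then the
    direction label adds either 0 or pi to pick the heading.
    """
    dir_rot = limit_period_sketch(yaw - dir_offset, dir_limit_offset, math.pi)
    return dir_rot + dir_offset + math.pi * dir_label


yaw_forward = decode_yaw(0.3, dir_label=1)
yaw_backward = decode_yaw(0.3, dir_label=0)
```

With the default values shown in the class signature (dir_offset=0, dir_limit_offset=1), the two labels yield headings that are exactly a half-turn apart.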

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate losses.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes of each sample.

  • gt_labels (list[torch.Tensor]) – Labels of each sample.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

Returns

Classification, bbox, and direction losses of each level.

  • loss_rpn_cls (list[torch.Tensor]): Classification losses.

  • loss_rpn_bbox (list[torch.Tensor]): Box regression losses.

  • loss_rpn_dir (list[torch.Tensor]): Direction classification losses.

Return type

dict[str, list[torch.Tensor]]
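
Because the returned dict maps each loss name to a list with one entry per level, a training loop has to reduce it to a single scalar. The sketch below shows one plausible reduction, with plain floats standing in for tensors; the helper total_loss and the extra "acc" entry are hypothetical.

```python
def total_loss(loss_dict):
    """Sum per-level values for every entry whose name contains 'loss'."""
    per_term = {
        name: sum(values)
        for name, values in loss_dict.items()
        if "loss" in name
    }
    return sum(per_term.values()), per_term


losses = {
    "loss_rpn_cls": [0.5, 0.25],
    "loss_rpn_bbox": [1.0, 0.5],
    "loss_rpn_dir": [0.125, 0.125],
    "acc": [0.9],  # logged metric, excluded from the total
}
total, per_term = total_loss(losses)
```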

class mmdet3d.models.dense_heads.SSD3DHead(num_classes, bbox_coder, in_channels=256, train_cfg=None, test_cfg=None, vote_module_cfg=None, vote_aggregation_cfg=None, pred_layer_cfg=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_res_loss=None, corner_loss=None, vote_loss=None, init_cfg=None)[source]

Bbox head of 3DSSD.

Parameters
  • num_classes (int) – The number of classes.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.

  • in_channels (int) – The number of input feature channels.

  • train_cfg (dict) – Config for training.

  • test_cfg (dict) – Config for testing.

  • vote_module_cfg (dict) – Config of VoteModule for point-wise votes.

  • vote_aggregation_cfg (dict) – Config of vote aggregation layer.

  • pred_layer_cfg (dict) – Config of classification and regression prediction layers.

  • conv_cfg (dict) – Config of convolution in prediction layer.

  • norm_cfg (dict) – Config of BN in prediction layer.

  • act_cfg (dict) – Config of activation in prediction layer.

  • objectness_loss (dict) – Config of objectness loss.

  • center_loss (dict) – Config of center loss.

  • dir_class_loss (dict) – Config of direction classification loss.

  • dir_res_loss (dict) – Config of direction residual regression loss.

  • size_res_loss (dict) – Config of size residual regression loss.

  • corner_loss (dict) – Config of bbox corners regression loss.

  • vote_loss (dict) – Config of candidate points regression loss.

get_bboxes(points, bbox_preds, input_metas, rescale=False)[source]

Generate bboxes from ssd3d head predictions.

Parameters
  • points (torch.Tensor) – Input points.

  • bbox_preds (dict) – Predictions from ssd3d head.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • rescale (bool) – Whether to rescale bboxes.

Returns

Bounding boxes, scores and labels.

Return type

list[tuple[torch.Tensor]]

get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]

Generate targets of ssd3d head.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance label of each batch.

  • bbox_preds (torch.Tensor) – Bounding box predictions of ssd3d head.

Returns

Targets of ssd3d head.

Return type

tuple[torch.Tensor]

get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None, seed_points=None)[source]

Generate targets of ssd3d head for single batch.

Parameters
  • points (torch.Tensor) – Points of each batch.

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.

  • gt_labels_3d (torch.Tensor) – Labels of each batch.

  • pts_semantic_mask (None | torch.Tensor) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | torch.Tensor) – Point-wise instance label of each batch.

  • aggregated_points (torch.Tensor) – Aggregated points from candidate points layer.

  • seed_points (torch.Tensor) – Seed points of candidate points.

Returns

Targets of ssd3d head.

Return type

tuple[torch.Tensor]

loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None)[source]

Compute loss.

Parameters
  • bbox_preds (dict) – Predictions from forward of SSD3DHead.

  • points (list[torch.Tensor]) – Input points.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each sample.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic mask.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance mask.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

Returns

Losses of 3DSSD.

Return type

dict

multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]

Multi-class nms in single batch.

Parameters
  • obj_scores (torch.Tensor) – Objectness score of bounding boxes.

  • sem_scores (torch.Tensor) – Semantic class score of bounding boxes.

  • bbox (torch.Tensor) – Predicted bounding boxes.

  • points (torch.Tensor) – Input points.

  • input_meta (dict) – Point cloud and image’s meta info.

Returns

Bounding boxes, scores and labels.

Return type

tuple[torch.Tensor]

class mmdet3d.models.dense_heads.ShapeAwareHead(tasks, assign_per_class=True, init_cfg=None, **kwargs)[source]

Shape-aware grouping head for SSN.

Parameters
  • tasks (dict) – Shape-aware groups of multi-class objects.

  • assign_per_class (bool, optional) – Whether to do assignment for each class. Default: True.

  • kwargs (dict) – Other arguments are the same as those in Anchor3DHead.

forward_single(x)[source]

Forward function on a single-scale feature map.

Parameters

x (torch.Tensor) – Input features.

Returns

Contain score of each class, bbox regression and direction classification predictions.

Return type

tuple[torch.Tensor]

get_bboxes(cls_scores, bbox_preds, dir_cls_preds, input_metas, cfg=None, rescale=False)[source]

Get bboxes of anchor head.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config. Default: None.

  • rescale (list[torch.Tensor], optional) – Whether to rescale bbox. Default: False.

Returns

Prediction results of batches.

Return type

list[tuple]

get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg=None, rescale=False)[source]

Get bboxes of single branch.

Parameters
  • cls_scores (torch.Tensor) – Class score in single batch.

  • bbox_preds (torch.Tensor) – Bbox prediction in single batch.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.

  • mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.

  • input_meta (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (list[torch.Tensor], optional) – Whether to rescale bbox. Default: False.

Returns

Contain predictions of single batch.

  • bboxes (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores (torch.Tensor): Class score of each bbox.

  • labels (torch.Tensor): Label of each bbox.

Return type

tuple

init_weights()[source]

Initialize the weights.

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate losses.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Gt bboxes of each sample.

  • gt_labels (list[torch.Tensor]) – Gt labels of each sample.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

Returns

Classification, bbox, and direction losses of each level.

  • loss_cls (list[torch.Tensor]): Classification losses.

  • loss_bbox (list[torch.Tensor]): Box regression losses.

  • loss_dir (list[torch.Tensor]): Direction classification losses.

Return type

dict[str, list[torch.Tensor]]

loss_single(cls_score, bbox_pred, dir_cls_preds, labels, label_weights, bbox_targets, bbox_weights, dir_targets, dir_weights, num_total_samples)[source]

Calculate loss of Single-level results.

Parameters
  • cls_score (torch.Tensor) – Class score in single-level.

  • bbox_pred (torch.Tensor) – Bbox prediction in single-level.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single-level.

  • labels (torch.Tensor) – Labels of class.

  • label_weights (torch.Tensor) – Weights of class loss.

  • bbox_targets (torch.Tensor) – Targets of bbox predictions.

  • bbox_weights (torch.Tensor) – Weights of bbox loss.

  • dir_targets (torch.Tensor) – Targets of direction predictions.

  • dir_weights (torch.Tensor) – Weights of direction loss.

  • num_total_samples (int) – The number of valid samples.

Returns

Losses of class, bbox and direction, respectively.

Return type

tuple[torch.Tensor]
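
The interplay of bbox_weights and num_total_samples can be illustrated with a weighted SmoothL1-style loss, using the beta value that appears in the default loss_bbox of PartA2RPNHead (1/9). This is a sketch under the assumption that the weighted sum is divided by the number of valid samples; function names are hypothetical and scalars replace tensors.

```python
def smooth_l1(diff, beta=1.0 / 9.0):
    """Quadratic below beta, linear above, continuous at |diff| == beta."""
    d = abs(diff)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta


def weighted_bbox_loss(preds, targets, weights, num_total_samples, beta=1.0 / 9.0):
    """Weighted sum of element-wise losses, normalised by the valid sample count."""
    total = sum(
        w * smooth_l1(p - t, beta) for p, t, w in zip(preds, targets, weights)
    )
    # Guard against division by zero when no sample is valid.
    return total / max(num_total_samples, 1)


# The third element has weight 0 and therefore does not contribute.
loss = weighted_bbox_loss(
    preds=[0.5, 2.0, 1.0],
    targets=[0.5, 1.0, 3.0],
    weights=[1.0, 1.0, 0.0],
    num_total_samples=2,
)
```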

class mmdet3d.models.dense_heads.VoteHead(num_classes, bbox_coder, train_cfg=None, test_cfg=None, vote_module_cfg=None, vote_aggregation_cfg=None, pred_layer_cfg=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, semantic_loss=None, iou_loss=None, init_cfg=None)[source]

Bbox head of Votenet.

Parameters
  • num_classes (int) – The number of classes.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.

  • train_cfg (dict) – Config for training.

  • test_cfg (dict) – Config for testing.

  • vote_module_cfg (dict) – Config of VoteModule for point-wise votes.

  • vote_aggregation_cfg (dict) – Config of vote aggregation layer.

  • pred_layer_cfg (dict) – Config of classification and regression prediction layers.

  • conv_cfg (dict) – Config of convolution in prediction layer.

  • norm_cfg (dict) – Config of BN in prediction layer.

  • objectness_loss (dict) – Config of objectness loss.

  • center_loss (dict) – Config of center loss.

  • dir_class_loss (dict) – Config of direction classification loss.

  • dir_res_loss (dict) – Config of direction residual regression loss.

  • size_class_loss (dict) – Config of size classification loss.

  • size_res_loss (dict) – Config of size residual regression loss.

  • semantic_loss (dict) – Config of point-wise semantic segmentation loss.

forward(feat_dict, sample_mod)[source]

Forward pass.

Note

The forward of VoteHead is divided into 4 steps:

  1. Generate vote_points from seed_points.

  2. Aggregate vote_points.

  3. Predict bbox and score.

  4. Decode predictions.
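
The first two steps can be pictured with a toy sketch: each seed point adds a predicted offset to produce a vote, and votes near a sampled centre are grouped together. Steps 3 and 4 (prediction and decoding) are not covered. The helpers generate_votes and aggregate_votes, the fixed radius, and the hand-written offsets are all assumptions for illustration; in the real head the offsets come from a learned module.

```python
import math


def generate_votes(seed_points, offsets):
    """Step 1: shift every seed point by its predicted offset."""
    return [
        tuple(s + o for s, o in zip(point, offset))
        for point, offset in zip(seed_points, offsets)
    ]


def aggregate_votes(votes, centers, radius):
    """Step 2: group the votes that fall within radius of each centre."""
    return [
        [v for v in votes if math.dist(v, c) <= radius]
        for c in centers
    ]


seeds = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
offsets = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

votes = generate_votes(seeds, offsets)
clusters = aggregate_votes(votes, centers=[(1.0, 0.0, 0.0)], radius=0.5)
```

Here the first two seeds vote for the same location, so they end up in one cluster, while the distant third seed is left out.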

Parameters
  • feat_dict (dict) – Feature dict from backbone.

  • sample_mod (str) – Sample mode for vote aggregation layer. Valid modes are “vote”, “seed”, “random” and “spec”.

Returns

Predictions of vote head.

Return type

dict

get_bboxes(points, bbox_preds, input_metas, rescale=False, use_nms=True)[source]

Generate bboxes from vote head predictions.

Parameters
  • points (torch.Tensor) – Input points.

  • bbox_preds (dict) – Predictions from vote head.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • rescale (bool) – Whether to rescale bboxes.

  • use_nms (bool) – Whether to apply NMS. Skip NMS post-processing when the vote head is used in the RPN stage.

Returns

Bounding boxes, scores and labels.

Return type

list[tuple[torch.Tensor]]

get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]

Generate targets of vote head.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance label of each batch.

  • bbox_preds (torch.Tensor) – Bounding box predictions of vote head.

Returns

Targets of vote head.

Return type

tuple[torch.Tensor]

get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None)[source]

Generate targets of vote head for single batch.

Parameters
  • points (torch.Tensor) – Points of each batch.

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.

  • gt_labels_3d (torch.Tensor) – Labels of each batch.

  • pts_semantic_mask (None | torch.Tensor) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | torch.Tensor) – Point-wise instance label of each batch.

  • aggregated_points (torch.Tensor) – Aggregated points from vote aggregation layer.

Returns

Targets of vote head.

Return type

tuple[torch.Tensor]

loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None, ret_target=False)[source]

Compute loss.

Parameters
  • bbox_preds (dict) – Predictions from forward of vote head.

  • points (list[torch.Tensor]) – Input points.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each sample.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic mask.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance mask.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

  • ret_target (bool) – Whether to return targets.

Returns

Losses of Votenet.

Return type

dict

multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]

Multi-class nms in single batch.

Parameters
  • obj_scores (torch.Tensor) – Objectness score of bounding boxes.

  • sem_scores (torch.Tensor) – Semantic class score of bounding boxes.

  • bbox (torch.Tensor) – Predicted bounding boxes.

  • points (torch.Tensor) – Input points.

  • input_meta (dict) – Point cloud and image’s meta info.

Returns

Bounding boxes, scores and labels.

Return type

tuple[torch.Tensor]

roi_heads

class mmdet3d.models.roi_heads.Base3DRoIHead(bbox_head=None, mask_roi_extractor=None, mask_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Base class for 3d RoIHeads.

aug_test(x, proposal_list, img_metas, rescale=False, **kwargs)[source]

Test with augmentations.

If rescale is False, then returned bboxes and masks will fit the scale of imgs[0].

abstract forward_train(x, img_metas, proposal_list, gt_bboxes, gt_labels, gt_bboxes_ignore=None, **kwargs)[source]

Forward function during training.

Parameters
  • x (dict) – Contains features from the first stage.

  • img_metas (list[dict]) – Meta info of each image.

  • proposal_list (list[dict]) – Proposal information from rpn.

  • gt_bboxes (list[BaseInstance3DBoxes]) – GT bboxes of each sample. The bboxes are encapsulated by 3D box structures.

  • gt_labels (list[torch.LongTensor]) – GT labels of each sample.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored.

Returns

Losses from each head.

Return type

dict[str, torch.Tensor]

abstract init_assigner_sampler()[source]

Initialize assigner and sampler.

abstract init_bbox_head()[source]

Initialize the box head.

abstract init_mask_head()[source]

Initialize mask head.

simple_test(x, proposal_list, img_metas, proposals=None, rescale=False, **kwargs)[source]

Test without augmentation.

property with_bbox

Whether the RoIHead has a box head.

Type

bool

property with_mask

Whether the RoIHead has a mask head.

Type

bool

class mmdet3d.models.roi_heads.H3DRoIHead(primitive_list, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

H3D roi head for H3DNet.

Parameters
  • primitive_list (List) – Configs of primitive heads.

  • bbox_head (ConfigDict) – Config of bbox_head.

  • train_cfg (ConfigDict) – Training config.

  • test_cfg (ConfigDict) – Testing config.

forward_train(feats_dict, img_metas, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask, pts_instance_mask, gt_bboxes_ignore=None)[source]

Training forward function of H3DRoIHead.

Parameters
  • feats_dict (dict) – Contains features from the first stage.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • points (list[torch.Tensor]) – Input points.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each sample.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic mask.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance mask.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes to ignore.

Returns

Losses from each head.

Return type

dict

init_assigner_sampler()[source]

Initialize assigner and sampler.

init_bbox_head(bbox_head)[source]

Initialize box head.

init_mask_head()[source]

Initialize mask head. Skipped since H3DRoIHead does not have one.

simple_test(feats_dict, img_metas, points, rescale=False)[source]

Simple testing forward function of H3DRoIHead.

Note

This function assumes that the batch size is 1.

Parameters
  • feats_dict (dict) – Contains features from the first stage.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • points (torch.Tensor) – Input points.

  • rescale (bool) – Whether to rescale results.

Returns

Bbox results of one frame.

Return type

dict

class mmdet3d.models.roi_heads.PartA2BboxHead(num_classes, seg_in_channels, part_in_channels, seg_conv_channels=None, part_conv_channels=None, merge_conv_channels=None, down_conv_channels=None, shared_fc_channels=None, cls_channels=None, reg_channels=None, dropout_ratio=0.1, roi_feat_size=14, with_corner_loss=True, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_cls={'loss_weight': 1.0, 'reduction': 'none', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg=None)[source]

PartA2 RoI head.

Parameters
  • num_classes (int) – The number of classes to predict.

  • seg_in_channels (int) – Input channels of segmentation convolution layer.

  • part_in_channels (int) – Input channels of part convolution layer.

  • seg_conv_channels (list(int)) – Out channels of each segmentation convolution layer.

  • part_conv_channels (list(int)) – Out channels of each part convolution layer.

  • merge_conv_channels (list(int)) – Out channels of each feature merged convolution layer.

  • down_conv_channels (list(int)) – Out channels of each downsampled convolution layer.

  • shared_fc_channels (list(int)) – Out channels of each shared fc layer.

  • cls_channels (list(int)) – Out channels of each classification layer.

  • reg_channels (list(int)) – Out channels of each regression layer.

  • dropout_ratio (float) – Dropout ratio of classification and regression layers.

  • roi_feat_size (int) – The size of pooled roi features.

  • with_corner_loss (bool) – Whether to use corner loss or not.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for box head.

  • conv_cfg (dict) – Config dict of convolutional layers.

  • norm_cfg (dict) – Config dict of normalization layers.

  • loss_bbox (dict) – Config dict of box regression loss.

  • loss_cls (dict) – Config dict of classification loss.

forward(seg_feats, part_feats)[source]

Forward pass.

Parameters
  • seg_feats (torch.Tensor) – Point-wise semantic features.

  • part_feats (torch.Tensor) – Point-wise part prediction features.

Returns

Score of class and bbox predictions.

Return type

tuple[torch.Tensor]

get_bboxes(rois, cls_score, bbox_pred, class_labels, class_pred, img_metas, cfg=None)[source]

Generate bboxes from bbox head predictions.

Parameters
  • rois (torch.Tensor) – Roi bounding boxes.

  • cls_score (torch.Tensor) – Scores of bounding boxes.

  • bbox_pred (torch.Tensor) – Bounding box predictions.

  • class_labels (torch.Tensor) – Labels of classes.

  • class_pred (torch.Tensor) – Score for nms.

  • img_metas (list[dict]) – Point cloud and image’s meta info.

  • cfg (ConfigDict) – Testing config.

Returns

Decoded bbox, scores and labels after nms.

Return type

list[tuple]

get_corner_loss_lidar(pred_bbox3d, gt_bbox3d, delta=1)[source]

Calculate corner loss of given boxes.

Parameters
  • pred_bbox3d (torch.FloatTensor) – Predicted boxes in shape (N, 7).

  • gt_bbox3d (torch.FloatTensor) – Ground truth boxes in shape (N, 7).

Returns

Calculated corner loss in shape (N).

Return type

torch.FloatTensor
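
A corner loss of this kind can be sketched by comparing the eight corners of the predicted and ground-truth boxes and applying a Huber penalty with threshold delta to each corner distance. The sketch assumes a box layout of (x, y, z, dx, dy, dz, yaw) with (x, y, z) at the box centre and rotation about the z-axis; these conventions, and the helper names, are assumptions rather than the library's definition.

```python
import math
from itertools import product


def box_corners(box):
    """Return the 8 corners of a box (x, y, z, dx, dy, dz, yaw)."""
    x, y, z, dx, dy, dz, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        lx, ly, lz = sx * dx, sy * dy, sz * dz
        # Rotate the local offset about the z-axis, then translate.
        corners.append((x + c * lx - s * ly, y + s * lx + c * ly, z + lz))
    return corners


def huber(dist, delta=1.0):
    """Quadratic up to delta, linear beyond it."""
    quadratic = min(dist, delta)
    linear = dist - quadratic
    return 0.5 * quadratic ** 2 + delta * linear


def corner_loss(pred_box, gt_box, delta=1.0):
    """Mean Huber penalty over the 8 corresponding corner distances."""
    pred, gt = box_corners(pred_box), box_corners(gt_box)
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(huber(d, delta) for d in dists) / len(dists)


gt = (0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0)
near = corner_loss((0.5, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0), gt)  # quadratic regime
far = corner_loss((3.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0), gt)   # linear regime
```

A pure translation moves every corner by the same distance, so the loss reduces to the Huber value of that distance.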

get_targets(sampling_results, rcnn_train_cfg, concat=True)[source]

Generate targets.

Parameters
  • sampling_results (list[SamplingResult]) – Sampled results from rois.

  • rcnn_train_cfg (ConfigDict) – Training config of rcnn.

  • concat (bool) – Whether to concatenate targets between batches.

Returns

Targets of boxes and class prediction.

Return type

tuple[torch.Tensor]

init_weights()[source]

Initialize the weights.

loss(cls_score, bbox_pred, rois, labels, bbox_targets, pos_gt_bboxes, reg_mask, label_weights, bbox_weights)[source]

Compute losses.

Parameters
  • cls_score (torch.Tensor) – Scores of each roi.

  • bbox_pred (torch.Tensor) – Predictions of bboxes.

  • rois (torch.Tensor) – Roi bboxes.

  • labels (torch.Tensor) – Labels of class.

  • bbox_targets (torch.Tensor) – Target of positive bboxes.

  • pos_gt_bboxes (torch.Tensor) – Ground truths of positive bboxes.

  • reg_mask (torch.Tensor) – Mask for positive bboxes.

  • label_weights (torch.Tensor) – Weights of class loss.

  • bbox_weights (torch.Tensor) – Weights of bbox loss.

Returns

Computed losses.

  • loss_cls (torch.Tensor): Loss of classes.

  • loss_bbox (torch.Tensor): Loss of bboxes.

  • loss_corner (torch.Tensor): Loss of corners.

Return type

dict

multi_class_nms(box_probs, box_preds, score_thr, nms_thr, input_meta, use_rotate_nms=True)[source]

Multi-class NMS for box head.

Note

This function has large overlap with the box3d_multiclass_nms implemented in mmdet3d.core.post_processing. We are considering merging these two functions in the future.

Parameters
  • box_probs (torch.Tensor) – Predicted box probabilities in shape (N,).

  • box_preds (torch.Tensor) – Predicted boxes in shape (N, 7+C).

  • score_thr (float) – Threshold of scores.

  • nms_thr (float) – Threshold for NMS.

  • input_meta (dict) – Meta information of the current sample.

  • use_rotate_nms (bool, optional) – Whether to use rotated nms. Defaults to True.

Returns

Selected indices.

Return type

torch.Tensor

class mmdet3d.models.roi_heads.PartAggregationROIHead(semantic_head, num_classes=3, seg_roi_extractor=None, part_roi_extractor=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)