mmdet3d.core¶
anchor¶
- class mmdet3d.core.anchor.AlignedAnchor3DRangeGenerator(align_corner=False, **kwargs)[source]¶
Aligned 3D Anchor Generator by range.
This anchor generator positions the anchor centers differently from Anchor3DRangeGenerator.
Note
“Aligned” means that each anchor’s center is aligned with the voxel grid, which is also the feature grid. The previous implementation, Anchor3DRangeGenerator, does not generate the anchor centers according to the voxel grid. Instead, it distributes the anchors uniformly between the minimum and maximum anchor ranges according to the feature map sizes, so the anchor centers do not match the feature grid. The AlignedAnchor3DRangeGenerator adds 1 to the feature map sizes to obtain the corners of the voxel grid. It then shifts the coordinates to the centers of the voxels and uses the upper-left corners to distribute the anchors.
- Parameters
align_corner (bool, optional) – Whether to align with the corners of the voxel grid. If False, each anchor’s center coincides with the center of the corresponding voxel, which is also the center of the corresponding feature grid cell. Defaults to False.
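The difference between the two center layouts can be sketched along a single axis. This is a minimal pure-Python illustration rather than the library code: the helper names linspace, uniform_centers and aligned_centers are hypothetical, and the sketch assumes the aligned generator keeps the first n of the n + 1 corners before applying the half-voxel shift.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def uniform_centers(lo, hi, n):
    """Non-aligned layout: spread n centers uniformly over [lo, hi]."""
    return linspace(lo, hi, n)


def aligned_centers(lo, hi, n, align_corner=False):
    """Aligned layout: take n + 1 voxel corners, keep the first n, and
    (unless align_corner is set) shift them by half a voxel."""
    corners = linspace(lo, hi, n + 1)
    shift = 0.0 if align_corner else (corners[1] - corners[0]) / 2
    return [c + shift for c in corners[:n]]


# Four feature cells covering the range [0, 4] along one axis.
print(uniform_centers(0.0, 4.0, 4))        # first and last centers sit on the range limits
print(aligned_centers(0.0, 4.0, 4))        # centers of the four voxels
print(aligned_centers(0.0, 4.0, 4, True))  # lower corners of the four voxels
```

With the uniform layout the spacing is (hi - lo) / (n - 1), which is wider than a voxel; the aligned layout uses the voxel width (hi - lo) / n.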
- anchors_single_range(feature_size, anchor_range, scale, sizes=[[3.9, 1.6, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]¶
Generate anchors in a single range.
- Parameters
feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).
anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int) – The scale factor of anchors.
sizes (list[list] | np.ndarray | torch.Tensor, optional) – Anchor size with shape [N, 3], in order of x, y, z. Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional) – Rotations of anchors in a single feature grid. Defaults to [0, 1.5707963].
device (str, optional) – Devices that the anchors will be put on. Defaults to ‘cuda’.
- Returns
- Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
- Return type
torch.Tensor
- class mmdet3d.core.anchor.AlignedAnchor3DRangeGeneratorPerCls(**kwargs)[source]¶
Per-class 3D Anchor Generator by range.
This anchor generator generates anchors by the given range for each class. Note that the feature maps of different classes may differ.
- Parameters
kwargs (dict) – Arguments are the same as those in
AlignedAnchor3DRangeGenerator
.
- grid_anchors(featmap_sizes, device='cuda')[source]¶
Generate grid anchors in multiple feature levels.
- Parameters
featmap_sizes (list[tuple]) – List of feature map sizes for different classes in a single feature level.
device (str, optional) – Device where the anchors will be put on. Defaults to ‘cuda’.
- Returns
- Anchors in multiple feature levels.
Note that in this anchor generator, we currently only support single feature level. The sizes of each tensor should be [num_sizes/ranges*num_rots*featmap_size, box_code_size].
- Return type
list[list[torch.Tensor]]
- multi_cls_grid_anchors(featmap_sizes, scale, device='cuda')[source]¶
Generate grid anchors of a single level feature map for multi-class with different feature map sizes.
This function is usually called by the method self.grid_anchors.
- Parameters
featmap_sizes (list[tuple]) – List of feature map sizes for different classes in a single feature level.
scale (float) – Scale factor of the anchors in the current level.
device (str, optional) – Device the tensor will be put on. Defaults to ‘cuda’.
- Returns
Anchors in the overall feature map.
- Return type
torch.Tensor
- class mmdet3d.core.anchor.Anchor3DRangeGenerator(ranges, sizes=[[3.9, 1.6, 1.56]], scales=[1], rotations=[0, 1.5707963], custom_values=(), reshape_out=True, size_per_range=True)[source]¶
3D Anchor Generator by range.
This anchor generator generates anchors by the given range in different feature levels. By convention in 3D detection, different anchor sizes are tied to different ranges for different categories. However, we find that this setting does not affect the performance much on some datasets, e.g., nuScenes.
- Parameters
ranges (list[list[float]]) – Ranges of different anchors. The ranges are the same across different feature levels, but may vary for different anchor sizes if size_per_range is True.
sizes (list[list[float]], optional) – 3D sizes of anchors. Defaults to [[3.9, 1.6, 1.56]].
scales (list[int], optional) – Scales of anchors in different feature levels. Defaults to [1].
rotations (list[float], optional) – Rotations of anchors in a feature grid. Defaults to [0, 1.5707963].
custom_values (tuple[float], optional) – Customized values of that anchor. For example, in nuScenes the anchors have velocities. Defaults to ().
reshape_out (bool, optional) – Whether to reshape the output into (N x 4). Defaults to True.
size_per_range (bool, optional) – Whether to use separate ranges for different sizes. If size_per_range is True, the ranges should have the same length as the sizes, if not, it will be duplicated. Defaults to True.
- anchors_single_range(feature_size, anchor_range, scale=1, sizes=[[3.9, 1.6, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]¶
Generate anchors in a single range.
- Parameters
feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).
anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).
scale (float | int, optional) – The scale factor of anchors. Defaults to 1.
sizes (list[list] | np.ndarray | torch.Tensor, optional) – Anchor size with shape [N, 3], in order of x, y, z. Defaults to [[3.9, 1.6, 1.56]].
rotations (list[float] | np.ndarray | torch.Tensor, optional) – Rotations of anchors in a single feature grid. Defaults to [0, 1.5707963].
device (str) – Devices that the anchors will be put on. Defaults to ‘cuda’.
- Returns
- Anchors with shape
[*feature_size, num_sizes, num_rots, 7].
- Return type
torch.Tensor
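As a rough illustration of the returned shape, the sketch below computes [*feature_size, num_sizes, num_rots, 7] and the anchor count after flattening. The helper names and the (1, 200, 176) feature size are hypothetical example values, and a box code size of 7 assumes no custom values are appended.

```python
def anchor_shape(feature_size, sizes, rotations, box_code_size=7):
    """Shape of the anchors tensor for a single range.

    feature_size is (D, H, W); sizes is a list of [x, y, z] sizes;
    rotations is a list of yaw angles. box_code_size=7 assumes that
    no custom values are appended to each anchor.
    """
    return (*feature_size, len(sizes), len(rotations), box_code_size)


def num_anchors(shape):
    """Number of anchors once every dimension but the last is flattened."""
    total = 1
    for dim in shape[:-1]:
        total *= dim
    return total


shape = anchor_shape((1, 200, 176), [[3.9, 1.6, 1.56]], [0, 1.5707963])
print(shape)               # (1, 200, 176, 1, 2, 7)
print(num_anchors(shape))  # 70400
```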
- grid_anchors(featmap_sizes, device='cuda')[source]¶
Generate grid anchors in multiple feature levels.
- Parameters
featmap_sizes (list[tuple]) – List of feature map sizes in multiple feature levels.
device (str, optional) – Device where the anchors will be put on. Defaults to ‘cuda’.
- Returns
- Anchors in multiple feature levels.
The sizes of each tensor should be [N, 4], where N = width * height * num_base_anchors, width and height are the sizes of the corresponding feature level, num_base_anchors is the number of anchors for that level.
- Return type
list[torch.Tensor]
- property num_base_anchors¶
Total number of base anchors in a feature grid.
- Type
list[int]
- property num_levels¶
Number of feature levels that the generator is applied to.
- Type
int
- single_level_grid_anchors(featmap_size, scale, device='cuda')[source]¶
Generate grid anchors of a single level feature map.
This function is usually called by the method self.grid_anchors.
- Parameters
featmap_size (tuple[int]) – Size of the feature map.
scale (float) – Scale factor of the anchors in the current level.
device (str, optional) – Device the tensor will be put on. Defaults to ‘cuda’.
- Returns
Anchors in the overall feature map.
- Return type
torch.Tensor
bbox¶
- class mmdet3d.core.bbox.AssignResult(num_gts, gt_inds, max_overlaps, labels=None)[source]¶
Stores assignments between predicted and truth boxes.
- num_gts¶
the number of truth boxes considered when computing this assignment
- Type
int
- gt_inds¶
for each predicted box indicates the 1-based index of the assigned truth box. 0 means unassigned and -1 means ignore.
- Type
LongTensor
- max_overlaps¶
the iou between the predicted box and its assigned truth box.
- Type
FloatTensor
- labels¶
If specified, for each predicted box indicates the category label of the assigned truth box.
- Type
None | LongTensor
Example
>>> import torch
>>> # An assign result between 4 predicted boxes and 9 true boxes
>>> # where only two boxes were assigned.
>>> num_gts = 9
>>> max_overlaps = torch.FloatTensor([0, .5, .9, 0])
>>> gt_inds = torch.LongTensor([-1, 1, 2, 0])
>>> labels = torch.LongTensor([0, 3, 4, 0])
>>> self = AssignResult(num_gts, gt_inds, max_overlaps, labels)
>>> print(str(self))  # xdoctest: +IGNORE_WANT
<AssignResult(num_gts=9, gt_inds.shape=(4,), max_overlaps.shape=(4,),
              labels.shape=(4,))>
>>> # Force addition of gt labels (when adding gt as proposals)
>>> new_labels = torch.LongTensor([3, 4, 5])
>>> self.add_gt_(new_labels)
>>> print(str(self))  # xdoctest: +IGNORE_WANT
<AssignResult(num_gts=9, gt_inds.shape=(7,), max_overlaps.shape=(7,),
              labels.shape=(7,))>
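The gt_inds convention (1-based index, 0 for unassigned, -1 for ignore) can be decoded as in the following sketch. It works on a plain Python list rather than a LongTensor, and split_assignments is a hypothetical helper, not part of AssignResult.

```python
def split_assignments(gt_inds):
    """Split prediction indices by assignment status.

    gt_inds follows the convention described above: k > 0 means the
    prediction is assigned to truth box k (1-based), 0 means unassigned,
    and -1 means ignored.
    """
    # Map prediction index -> 0-based index of the assigned truth box.
    assigned = {i: g - 1 for i, g in enumerate(gt_inds) if g > 0}
    unassigned = [i for i, g in enumerate(gt_inds) if g == 0]
    ignored = [i for i, g in enumerate(gt_inds) if g == -1]
    return assigned, unassigned, ignored


assigned, unassigned, ignored = split_assignments([-1, 1, 2, 0])
print(assigned)    # {1: 0, 2: 1}
print(unassigned)  # [3]
print(ignored)     # [0]
```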
- add_gt_(gt_labels)[source]¶
Add ground truth as assigned results.
- Parameters
gt_labels (torch.Tensor) – Labels of gt boxes
- property info¶
a dictionary of info about the object
- Type
dict
- property num_preds¶
the number of predictions in this assignment
- Type
int
- classmethod random(**kwargs)[source]¶
Create random AssignResult for tests or debugging.
- Parameters
num_preds – number of predicted boxes
num_gts – number of true boxes
p_ignore (float) – probability of a predicted box assigned to an ignored truth
p_assigned (float) – probability of a predicted box being assigned
p_use_label (float | bool) – with labels or not
rng (None | int | numpy.random.RandomState) – seed or state
- Returns
Randomly generated assign results.
- Return type
AssignResult
Example
>>> from mmdet.core.bbox.assigners.assign_result import *  # NOQA
>>> self = AssignResult.random()
>>> print(self.info)
- class mmdet3d.core.bbox.AxisAlignedBboxOverlaps3D[source]¶
Axis-aligned 3D Overlaps (IoU) Calculator.
- class mmdet3d.core.bbox.BaseAssigner[source]¶
Base assigner that assigns boxes to ground truth boxes.
- class mmdet3d.core.bbox.BaseInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
Base class for 3D Boxes.
Note
The box is bottom centered, i.e. the relative position of origin in the box is (0.5, 0.5, 0).
- Parameters
tensor (torch.Tensor | np.ndarray | list) – a N x box_dim matrix.
box_dim (int) – Number of the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw). Defaults to 7.
with_yaw (bool) – Whether the box is with yaw rotation. If False, the value of yaw will be set to 0 as minmax boxes. Defaults to True.
origin (tuple[float], optional) – Relative position of the box origin. Boxes given with a different origin are converted to the (0.5, 0.5, 0) convention. Defaults to (0.5, 0.5, 0).
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- property bev¶
2D BEV box of each box with rotation in XYWHR format, in shape (N, 5).
- Type
torch.Tensor
- property bottom_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- property bottom_height¶
torch.Tensor: A vector with bottom’s height of each box in shape (N, ).
- classmethod cat(boxes_list)[source]¶
Concatenate a list of Boxes into a single Boxes.
- Parameters
boxes_list (list[BaseInstance3DBoxes]) – List of boxes.
- Returns
The concatenated Boxes.
- Return type
BaseInstance3DBoxes
- property center¶
Calculate the center of all the boxes.
Note
In MMDetection3D’s convention, the bottom center is usually taken as the default center.
The relative position of the center differs between box types, e.g., the relative center of a box is (0.5, 1.0, 0.5) in camera coordinates and (0.5, 0.5, 0) in LiDAR coordinates. It is recommended to use bottom_center or gravity_center for clearer usage.
- Returns
A tensor with center of each box in shape (N, 3).
- Return type
torch.Tensor
- abstract convert_to(dst, rt_mat=None)[source]¶
Convert self to dst mode.
- Parameters
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted box of the same type in the dst mode.
- Return type
BaseInstance3DBoxes
- property corners¶
torch.Tensor: a tensor with 8 corners of each box in shape (N, 8, 3).
- property device¶
The device that the boxes are on.
- Type
str
- property dims¶
Size dimensions of each box in shape (N, 3).
- Type
torch.Tensor
- abstract flip(bev_direction='horizontal')[source]¶
Flip the boxes in BEV along given BEV direction.
- Parameters
bev_direction (str, optional) – Direction by which to flip. Can be chosen from ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
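The relation between the stored center and the gravity center can be sketched from the relative origin alone. The function below is a hypothetical illustration on a single box given as a plain tuple; it assumes the stored point differs from the geometric center only along the yaw axis, so the yaw can be ignored (this holds for the (0.5, 0.5, 0) and (0.5, 1.0, 0.5) origins listed in this section).

```python
def gravity_center(box, origin=(0.5, 0.5, 0.0)):
    """Geometric center of a box (x, y, z, x_size, y_size, z_size, yaw).

    origin is the relative position of the stored (x, y, z) inside the
    box. Moving from that point to the geometric center means adding
    (0.5 - origin) * size along each axis.
    """
    center = box[:3]
    sizes = box[3:6]
    return tuple(c + (0.5 - o) * s for c, o, s in zip(center, origin, sizes))


# LiDAR/Depth-style box: stored point is the bottom center.
print(gravity_center((1.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.3)))
# Camera-style box: y points down, so the bottom center has the larger y.
print(gravity_center((1.0, 2.0, 3.0, 4.0, 2.0, 1.5, 0.0), origin=(0.5, 1.0, 0.5)))
```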
- property height¶
A vector with height of each box in shape (N, ).
- Type
torch.Tensor
- classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate height overlaps of two boxes.
Note
This function calculates the height overlaps between boxes1 and boxes2; boxes1 and boxes2 should be of the same type.
- Parameters
boxes1 (BaseInstance3DBoxes) – Boxes 1 containing N boxes.
boxes2 (BaseInstance3DBoxes) – Boxes 2 containing M boxes.
mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.
- Returns
Calculated iou of boxes.
- Return type
torch.Tensor
- in_range_3d(box_range)[source]¶
Check whether the boxes are in the given range.
- Parameters
box_range (list | torch.Tensor) – The range of box (x_min, y_min, z_min, x_max, y_max, z_max)
Note
In the original implementation of SECOND, checking whether a box is in the range means checking whether its points lie in a convex polygon. Here the check is simplified to reduce the cost for simpler cases.
- Returns
- A binary vector indicating whether each box is
inside the reference range.
- Return type
torch.Tensor
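One way to read the simplified check is as a per-axis comparison of each box's stored center against the range. The sketch below assumes exactly that, with strict inequalities; both the use of the stored center and the strictness are assumptions, and the function works on plain tuples rather than tensors.

```python
def in_range_3d(boxes, box_range):
    """Flag boxes whose stored center lies strictly inside box_range.

    boxes is a list of (x, y, z, x_size, y_size, z_size, yaw) tuples and
    box_range is (x_min, y_min, z_min, x_max, y_max, z_max). Box sizes
    and yaw are deliberately ignored in this simplified check.
    """
    x_min, y_min, z_min, x_max, y_max, z_max = box_range
    return [
        x_min < b[0] < x_max and y_min < b[1] < y_max and z_min < b[2] < z_max
        for b in boxes
    ]


boxes = [
    (1.0, 1.0, 0.0, 4.0, 2.0, 1.5, 0.0),   # center inside the range
    (60.0, 1.0, 0.0, 4.0, 2.0, 1.5, 0.0),  # x is beyond x_max
]
print(in_range_3d(boxes, (0.0, -40.0, -3.0, 50.0, 40.0, 1.0)))  # [True, False]
```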
- in_range_bev(box_range)[source]¶
Check whether the boxes are in the given range.
- Parameters
box_range (list | torch.Tensor) – the range of box (x_min, y_min, x_max, y_max)
Note
The original implementation of SECOND checks whether boxes are in a range by checking whether their points lie in a convex polygon. Here the check is simplified to reduce the cost for simpler cases.
- Returns
Whether each box is inside the reference range.
- Return type
torch.Tensor
- limit_yaw(offset=0.5, period=3.141592653589793)[source]¶
Limit the yaw to a given period and offset.
- Parameters
offset (float, optional) – The offset of the yaw. Defaults to 0.5.
period (float, optional) – The expected period. Defaults to np.pi.
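A common way to limit an angle to a period with an offset is val - floor(val / period + offset) * period, which maps values into [-offset * period, (1 - offset) * period). The sketch below assumes limit_yaw follows this formula; limit_period here is a standalone scalar helper written for illustration.

```python
import math


def limit_period(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


# With the defaults, yaw values are wrapped into [-pi/2, pi/2).
for yaw in (0.3, 2.0, -2.0):
    print(yaw, "->", limit_period(yaw))
```

With offset=0.5 and period=2 * math.pi, the same function wraps angles into [-pi, pi).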
- property nearest_bev¶
A tensor of 2D BEV box of each box without rotation.
- Type
torch.Tensor
- new_box(data)[source]¶
Create a new box object with data.
The new box and its tensor have similar properties to self and self.tensor, respectively.
- Parameters
data (torch.Tensor | numpy.array | list) – Data to be copied.
- Returns
A new bbox object with data; the object’s other properties are similar to self.
- Return type
BaseInstance3DBoxes
- nonempty(threshold=0.0)[source]¶
Find boxes that are non-empty.
A box is considered empty if any of its sides is no larger than the threshold.
- Parameters
threshold (float, optional) – The threshold of minimal sizes. Defaults to 0.0.
- Returns
- A binary vector which represents whether each
box is empty (False) or non-empty (True).
- Return type
torch.Tensor
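The emptiness rule can be sketched on plain tuples as follows. This is an illustrative helper, not the library method; it assumes the three sizes are stored at positions 3 to 5 of each box, as in the (x, y, z, x_size, y_size, z_size, yaw) layout used in this section.

```python
def nonempty(boxes, threshold=0.0):
    """Return True for boxes whose three sizes all exceed threshold."""
    return [all(size > threshold for size in box[3:6]) for box in boxes]


boxes = [
    (0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0),  # regular box
    (0.0, 0.0, 0.0, 4.0, 0.0, 1.5, 0.0),  # zero y_size, so it is empty
]
print(nonempty(boxes))       # [True, False]
print(nonempty(boxes, 1.8))  # [False, False]
```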
- classmethod overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate 3D overlaps of two boxes.
Note
This function calculates the overlaps between boxes1 and boxes2; boxes1 and boxes2 should be of the same type.
- Parameters
boxes1 (BaseInstance3DBoxes) – Boxes 1 containing N boxes.
boxes2 (BaseInstance3DBoxes) – Boxes 2 containing M boxes.
mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.
- Returns
Calculated 3D overlaps of the boxes.
- Return type
torch.Tensor
- points_in_boxes_all(points, boxes_override=None)[source]¶
Find all boxes in which each point is.
- Parameters
points (torch.Tensor) – Points in shape (1, M, 3) or (M, 3), 3 dimensions are (x, y, z) in LiDAR or depth coordinate.
boxes_override (torch.Tensor, optional) – Boxes to override self.tensor. Defaults to None.
- Returns
A tensor indicating whether a point is in a box, in shape (M, T), where T is the number of boxes. Denote this tensor as A: if the m-th point is in the t-th box, then A[m, t] == 1; otherwise A[m, t] == 0.
- Return type
torch.Tensor
- points_in_boxes_part(points, boxes_override=None)[source]¶
Find the box in which each point is.
- Parameters
points (torch.Tensor) – Points in shape (1, M, 3) or (M, 3), 3 dimensions are (x, y, z) in LiDAR or depth coordinate.
boxes_override (torch.Tensor, optional) – Boxes to override self.tensor. Defaults to None.
- Returns
- The index of the first box that each point
is in, in shape (M, ). Default value is -1 (if the point is not enclosed by any box).
- Return type
torch.Tensor
Note
If a point is enclosed by multiple boxes, the index of the first box will be returned.
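The "first enclosing box, else -1" behaviour can be illustrated with a simplified pure-Python sketch. It assumes bottom-centered boxes with zero yaw, given as (x, y, z, x_size, y_size, z_size) tuples; the real method handles rotated boxes on tensors, so this is only a model of the indexing rule. The name first_enclosing_box is hypothetical.

```python
def first_enclosing_box(points, boxes):
    """For each point, return the index of the first box containing it.

    Boxes are bottom-centered and axis-aligned (yaw is assumed to be 0).
    Points outside every box get -1.
    """
    result = []
    for px, py, pz in points:
        index = -1
        for t, (x, y, z, dx, dy, dz) in enumerate(boxes):
            inside = (
                abs(px - x) <= dx / 2
                and abs(py - y) <= dy / 2
                and z <= pz <= z + dz
            )
            if inside:
                index = t
                break  # stop at the first enclosing box
        result.append(index)
    return result


boxes = [(0.0, 0.0, 0.0, 2.0, 2.0, 2.0), (0.5, 0.0, 0.0, 2.0, 2.0, 2.0)]
points = [(0.2, 0.0, 1.0), (1.4, 0.0, 1.0), (5.0, 5.0, 5.0)]
print(first_enclosing_box(points, boxes))  # [0, 1, -1]
```

The first point lies in both boxes and is reported for box 0 only, which is the difference from points_in_boxes_all.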
- abstract rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | numpy.ndarray | BasePoints, optional) – Points to rotate. Defaults to None.
- scale(scale_factor)[source]¶
Scale the box with horizontal and vertical scaling factors.
- Parameters
scale_factor (float) – Scale factor to scale the boxes.
- to(device)[source]¶
Convert current boxes to a specific device.
- Parameters
device (str | torch.device) – The name of the device.
- Returns
A new boxes object on the specific device.
- Return type
BaseInstance3DBoxes
- property top_height¶
torch.Tensor: A vector with the top height of each box in shape (N, ).
- translate(trans_vector)[source]¶
Translate boxes with the given translation vector.
- Parameters
trans_vector (torch.Tensor) – Translation vector of size (1, 3).
- property volume¶
A vector with volume of each box.
- Type
torch.Tensor
- property yaw¶
A vector with yaw of each box in shape (N, ).
- Type
torch.Tensor
- class mmdet3d.core.bbox.BaseSampler(num, pos_fraction, neg_pos_ub=- 1, add_gt_as_proposals=True, **kwargs)[source]¶
Base class of samplers.
- sample(assign_result, bboxes, gt_bboxes, gt_labels=None, **kwargs)[source]¶
Sample positive and negative bboxes.
This is a simple implementation of bbox sampling given candidates, assigning results and ground truth bboxes.
- Parameters
assign_result (
AssignResult
) – Bbox assigning results.bboxes (Tensor) – Boxes to be sampled from.
gt_bboxes (Tensor) – Ground truth bboxes.
gt_labels (Tensor, optional) – Class labels of ground truth bboxes.
- Returns
Sampling result.
- Return type
SamplingResult
Example
>>> from mmdet.core.bbox import RandomSampler
>>> from mmdet.core.bbox import AssignResult
>>> from mmdet.core.bbox.demodata import ensure_rng, random_boxes
>>> rng = ensure_rng(None)
>>> assign_result = AssignResult.random(rng=rng)
>>> bboxes = random_boxes(assign_result.num_preds, rng=rng)
>>> gt_bboxes = random_boxes(assign_result.num_gts, rng=rng)
>>> gt_labels = None
>>> self = RandomSampler(num=32, pos_fraction=0.5, neg_pos_ub=-1,
...                      add_gt_as_proposals=False)
>>> self = self.sample(assign_result, bboxes, gt_bboxes, gt_labels)
- class mmdet3d.core.bbox.BboxOverlaps3D(coordinate)[source]¶
3D IoU Calculator.
- Parameters
coordinate (str) – The coordinate system, valid options are ‘camera’, ‘lidar’, and ‘depth’.
- class mmdet3d.core.bbox.BboxOverlapsNearest3D(coordinate='lidar')[source]¶
Nearest 3D IoU Calculator.
Note
This IoU calculator first finds the nearest 2D boxes in bird’s eye view (BEV), and then calculates the 2D IoU using bbox_overlaps().
- Parameters
coordinate (str) – ‘camera’, ‘lidar’, or ‘depth’ coordinate system.
- class mmdet3d.core.bbox.Box3DMode(value)[source]¶
Enum of different ways to represent a box.
Coordinates in LiDAR:
                up z
                   ^   x front
                   |  /
                   | /
    left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
Coordinates in camera:
            z front
           /
          /
         0 ------> x right
         |
         |
         v
    down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1.
Coordinates in Depth mode:
    up z
       ^   y front
       |  /
       | /
       0 ------> x right
The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
- static convert(box, src, dst, rt_mat=None, with_yaw=True)[source]¶
Convert boxes from src mode to dst mode.
- Parameters
box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.
src (Box3DMode) – The source Box mode.
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool, optional) – If box is an instance of BaseInstance3DBoxes, whether or not it has a yaw angle. Defaults to True.
- Returns
The converted box of the same type.
- Return type
tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes
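The axis conventions above imply a fixed axis permutation between the LiDAR and camera frames when the two sensors are assumed to share an origin. The sketch below derives that mapping from the axis directions only (LiDAR: x front, y left, z up; camera: x right, y down, z front). It is not the convert implementation: a real conversion needs a calibrated rt_mat, and box sizes and yaw must be remapped as well, which is not shown. All names are hypothetical.

```python
# Rows express camera axes in terms of LiDAR axes:
#   cam_x (right) = -lidar_y, cam_y (down) = -lidar_z, cam_z (front) = lidar_x
LIDAR_TO_CAM = [
    [0, -1, 0],
    [0, 0, -1],
    [1, 0, 0],
]


def apply_rotation(mat, point):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in mat)


def transpose(mat):
    """Transpose a 3x3 matrix; for a rotation this is its inverse."""
    return [list(col) for col in zip(*mat)]


CAM_TO_LIDAR = transpose(LIDAR_TO_CAM)

print(apply_rotation(LIDAR_TO_CAM, (1, 0, 0)))  # LiDAR front -> (0, 0, 1), camera front
print(apply_rotation(LIDAR_TO_CAM, (0, 1, 0)))  # LiDAR left  -> (-1, 0, 0), camera left
print(apply_rotation(LIDAR_TO_CAM, (0, 0, 1)))  # LiDAR up    -> (0, -1, 0), camera up
```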
- class mmdet3d.core.bbox.CameraInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 1.0, 0.5))[source]¶
3D boxes of instances in CAM coordinates.
Coordinates in camera:
            z front (yaw=-0.5*pi)
           /
          /
         0 ------> x right (yaw=0)
         |
         |
         v
    down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of z.
- tensor¶
Float matrix in shape (N, box_dim).
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as axis-aligned boxes tightly enclosing the original boxes.
- Type
bool
- property bev¶
2D BEV box of each box with rotation in XYWHR format, in shape (N, 5).
- Type
torch.Tensor
- property bottom_height¶
torch.Tensor: A vector with bottom’s height of each box in shape (N, ).
- convert_to(dst, rt_mat=None)[source]¶
Convert self to dst mode.
- Parameters
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted box of the same type in the dst mode.
- Return type
BaseInstance3DBoxes
- property corners¶
Coordinates of corners of all the boxes in shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
                     front z
                          /
                         /
           (x0, y0, z1) + -----------  + (x1, y0, z1)
                       /|            / |
                      / |           /  |
        (x0, y0, z0) + ----------- +   + (x1, y1, z1)
                     |  /      .   |  /
                     | / origin    | /
        (x0, y1, z0) + ----------- + -------> x right
                     |             (x1, y1, z0)
                     |
                     v
                down y
- Type
torch.Tensor
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In CAM coordinates, it flips the x (horizontal) or z (vertical) axis.
- Parameters
bev_direction (str) – Flip direction (horizontal or vertical).
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
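A BEV flip in camera coordinates has to update both the position and the yaw. The sketch below works on a single box tuple and assumes the yaw convention described for this class (0 along +x, -0.5*pi along +z): a horizontal flip negates x and maps yaw to pi - yaw, and a vertical flip negates z and maps yaw to -yaw. The function name and the tuple-based interface are hypothetical.

```python
import math


def flip_camera_box(box, bev_direction="horizontal"):
    """Flip one (x, y, z, x_size, y_size, z_size, yaw) box in BEV.

    Horizontal flips mirror the x axis and vertical flips mirror the
    z axis; y (pointing down) is left unchanged.
    """
    x, y, z, dx, dy, dz, yaw = box
    if bev_direction == "horizontal":
        return (-x, y, z, dx, dy, dz, math.pi - yaw)
    if bev_direction == "vertical":
        return (x, y, -z, dx, dy, dz, -yaw)
    raise ValueError("bev_direction must be 'horizontal' or 'vertical'")


box = (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.25)
print(flip_camera_box(box, "horizontal"))
print(flip_camera_box(box, "vertical"))
```

A box heading along +z (yaw = -0.5*pi) keeps its heading under a horizontal flip, since pi - (-0.5*pi) equals -0.5*pi up to a full turn.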
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- property height¶
A vector with height of each box in shape (N, ).
- Type
torch.Tensor
- classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]¶
Calculate height overlaps of two boxes.
This function calculates the height overlaps between boxes1 and boxes2, where boxes1 and boxes2 should be of the same type.
- Parameters
boxes1 (CameraInstance3DBoxes) – Boxes 1 containing N boxes.
boxes2 (CameraInstance3DBoxes) – Boxes 2 containing M boxes.
mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.
- Returns
Calculated iou of boxes’ heights.
- Return type
torch.Tensor
- property local_yaw¶
torch.Tensor: A vector with the local yaw of each box in shape (N, ). local_yaw is equal to alpha in KITTI, which is commonly used in monocular 3D object detection, so only CameraInstance3DBoxes has this property.
- points_in_boxes_all(points, boxes_override=None)[source]¶
Find all boxes in which each point is.
- Parameters
- Returns
- The index of all boxes in which each point is,
in shape (B, M, T).
- Return type
torch.Tensor
- points_in_boxes_part(points, boxes_override=None)[source]¶
Find the box in which each point is.
- Parameters
- Returns
- The index of the box in which
each point is, in shape (M, ). Default value is -1 (if the point is not enclosed by any box).
- Return type
torch.Tensor
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray |
BasePoints
, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None; otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
- property top_height¶
torch.Tensor: A vector with the top height of each box in shape (N, ).
- class mmdet3d.core.bbox.CombinedSampler(pos_sampler, neg_sampler, **kwargs)[source]¶
A sampler that combines positive sampler and negative sampler.
- class mmdet3d.core.bbox.Coord3DMode(value)[source]¶
Enum of different ways to represent a box and point cloud.
Coordinates in LiDAR:
                up z
                   ^   x front
                   |  /
                   | /
    left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
Coordinates in camera:
            z front
           /
          /
         0 ------> x right
         |
         |
         v
    down y
The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1.
Coordinates in Depth mode:
    up z
       ^   y front
       |  /
       | /
       0 ------> x right
The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.
- static convert(input, src, dst, rt_mat=None, with_yaw=True, is_point=True)[source]¶
Convert boxes or points from src mode to dst mode.
- Parameters
input (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes | BasePoints) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.
src (Box3DMode | Coord3DMode) – The source mode.
dst (Box3DMode | Coord3DMode) – The target mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool) – If box is an instance of BaseInstance3DBoxes, whether or not it has a yaw angle. Defaults to True.
is_point (bool) – If input is neither an instance of BaseInstance3DBoxes nor an instance of BasePoints, whether or not it is point data. Defaults to True.
- Returns
The converted box or points of the same type.
- Return type
tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes | BasePoints
- static convert_box(box, src, dst, rt_mat=None, with_yaw=True)[source]¶
Convert boxes from src mode to dst mode.
- Parameters
box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an Nxk array/tensor, where k = 7.
src (Box3DMode) – The source Box mode.
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
with_yaw (bool) – If box is an instance of BaseInstance3DBoxes, whether or not it has a yaw angle. Defaults to True.
- Returns
The converted box of the same type.
- Return type
tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes
- static convert_point(point, src, dst, rt_mat=None)[source]¶
Convert points from src mode to dst mode.
- Parameters
point (tuple | list | np.ndarray | torch.Tensor | BasePoints) – Can be a k-tuple, k-list or an Nxk array/tensor.
src (Coord3DMode) – The source Point mode.
dst (Coord3DMode) – The target Point mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted point of the same type.
- Return type
tuple | list | np.ndarray | torch.Tensor | BasePoints
- class mmdet3d.core.bbox.DeltaXYZWLHRBBoxCoder(code_size=7)[source]¶
Bbox Coder for 3D boxes.
- Parameters
code_size (int) – The dimension of boxes to be encoded.
- static decode(anchors, deltas)[source]¶
Apply transformation deltas (dx, dy, dz, dx_size, dy_size, dz_size, dr, dv*) to boxes.
- Parameters
anchors (torch.Tensor) – Parameters of anchors with shape (N, 7).
deltas (torch.Tensor) – Encoded boxes with shape (N, 7+n) [x, y, z, x_size, y_size, z_size, r, velo*].
- Returns
Decoded boxes.
- Return type
torch.Tensor
- static encode(src_boxes, dst_boxes)[source]¶
Get box regression transformation deltas (dx, dy, dz, dx_size, dy_size, dz_size, dr, dv*) that can be used to transform the src_boxes into the dst_boxes.
- Parameters
src_boxes (torch.Tensor) – source boxes, e.g., object proposals.
dst_boxes (torch.Tensor) – target of the transformation, e.g., ground-truth boxes.
- Returns
Box transformation deltas.
- Return type
torch.Tensor
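For orientation, a SECOND-style residual encoding for 7-value boxes can be sketched as below. This is an assumption about the general scheme, not a transcription of the coder: positions are normalised by the anchor's BEV diagonal and height, sizes use log ratios, the yaw uses a plain difference, and z is compared at the gravity center. Extra values such as velocity are omitted, and the functions operate on plain tuples.

```python
import math


def encode(src, dst):
    """Deltas that map the src box onto the dst box."""
    xa, ya, za, dxa, dya, dza, ra = src
    xg, yg, zg, dxg, dyg, dzg, rg = dst
    za_c, zg_c = za + dza / 2, zg + dzg / 2  # compare z at the gravity centers
    diagonal = math.hypot(dxa, dya)          # BEV diagonal of the anchor
    return (
        (xg - xa) / diagonal,
        (yg - ya) / diagonal,
        (zg_c - za_c) / dza,
        math.log(dxg / dxa),
        math.log(dyg / dya),
        math.log(dzg / dza),
        rg - ra,
    )


def decode(anchor, deltas):
    """Apply deltas to an anchor; the inverse of encode."""
    xa, ya, za, dxa, dya, dza, ra = anchor
    xt, yt, zt, dxt, dyt, dzt, rt = deltas
    diagonal = math.hypot(dxa, dya)
    dxg, dyg, dzg = math.exp(dxt) * dxa, math.exp(dyt) * dya, math.exp(dzt) * dza
    zg_c = zt * dza + (za + dza / 2)
    return (xt * diagonal + xa, yt * diagonal + ya, zg_c - dzg / 2,
            dxg, dyg, dzg, rt + ra)


anchor = (0.0, 0.0, -1.0, 3.9, 1.6, 1.56, 0.0)
target = (1.0, 2.0, -0.8, 4.2, 1.7, 1.5, 0.3)
restored = decode(anchor, encode(anchor, target))
print(all(abs(a - b) < 1e-9 for a, b in zip(restored, target)))  # True
```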
- class mmdet3d.core.bbox.DepthInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
3D boxes of instances in Depth coordinates.
Coordinates in Depth:
    up z    y front (yaw=-0.5*pi)
       ^   ^
       |  /
       | /
       0 ------> x right (yaw=0)
The relative coordinate of bottom center in a Depth box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and decreases from the positive direction of x to the positive direction of y. Also note that rotation of DepthInstance3DBoxes is counterclockwise, which is reverse to the definition of the yaw angle (clockwise).
A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- convert_to(dst, rt_mat=None)[source]¶
Convert self to dst mode.
- Parameters
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes with a change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted box of the same type in the dst mode.
- Return type
DepthInstance3DBoxes
- property corners¶
Coordinates of corners of all the boxes in shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in form of
(x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
                            up z
             front y           ^
                  /            |
                 /             |
   (x0, y1, z1) + -----------  + (x1, y1, z1)
               /|            / |
              / |           /  |
(x0, y0, z1) + ----------- +   + (x1, y1, z0)
             |  /      .   |  /
             | / origin    | /
(x0, y0, z0) + ----------- + --------> right x
                           (x1, y0, z0)
- Type
torch.Tensor
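As a rough illustration of the corner ordering above, the following sketch builds the eight corners of a single box in plain Python. It assumes the box is given by its bottom center (origin (0.5, 0.5, 0)) and rotates the corners counterclockwise about the z axis by yaw. The helper name box_corners and that sign convention are assumptions made for illustration; this is not the library implementation.

```python
import math


def box_corners(x, y, z, x_size, y_size, z_size, yaw=0.0):
    """Return the 8 corners of one box whose (x, y, z) is the bottom center.

    Order: x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0.
    """
    # Relative corner positions in the unit cube, in the documented order.
    unit = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
            (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0)]
    cos_a, sin_a = math.cos(yaw), math.sin(yaw)
    corners = []
    for ux, uy, uz in unit:
        # Shift by the origin (0.5, 0.5, 0) so the bottom center sits at 0.
        lx = (ux - 0.5) * x_size
        ly = (uy - 0.5) * y_size
        lz = uz * z_size
        # Assumed convention: counterclockwise rotation about the z axis.
        rx = cos_a * lx - sin_a * ly
        ry = sin_a * lx + cos_a * ly
        corners.append((x + rx, y + ry, z + lz))
    return corners


corners = box_corners(0.0, 0.0, 0.0, 2.0, 4.0, 1.0)
```

With yaw left at 0 the result is an axis-aligned box, which makes the ordering easy to check by eye.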
- enlarged_box(extra_width)[source]¶
Enlarge the length, width and height of the boxes.
- Parameters
extra_width (float | torch.Tensor) – Extra width to enlarge the box.
- Returns
Enlarged boxes.
- Return type
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In Depth coordinates, it flips x (horizontal) or y (vertical) axis.
- Parameters
bev_direction (str, optional) – Flip direction (horizontal or vertical). Defaults to ‘horizontal’.
points (torch.Tensor | np.ndarray | BasePoints, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
- get_surface_line_center()[source]¶
Compute surface and line center of bounding boxes.
- Returns
Surface and line center of bounding boxes.
- Return type
torch.Tensor
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
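Because the box origin is the bottom center (0.5, 0.5, 0), the gravity center differs from the stored center only along z. A minimal sketch of that relationship, with a hypothetical helper name and a plain tuple standing in for one tensor row:

```python
def gravity_center(box):
    """box = (x, y, z, x_size, y_size, z_size, yaw), bottom-center origin."""
    x, y, z, x_size, y_size, z_size = box[:6]
    # Only z moves: the bottom center sits half a box height below the
    # geometric center.
    return (x, y, z + 0.5 * z_size)


center = gravity_center((1.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.0))
```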
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray | BasePoints, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None; otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
- class mmdet3d.core.bbox.InstanceBalancedPosSampler(num, pos_fraction, neg_pos_ub=-1, add_gt_as_proposals=True, **kwargs)[source]¶
Instance balanced sampler that samples equal number of positive samples for each instance.
- class mmdet3d.core.bbox.IoUBalancedNegSampler(num, pos_fraction, floor_thr=-1, floor_fraction=0, num_bins=3, **kwargs)[source]¶
IoU Balanced Sampling.
arXiv: https://arxiv.org/pdf/1904.02701.pdf (CVPR 2019)
Samples proposals according to their IoU. A floor_fraction share of the needed RoIs is sampled randomly from proposals whose IoU is lower than floor_thr. The rest are sampled from proposals whose IoU is higher than floor_thr; these are drawn evenly from num_bins bins that split the IoU range evenly.
- Parameters
num (int) – number of proposals.
pos_fraction (float) – fraction of positive proposals.
floor_thr (float) – threshold (minimum) IoU for IoU balanced sampling. Set to -1 to apply IoU balanced sampling to all proposals.
floor_fraction (float) – sampling fraction of proposals under floor_thr.
num_bins (int) – number of bins in IoU balanced sampling.
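The binning idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it covers the part above floor_thr (not the floor_fraction share below it), and the function name, the seed argument and the top-up step for under-filled bins are hypothetical choices, not the sampler's actual code.

```python
import random


def sample_by_iou_bins(max_overlaps, num_expected, floor_thr=0.0,
                       num_bins=3, seed=0):
    """Pick up to num_expected indices with IoU >= floor_thr, spread over bins."""
    rng = random.Random(seed)
    candidates = [i for i, iou in enumerate(max_overlaps) if iou >= floor_thr]
    if len(candidates) <= num_expected:
        return sorted(candidates)
    max_iou = max(max_overlaps[i] for i in candidates)
    width = (max_iou - floor_thr) / num_bins
    per_bin = num_expected // num_bins
    picked = []
    for b in range(num_bins):
        lo = floor_thr + b * width
        hi = floor_thr + (b + 1) * width
        # The last bin is closed on the right so the max IoU is included.
        in_bin = [i for i in candidates
                  if lo <= max_overlaps[i] < hi
                  or (b == num_bins - 1 and max_overlaps[i] == max_iou)]
        picked += rng.sample(in_bin, min(per_bin, len(in_bin)))
    # Top up from the remaining candidates if some bins were too small.
    rest = [i for i in candidates if i not in picked]
    picked += rng.sample(rest, num_expected - len(picked))
    return sorted(picked)


overlaps = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
picked = sample_by_iou_bins(overlaps, 6)
```

Drawing the same number from each bin keeps hard negatives (high IoU) from being swamped by the far more numerous easy ones.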
- sample_via_interval(max_overlaps, full_set, num_expected)[source]¶
Sample according to the iou interval.
- Parameters
max_overlaps (torch.Tensor) – IoU between bounding boxes and ground truth boxes.
full_set (set(int)) – A full set of indices of boxes.
num_expected (int) – Number of expected samples.
- Returns
Indices of samples
- Return type
np.ndarray
- class mmdet3d.core.bbox.LiDARInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]¶
3D boxes of instances in LiDAR coordinates.
Coordinates in LiDAR:
                         up z    x front (yaw=0)
                            ^   ^
                            |  /
                            | /
(yaw=0.5*pi) left y <------ 0
The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and increases from the positive direction of x to the positive direction of y.
A refactor is ongoing to make the three coordinate systems easier to understand and convert between each other.
- tensor¶
Float matrix of N x box_dim.
- Type
torch.Tensor
- box_dim¶
Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).
- Type
int
- with_yaw¶
Whether the box has a yaw rotation. If False, the value of yaw will be set to 0 as minmax boxes.
- Type
bool
- convert_to(dst, rt_mat=None)[source]¶
Convert self to dst mode.
- Parameters
dst (Box3DMode) – The target Box mode.
rt_mat (np.ndarray | torch.Tensor, optional) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.
- Returns
The converted box of the same type in the dst mode.
- Return type
- property corners¶
Coordinates of corners of all the boxes in shape (N, 8, 3).
Convert the boxes to corners in clockwise order, in form of
(x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)
                                   up z
                    front x           ^
                         /            |
                        /             |
          (x1, y0, z1) + -----------  + (x1, y1, z1)
                      /|            / |
                     / |           /  |
       (x0, y0, z1) + ----------- +   + (x1, y1, z0)
                    |  /      .   |  /
                    | / origin    | /
left y<-------- + ----------- + (x0, y1, z0)
    (x0, y0, z0)
- Type
torch.Tensor
- enlarged_box(extra_width)[source]¶
Enlarge the length, width and height of the boxes.
- Parameters
extra_width (float | torch.Tensor) – Extra width to enlarge the box.
- Returns
Enlarged boxes.
- Return type
- flip(bev_direction='horizontal', points=None)[source]¶
Flip the boxes in BEV along given BEV direction.
In LiDAR coordinates, it flips the y (horizontal) or x (vertical) axis.
- Parameters
bev_direction (str) – Flip direction (horizontal or vertical).
points (torch.Tensor | np.ndarray | BasePoints, optional) – Points to flip. Defaults to None.
- Returns
Flipped points.
- Return type
torch.Tensor, numpy.ndarray or None
- property gravity_center¶
A tensor with center of each box in shape (N, 3).
- Type
torch.Tensor
- rotate(angle, points=None)[source]¶
Rotate boxes with points (optional) with the given angle or rotation matrix.
- Parameters
angle (float | torch.Tensor | np.ndarray) – Rotation angle or rotation matrix.
points (torch.Tensor | np.ndarray | BasePoints, optional) – Points to rotate. Defaults to None.
- Returns
When points is None, the function returns None; otherwise it returns the rotated points and the rotation matrix rot_mat_T.
- Return type
tuple or None
- class mmdet3d.core.bbox.MaxIoUAssigner(pos_iou_thr, neg_iou_thr, min_pos_iou=0.0, gt_max_assign_all=True, ignore_iof_thr=-1, ignore_wrt_candidates=True, match_low_quality=True, gpu_assign_thr=-1, iou_calculator={'type': 'BboxOverlaps2D'})[source]¶
Assign a corresponding gt bbox or background to each bbox.
Each proposal will be assigned -1 or a non-negative integer indicating the ground truth index.
-1: negative sample, no assigned gt
non-negative integer: positive sample, index (0-based) of assigned gt
- Parameters
pos_iou_thr (float) – IoU threshold for positive bboxes.
neg_iou_thr (float or tuple) – IoU threshold for negative bboxes.
min_pos_iou (float) – Minimum iou for a bbox to be considered as a positive bbox. Positive samples can have smaller IoU than pos_iou_thr due to the 4th step (assign max IoU sample to each gt). min_pos_iou is set to avoid assigning bboxes that have extremely small iou with GT as positive samples. It brings about 0.3 mAP improvements in 1x schedule but does not affect the performance of 3x schedule. More comparisons can be found in PR #7464.
gt_max_assign_all (bool) – Whether to assign all bboxes with the same highest overlap with some gt to that gt.
ignore_iof_thr (float) – IoF threshold for ignoring bboxes (if gt_bboxes_ignore is specified). Negative values mean not ignoring any bboxes.
ignore_wrt_candidates (bool) – Whether to compute the iof between bboxes and gt_bboxes_ignore, or the contrary.
match_low_quality (bool) – Whether to allow low quality matches. This is usually allowed for RPN and single stage detectors, but not allowed in the second stage. Details are demonstrated in Step 4.
gpu_assign_thr (int) – The upper bound of the number of GT for GPU assign. When the number of gt is above this threshold, the assignment is performed on the CPU. Negative values mean the assignment is never moved to the CPU.
- assign(bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None)[source]¶
Assign gt to bboxes.
This method assigns a gt bbox to every bbox (proposal/anchor). Each bbox will be assigned -1 or a non-negative number: -1 means negative sample, and a non-negative number is the index (0-based) of the assigned gt. The assignment is done in the following steps, and the order matters.
1. assign every bbox to the background
2. assign proposals whose iou with all gts < neg_iou_thr to 0
3. for each bbox, if the iou with its nearest gt >= pos_iou_thr, assign it to that gt
4. for each gt bbox, assign its nearest proposals (may be more than one) to itself
- Parameters
bboxes (Tensor) – Bounding boxes to be assigned, shape (n, 4).
gt_bboxes (Tensor) – Groundtruth boxes, shape (k, 4).
gt_bboxes_ignore (Tensor, optional) – Ground truth bboxes that are labelled as ignored, e.g., crowd boxes in COCO.
gt_labels (Tensor, optional) – Label of gt_bboxes, shape (k, ).
- Returns
The assign result.
- Return type
Example
>>> self = MaxIoUAssigner(0.5, 0.5)
>>> bboxes = torch.Tensor([[0, 0, 10, 10], [10, 10, 20, 20]])
>>> gt_bboxes = torch.Tensor([[0, 0, 10, 9]])
>>> assign_result = self.assign(bboxes, gt_bboxes)
>>> expected_gt_inds = torch.LongTensor([1, 0])
>>> assert torch.all(assign_result.gt_inds == expected_gt_inds)
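The assignment steps can be sketched in plain Python on 2D boxes. The sketch follows the label convention of the doctest for assign (0 marks a negative, positives are 1-based gt indices). The function names, the use of -1 for boxes that fall between the two thresholds, and the guard that a gt with no overlap claims nothing are assumptions of this illustration, not the assigner's actual code.

```python
def iou_2d(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def max_iou_assign(bboxes, gt_bboxes, pos_iou_thr, neg_iou_thr,
                   min_pos_iou=0.0):
    """One label per bbox: -1 unassigned, 0 negative, k = gt k (1-based)."""
    if not gt_bboxes:
        return [0] * len(bboxes)
    # Step 1: start with every bbox unassigned.
    assigned = [-1] * len(bboxes)
    ious = [[iou_2d(g, b) for b in bboxes] for g in gt_bboxes]
    for j in range(len(bboxes)):
        col = [row[j] for row in ious]
        best = max(col)
        if best < neg_iou_thr:
            # Step 2: low overlap with every gt, so mark as negative.
            assigned[j] = 0
        if best >= pos_iou_thr:
            # Step 3: high overlap with the nearest gt, so assign that gt.
            assigned[j] = col.index(best) + 1
    # Step 4: each gt claims the bbox(es) it overlaps most.
    for i, row in enumerate(ious):
        gt_best = max(row)
        # The "> 0" guard is a choice of this sketch.
        if gt_best > 0 and gt_best >= min_pos_iou:
            for j, iou in enumerate(row):
                if iou == gt_best:
                    assigned[j] = i + 1
    return assigned


labels = max_iou_assign([[0, 0, 10, 10], [10, 10, 20, 20]],
                        [[0, 0, 10, 9]], 0.5, 0.5)
```

Step 4 is what lets a gt keep at least one match even when no bbox reaches pos_iou_thr, which is why min_pos_iou exists as a lower bound.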
- class mmdet3d.core.bbox.PseudoSampler(**kwargs)[source]¶
A pseudo sampler that does not do sampling actually.
- sample(assign_result, bboxes, gt_bboxes, *args, **kwargs)[source]¶
Directly returns the positive and negative indices of samples.
- Parameters
assign_result (AssignResult) – Assigned results
bboxes (torch.Tensor) – Bounding boxes
gt_bboxes (torch.Tensor) – Ground truth boxes
- Returns
sampler results
- Return type
- class mmdet3d.core.bbox.RandomSampler(num, pos_fraction, neg_pos_ub=-1, add_gt_as_proposals=True, **kwargs)[source]¶
Random sampler.
- Parameters
num (int) – Number of samples
pos_fraction (float) – Fraction of positive samples
neg_pos_ub (int, optional) – Upper bound number of negative and positive samples. Defaults to -1.
add_gt_as_proposals (bool, optional) – Whether to add ground truth boxes as proposals. Defaults to True.
- random_choice(gallery, num)[source]¶
Randomly select some elements from the gallery.
If gallery is a Tensor, the returned indices will be a Tensor; if gallery is an ndarray or list, the returned indices will be an ndarray.
- Parameters
gallery (Tensor | ndarray | list) – indices pool.
num (int) – expected sample num.
- Returns
sampled indices.
- Return type
Tensor or ndarray
- class mmdet3d.core.bbox.SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes, assign_result, gt_flags)[source]¶
Bbox sampling result.
Example
>>> # xdoctest: +IGNORE_WANT
>>> from mmdet.core.bbox.samplers.sampling_result import *  # NOQA
>>> self = SamplingResult.random(rng=10)
>>> print(f'self = {self}')
self = <SamplingResult({
    'neg_bboxes': torch.Size([12, 4]),
    'neg_inds': tensor([ 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
    'num_gts': 4,
    'pos_assigned_gt_inds': tensor([], dtype=torch.int64),
    'pos_bboxes': torch.Size([0, 4]),
    'pos_inds': tensor([], dtype=torch.int64),
    'pos_is_gt': tensor([], dtype=torch.uint8)
})>
- property bboxes¶
concatenated positive and negative boxes
- Type
torch.Tensor
- property info¶
Returns a dictionary of info about the object.
- classmethod random(rng=None, **kwargs)[source]¶
- Parameters
rng (None | int | numpy.random.RandomState) – seed or state.
kwargs (keyword arguments) –
num_preds: number of predicted boxes
num_gts: number of true boxes
p_ignore (float): probability of a predicted box assigned to an ignored truth.
p_assigned (float): probability of a predicted box not being assigned.
p_use_label (float | bool): with labels or not.
- Returns
Randomly generated sampling result.
- Return type
Example
>>> from mmdet.core.bbox.samplers.sampling_result import *  # NOQA
>>> self = SamplingResult.random()
>>> print(self.__dict__)
- mmdet3d.core.bbox.axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', is_aligned=False, eps=1e-06)[source]¶
Calculate overlap between two sets of axis-aligned 3D bboxes. If is_aligned is False, then calculate the overlaps between each bbox of bboxes1 and bboxes2, otherwise the overlaps between each aligned pair of bboxes1 and bboxes2.
- Parameters
bboxes1 (Tensor) – shape (B, m, 6) in <x1, y1, z1, x2, y2, z2> format or empty.
bboxes2 (Tensor) – shape (B, n, 6) in <x1, y1, z1, x2, y2, z2> format or empty. B indicates the batch dim, in shape (B1, B2, …, Bn). If is_aligned is True, then m and n must be equal.
mode (str) – “iou” (intersection over union) or “giou” (generalized intersection over union).
is_aligned (bool, optional) – If True, then m and n must be equal. Defaults to False.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
- Returns
shape (m, n) if is_aligned is False else shape (m,)
- Return type
Tensor
Example
>>> bboxes1 = torch.FloatTensor([
>>>     [0, 0, 0, 10, 10, 10],
>>>     [10, 10, 10, 20, 20, 20],
>>>     [32, 32, 32, 38, 40, 42],
>>> ])
>>> bboxes2 = torch.FloatTensor([
>>>     [0, 0, 0, 10, 20, 20],
>>>     [0, 10, 10, 10, 19, 20],
>>>     [10, 10, 10, 20, 20, 20],
>>> ])
>>> overlaps = axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2)
>>> assert overlaps.shape == (3, 3)
>>> overlaps = axis_aligned_bbox_overlaps_3d(bboxes1, bboxes2, is_aligned=True)
>>> assert overlaps.shape == (3, )
Example
>>> empty = torch.empty(0, 6)
>>> nonempty = torch.FloatTensor([[0, 0, 0, 10, 9, 10]])
>>> assert tuple(axis_aligned_bbox_overlaps_3d(empty, nonempty).shape) == (0, 1)
>>> assert tuple(axis_aligned_bbox_overlaps_3d(nonempty, empty).shape) == (1, 0)
>>> assert tuple(axis_aligned_bbox_overlaps_3d(empty, empty).shape) == (0, 0)
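For a single pair of boxes, the "iou" mode reduces to the arithmetic below. This is a plain-Python sketch with a hypothetical helper name; clamping the denominator with eps is an assumed way of handling the stability term, and the batched tensor version may treat it differently.

```python
def aa_iou_3d(box1, box2, eps=1e-6):
    """IoU of two axis-aligned boxes in (x1, y1, z1, x2, y2, z2) format."""
    inter = 1.0
    for axis in range(3):
        lo = max(box1[axis], box2[axis])
        hi = min(box1[axis + 3], box2[axis + 3])
        # A negative extent on any axis means the boxes do not overlap.
        inter *= max(0.0, hi - lo)
    vol1 = (box1[3] - box1[0]) * (box1[4] - box1[1]) * (box1[5] - box1[2])
    vol2 = (box2[3] - box2[0]) * (box2[4] - box2[1]) * (box2[5] - box2[2])
    union = vol1 + vol2 - inter
    # Assumed handling of eps: keep the denominator away from zero.
    return inter / max(union, eps)


iou = aa_iou_3d([0, 0, 0, 10, 10, 10], [0, 0, 0, 10, 20, 20])
```

Here the first box (volume 1000) lies entirely inside the second (volume 4000), so the IoU is 1000 / 4000 = 0.25.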
- mmdet3d.core.bbox.bbox3d2result(bboxes, scores, labels, attrs=None)[source]¶
Convert detection results to a list of numpy arrays.
- Parameters
bboxes (torch.Tensor) – Bounding boxes with shape (N, 5).
labels (torch.Tensor) – Labels with shape (N, ).
scores (torch.Tensor) – Scores with shape (N, ).
attrs (torch.Tensor, optional) – Attributes with shape (N, ). Defaults to None.
- Returns
Bounding box results in cpu mode.
boxes_3d (torch.Tensor): 3D boxes.
scores (torch.Tensor): Prediction scores.
labels_3d (torch.Tensor): Box labels.
attrs_3d (torch.Tensor, optional): Box attributes.
- Return type
dict[str, torch.Tensor]
- mmdet3d.core.bbox.bbox3d2roi(bbox_list)[source]¶
Convert a list of bounding boxes to roi format.
- Parameters
bbox_list (list[torch.Tensor]) – A list of bounding boxes corresponding to a batch of images.
- Returns
Regions of interest in shape (n, c), where the channels are in order of [batch_ind, x, y, …].
- Return type
torch.Tensor
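The roi layout can be illustrated with nested lists instead of tensors: each box gets the index of its sample prepended, and all samples are concatenated. The helper name boxes_to_rois is hypothetical and the sketch ignores tensor details such as device and dtype.

```python
def boxes_to_rois(bbox_list):
    """Prepend the batch index to every box and concatenate all samples."""
    rois = []
    for batch_ind, boxes in enumerate(bbox_list):
        for box in boxes:
            rois.append([float(batch_ind)] + list(box))
    return rois


rois = boxes_to_rois([
    [[1.0, 2.0, 3.0]],                    # sample 0: one box
    [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],   # sample 1: two boxes
])
```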
- mmdet3d.core.bbox.bbox3d_mapping_back(bboxes, scale_factor, flip_horizontal, flip_vertical)[source]¶
Map bboxes from testing scale to original image scale.
- Parameters
bboxes (BaseInstance3DBoxes) – Boxes to be mapped back.
scale_factor (float) – Scale factor.
flip_horizontal (bool) – Whether to flip horizontally.
flip_vertical (bool) – Whether to flip vertically.
- Returns
Boxes mapped back.
- Return type
- mmdet3d.core.bbox.bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', coordinate='camera')[source]¶
Calculate 3D IoU using cuda implementation.
Note
This function calculates the IoU of 3D boxes based on their volumes. IoU calculator BboxOverlaps3D uses this function to calculate the actual IoUs of boxes.
- Parameters
bboxes1 (torch.Tensor) – with shape (N, 7+C), (x, y, z, x_size, y_size, z_size, ry, v*).
bboxes2 (torch.Tensor) – with shape (M, 7+C), (x, y, z, x_size, y_size, z_size, ry, v*).
mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).
coordinate (str) – ‘camera’ or ‘lidar’ coordinate system.
- Returns
Bbox overlaps results of bboxes1 and bboxes2 with shape (M, N) (aligned mode is not supported currently).
- Return type
torch.Tensor
- mmdet3d.core.bbox.bbox_overlaps_nearest_3d(bboxes1, bboxes2, mode='iou', is_aligned=False, coordinate='lidar')[source]¶
Calculate nearest 3D IoU.
Note
This function first finds the nearest 2D boxes in bird eye view (BEV), and then calculates the 2D IoU using bbox_overlaps(). The IoU calculator BboxOverlapsNearest3D uses this function to calculate IoUs of boxes.
If is_aligned is False, then it calculates the IoUs between each bbox of bboxes1 and bboxes2, otherwise the IoUs between each aligned pair of bboxes1 and bboxes2.
- Parameters
bboxes1 (torch.Tensor) – with shape (N, 7+C), (x, y, z, x_size, y_size, z_size, ry, v*).
bboxes2 (torch.Tensor) – with shape (M, 7+C), (x, y, z, x_size, y_size, z_size, ry, v*).
mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).
is_aligned (bool) – Whether the calculation is aligned
- Returns
If is_aligned is False, return IoUs between each bbox of bboxes1 and bboxes2 with shape (M, N). If is_aligned is True, return IoUs of the aligned pairs with shape (M,).
- Return type
torch.Tensor
- mmdet3d.core.bbox.get_box_type(box_type)[source]¶
Get the type and mode of box structure.
- Parameters
box_type (str) – The type of box structure. Valid values are “LiDAR”, “Camera”, or “Depth”.
- Raises
ValueError – A ValueError is raised when box_type does not belong to the three valid types.
- Returns
Box type and box mode.
- Return type
tuple
- mmdet3d.core.bbox.limit_period(val, offset=0.5, period=3.141592653589793)[source]¶
Limit the value into a period for periodic function.
- Parameters
val (torch.Tensor | np.ndarray) – The value to be converted.
offset (float, optional) – Offset to set the value range. Defaults to 0.5.
period (float, optional) – Period of the value. Defaults to np.pi.
- Returns
Value in the range of [-offset * period, (1-offset) * period].
- Return type
(torch.Tensor | np.ndarray)
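For a single scalar, the wrapping can be written with a floor operation. This is a minimal sketch assuming that formulation; the name limit_period_scalar is hypothetical and the real function works element-wise on tensors or arrays.

```python
import math


def limit_period_scalar(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    # Subtract the whole number of periods that val has moved past the
    # lower bound of the target range.
    return val - math.floor(val / period + offset) * period


wrapped = limit_period_scalar(3 * math.pi / 4)
```

With the defaults, an angle of 3π/4 is wrapped to -π/4. Because of the floor, the upper end of the range is excluded in this sketch. Passing offset=0 and period=2π instead maps angles into [0, 2π).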
- mmdet3d.core.bbox.mono_cam_box2vis(cam_box)[source]¶
This is a post-processing function on the bboxes from Mono-3D task. If we want to perform projection visualization, we need to:
1. rotate the box along x-axis for np.pi / 2 (roll)
2. change orientation from local yaw to global yaw
3. convert yaw by (np.pi / 2 - yaw)
After applying this function, we can project and draw it on 2D images.
- Parameters
cam_box (CameraInstance3DBoxes) – 3D bbox in camera coordinate system before conversion. Could be gt bbox loaded from dataset or network prediction output.
- Returns
Box after conversion.
- Return type
- mmdet3d.core.bbox.points_cam2img(points_3d, proj_mat, with_depth=False)[source]¶
Project points in camera coordinates to image coordinates.
- Parameters
points_3d (torch.Tensor | np.ndarray) – Points in shape (N, 3)
proj_mat (torch.Tensor | np.ndarray) – Transformation matrix between coordinates.
with_depth (bool, optional) – Whether to keep depth in the output. Defaults to False.
- Returns
Points in image coordinates, with shape [N, 2] if with_depth=False, else [N, 3].
- Return type
(torch.Tensor | np.ndarray)
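For one point, the projection amounts to a homogeneous matrix product followed by a division by depth. The sketch below assumes a 3x4 or 4x4 projection matrix given as nested lists; the helper name project_point is hypothetical.

```python
def project_point(point_3d, proj_mat, with_depth=False):
    """Project one camera-frame point (x, y, z) to pixel coordinates."""
    x, y, z = point_3d
    homo = (x, y, z, 1.0)
    # Only the first three rows matter; a 4x4 matrix's last row is ignored.
    u, v, w = (sum(row[k] * homo[k] for k in range(4)) for row in proj_mat[:3])
    # Perspective division by the depth term.
    if with_depth:
        return (u / w, v / w, w)
    return (u / w, v / w)


# Example intrinsics: fx = fy = 100, principal point (50, 40), no translation.
P = [[100.0, 0.0, 50.0, 0.0],
     [0.0, 100.0, 40.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
pixel = project_point((1.0, 2.0, 4.0), P)
```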
- mmdet3d.core.bbox.points_img2cam(points, cam2img)[source]¶
Project points in image coordinates to camera coordinates.
- Parameters
points (torch.Tensor) – 2.5D points in 2D images, [N, 3], 3 corresponds with x, y in the image and depth.
cam2img (torch.Tensor) – Camera intrinsic matrix. The shape can be [3, 3], [3, 4] or [4, 4].
- Returns
Points in 3D space with shape [N, 3], where 3 corresponds to x, y, z in 3D space.
- Return type
torch.Tensor
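Back-projection inverts the pinhole model for one pixel with known depth. The sketch assumes an intrinsic matrix without skew or translation terms, given as nested lists; the helper name unproject_point is hypothetical and the real function handles the general [3, 3], [3, 4] and [4, 4] cases on tensors.

```python
def unproject_point(point_2d_depth, cam2img):
    """Lift one (u, v, depth) pixel to (x, y, z) in camera coordinates."""
    u, v, depth = point_2d_depth
    fx, fy = cam2img[0][0], cam2img[1][1]
    cx, cy = cam2img[0][2], cam2img[1][2]
    # Undo the principal-point shift, then scale by depth / focal length.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


K = [[100.0, 0.0, 50.0],
     [0.0, 100.0, 40.0],
     [0.0, 0.0, 1.0]]
point = unproject_point((75.0, 90.0, 4.0), K)
```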
evaluation¶
- mmdet3d.core.evaluation.indoor_eval(gt_annos, dt_annos, metric, label2cat, logger=None, box_type_3d=None, box_mode_3d=None)[source]¶
Indoor Evaluation.
Evaluate the result of the detection.
- Parameters
gt_annos (list[dict]) – Ground truth annotations.
dt_annos (list[dict]) –
Detection annotations. The dict includes the following keys:
labels_3d (torch.Tensor): Labels of boxes.
boxes_3d (BaseInstance3DBoxes): 3D bounding boxes in Depth coordinate.
scores_3d (torch.Tensor): Scores of boxes.
metric (list[float]) – IoU thresholds for computing average precisions.
label2cat (dict) – Map from label to category.
logger (logging.Logger | str, optional) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.
- Returns
Dict of results.
- Return type
dict[str, float]
- mmdet3d.core.evaluation.instance_seg_eval(gt_semantic_masks, gt_instance_masks, pred_instance_masks, pred_instance_labels, pred_instance_scores, valid_class_ids, class_labels, options=None, logger=None)[source]¶
Instance Segmentation Evaluation.
Evaluate the result of the instance segmentation.
- Parameters
gt_semantic_masks (list[torch.Tensor]) – Ground truth semantic masks.
gt_instance_masks (list[torch.Tensor]) – Ground truth instance masks.
pred_instance_masks (list[torch.Tensor]) – Predicted instance masks.
pred_instance_labels (list[torch.Tensor]) – Predicted instance labels.
pred_instance_scores (list[torch.Tensor]) – Predicted instance scores.
valid_class_ids (tuple[int]) – Ids of valid categories.
class_labels (tuple[str]) – Names of valid categories.
options (dict, optional) – Additional options. Keys may contain: overlaps, min_region_sizes, distance_threshes, distance_confs. Default: None.
logger (logging.Logger | str, optional) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.
- Returns
Dict of results.
- Return type
dict[str, float]
- mmdet3d.core.evaluation.kitti_eval(gt_annos, dt_annos, current_classes, eval_types=['bbox', 'bev', '3d'])[source]¶
KITTI evaluation.
- Parameters
gt_annos (list[dict]) – Contain gt information of each sample.
dt_annos (list[dict]) – Contain detected information of each sample.
current_classes (list[str]) – Classes to evaluate.
eval_types (list[str], optional) – Types to eval. Defaults to [‘bbox’, ‘bev’, ‘3d’].
- Returns
String and dict of evaluation results.
- Return type
tuple
- mmdet3d.core.evaluation.kitti_eval_coco_style(gt_annos, dt_annos, current_classes)[source]¶
COCO-style evaluation of KITTI.
- Parameters
gt_annos (list[dict]) – Contain gt information of each sample.
dt_annos (list[dict]) – Contain detected information of each sample.
current_classes (list[str]) – Classes to evaluate.
- Returns
Evaluation results.
- Return type
str
- mmdet3d.core.evaluation.lyft_eval(lyft, data_root, res_path, eval_set, output_dir, logger=None)[source]¶
Evaluation API for Lyft dataset.
- Parameters
lyft (LyftDataset) – Lyft class in the sdk.
data_root (str) – Root of data for reading splits.
res_path (str) – Path of result json file recording detections.
eval_set (str) – Name of the split for evaluation.
output_dir (str) – Output directory for output json files.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
- Returns
The evaluation results.
- Return type
dict[str, float]
- mmdet3d.core.evaluation.seg_eval(gt_labels, seg_preds, label2cat, ignore_index, logger=None)[source]¶
Semantic Segmentation Evaluation.
Evaluate the result of the Semantic Segmentation.
- Parameters
gt_labels (list[torch.Tensor]) – Ground truth labels.
seg_preds (list[torch.Tensor]) – Predictions.
label2cat (dict) – Map from label to category name.
ignore_index (int) – Index that will be ignored in evaluation.
logger (logging.Logger | str, optional) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.
- Returns
Dict of results.
- Return type
dict[str, float]
visualizer¶
- mmdet3d.core.visualizer.show_multi_modality_result(img, gt_bboxes, pred_bboxes, proj_mat, out_dir, filename, box_mode='lidar', img_metas=None, show=False, gt_bbox_color=(61, 102, 255), pred_bbox_color=(241, 101, 72))[source]¶
Convert multi-modality detection results into 2D results.
Project the predicted 3D bbox to 2D image plane and visualize them.
- Parameters
img (np.ndarray) – The numpy array of image in cv2 fashion.
gt_bboxes (BaseInstance3DBoxes) – Ground truth boxes.
pred_bboxes (BaseInstance3DBoxes) – Predicted boxes.
proj_mat (numpy.array, shape=[4, 4]) – The projection matrix according to the camera intrinsic parameters.
out_dir (str) – Path of output directory.
filename (str) – Filename of the current frame.
box_mode (str, optional) – Coordinate system the boxes are in. Should be one of ‘depth’, ‘lidar’ and ‘camera’. Defaults to ‘lidar’.
img_metas (dict, optional) – Used in projecting depth bbox. Defaults to None.
show (bool, optional) – Visualize the results online. Defaults to False.
gt_bbox_color (str or tuple(int), optional) – Color of bbox lines. The tuple of color should be in BGR order. Default: (61, 102, 255).
pred_bbox_color (str or tuple(int), optional) – Color of bbox lines. The tuple of color should be in BGR order. Default: (241, 101, 72).
- mmdet3d.core.visualizer.show_result(points, gt_bboxes, pred_bboxes, out_dir, filename, show=False, snapshot=False, pred_labels=None)[source]¶
Convert results into format that is directly readable for meshlab.
- Parameters
points (np.ndarray) – Points.
gt_bboxes (np.ndarray) – Ground truth boxes.
pred_bboxes (np.ndarray) – Predicted boxes.
out_dir (str) – Path of output directory
filename (str) – Filename of the current frame.
show (bool, optional) – Visualize the results online. Defaults to False.
snapshot (bool, optional) – Whether to save the online results. Defaults to False.
pred_labels (np.ndarray, optional) – Predicted labels of boxes. Defaults to None.
- mmdet3d.core.visualizer.show_seg_result(points, gt_seg, pred_seg, out_dir, filename, palette, ignore_index=None, show=False, snapshot=False)[source]¶
Convert results into format that is directly readable for meshlab.
- Parameters
points (np.ndarray) – Points.
gt_seg (np.ndarray) – Ground truth segmentation mask.
pred_seg (np.ndarray) – Predicted segmentation mask.
out_dir (str) – Path of output directory
filename (str) – Filename of the current frame.
palette (np.ndarray) – Mapping between class labels and colors.
ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. Defaults to None.
show (bool, optional) – Visualize the results online. Defaults to False.
snapshot (bool, optional) – Whether to save the online results. Defaults to False.
voxel¶
- class mmdet3d.core.voxel.VoxelGenerator(voxel_size, point_cloud_range, max_num_points, max_voxels=20000)[source]¶
Voxel generator in numpy implementation.
- Parameters
voxel_size (list[float]) – Size of a single voxel
point_cloud_range (list[float]) – Range of points
max_num_points (int) – Maximum number of points in a single voxel
max_voxels (int, optional) – Maximum number of voxels. Defaults to 20000.
- property grid_size¶
The size of grids.
- Type
np.ndarray
- property max_num_points_per_voxel¶
Maximum number of points per voxel.
- Type
int
- property point_cloud_range¶
Range of point cloud.
- Type
list[float]
- property voxel_size¶
Size of a single voxel.
- Type
list[float]
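The grid_size property follows from the other two: the extent of the point cloud range along each axis divided by the voxel size. A minimal sketch, assuming the range is ordered (x_min, y_min, z_min, x_max, y_max, z_max) and that the quotient is rounded to the nearest integer; the helper name is hypothetical.

```python
def voxel_grid_size(voxel_size, point_cloud_range):
    """Number of voxels along x, y and z."""
    return [
        round((point_cloud_range[axis + 3] - point_cloud_range[axis])
              / voxel_size[axis])
        for axis in range(3)
    ]


grid = voxel_grid_size([0.05, 0.05, 0.1], [0, -40, -3, 70.4, 40, 1])
```

Rounding rather than truncating avoids losing a voxel to floating-point error when the range is an exact multiple of the voxel size.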
post_processing¶
- mmdet3d.core.post_processing.aligned_3d_nms(boxes, scores, classes, thresh)[source]¶
3D NMS for aligned boxes.
- Parameters
boxes (torch.Tensor) – Aligned box with shape [n, 6].
scores (torch.Tensor) – Scores of each box.
classes (torch.Tensor) – Class of each box.
thresh (float) – IoU threshold for nms.
- Returns
Indices of selected boxes.
- Return type
torch.Tensor
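A greedy version of this NMS can be sketched in plain Python. It assumes boxes are (x1, y1, z1, x2, y2, z2), that a box is suppressed only by a higher-scoring box of the same class, and that suppression happens when the IoU exceeds thresh; these points and the function names are assumptions of the illustration rather than a description of the library code.

```python
def aligned_nms_3d(boxes, scores, classes, thresh):
    """Return indices of kept boxes, highest score first."""

    def iou(a, b):
        inter = 1.0
        for axis in range(3):
            inter *= max(0.0, min(a[axis + 3], b[axis + 3])
                         - max(a[axis], b[axis]))
        vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
        vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
        union = vol_a + vol_b - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep the box unless a kept box of the same class overlaps too much.
        if all(classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= thresh
               for k in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 0, 2, 2, 2], [0, 0, 0, 2, 2, 1.9],
         [5, 5, 5, 6, 6, 6], [0, 0, 0, 2, 2, 2]]
kept = aligned_nms_3d(boxes, [0.9, 0.8, 0.7, 0.6], [0, 0, 0, 1], 0.5)
```

Box 1 is dropped because it nearly coincides with the higher-scoring box 0 of the same class, while box 3 survives despite full overlap because its class differs.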
- mmdet3d.core.post_processing.box3d_multiclass_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_scores, score_thr, max_num, cfg, mlvl_dir_scores=None, mlvl_attr_scores=None, mlvl_bboxes2d=None)[source]¶
Multi-class NMS for 3D boxes. The IoU used for NMS is defined as the 2D IoU between BEV boxes.
- Parameters
mlvl_bboxes (torch.Tensor) – Multi-level boxes with shape (N, M). M is the dimensions of boxes.
mlvl_bboxes_for_nms (torch.Tensor) – Multi-level boxes with shape (N, 5) ([x1, y1, x2, y2, ry]). N is the number of boxes. The coordinate system of the BEV boxes is counterclockwise.
mlvl_scores (torch.Tensor) – Multi-level scores with shape (N, C + 1). N is the number of boxes. C is the number of classes.
score_thr (float) – Score threshold to filter boxes with low confidence.
max_num (int) – Maximum number of boxes to keep.
cfg (dict) – Configuration dict of NMS.
mlvl_dir_scores (torch.Tensor, optional) – Multi-level scores of direction classifier. Defaults to None.
mlvl_attr_scores (torch.Tensor, optional) – Multi-level scores of attribute classifier. Defaults to None.
mlvl_bboxes2d (torch.Tensor, optional) – Multi-level 2D bounding boxes. Defaults to None.
- Returns
Return results after nms, including 3D bounding boxes, scores, labels, direction scores, attribute scores (optional) and 2D bounding boxes (optional).
- Return type
tuple[torch.Tensor]
- mmdet3d.core.post_processing.circle_nms(dets, thresh, post_max_size=83)[source]¶
Circular NMS.
An object is only counted as positive if no other center with a higher confidence exists within a radius r using a bird-eye view distance metric.
- Parameters
dets (torch.Tensor) – Detection results with the shape of [N, 3].
thresh (float) – Value of threshold.
post_max_size (int, optional) – Max number of prediction to be kept. Defaults to 83.
- Returns
Indexes of the detections to be kept.
- Return type
torch.Tensor
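The rule described above can be sketched as a greedy loop over detections sorted by confidence. The sketch assumes each row of dets is (x, y, score) and that thresh is compared against the squared BEV distance between centers; both are assumptions of this illustration, as is the function name.

```python
def circle_nms_sketch(dets, thresh, post_max_size=83):
    """Return indices of kept detections, highest score first."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][2], reverse=True)
    keep = []
    for i in order:
        xi, yi = dets[i][0], dets[i][1]
        # Keep the center only if every kept center is farther than thresh
        # (assumed here to be a squared-distance threshold).
        if all((xi - dets[k][0]) ** 2 + (yi - dets[k][1]) ** 2 > thresh
               for k in keep):
            keep.append(i)
    return keep[:post_max_size]


dets = [(0.0, 0.0, 0.9), (0.5, 0.0, 0.8), (3.0, 0.0, 0.7)]
kept = circle_nms_sketch(dets, thresh=1.0)
```

Unlike IoU-based NMS, this needs only center positions, so it is cheap and independent of box size and orientation.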
- mmdet3d.core.post_processing.merge_aug_bboxes(aug_bboxes, aug_scores, img_metas, rcnn_test_cfg)[source]¶
Merge augmented detection bboxes and scores.
- Parameters
aug_bboxes (list[Tensor]) – shape (n, 4*#class)
aug_scores (list[Tensor] or None) – shape (n, #class)
img_shapes (list[Tensor]) – shape (3, ).
rcnn_test_cfg (dict) – rcnn test config.
- Returns
(bboxes, scores)
- Return type
tuple
- mmdet3d.core.post_processing.merge_aug_bboxes_3d(aug_results, img_metas, test_cfg)[source]¶
Merge augmented detection 3D bboxes and scores.
- Parameters
aug_results (list[dict]) –
The dict of detection results. The dict contains the following keys
boxes_3d (BaseInstance3DBoxes): Detection bbox.
scores_3d (torch.Tensor): Detection scores.
labels_3d (torch.Tensor): Predicted box labels.
img_metas (list[dict]) – Meta information of each sample.
test_cfg (dict) – Test config.
- Returns
Bounding boxes results in cpu mode, containing merged results.
boxes_3d (BaseInstance3DBoxes): Merged detection bbox.
scores_3d (torch.Tensor): Merged detection scores.
labels_3d (torch.Tensor): Merged predicted box labels.
- Return type
dict
- mmdet3d.core.post_processing.merge_aug_masks(aug_masks, img_metas, rcnn_test_cfg, weights=None)[source]¶
Merge augmented mask prediction.
- Parameters
aug_masks (list[ndarray]) – shape (n, #class, h, w)
img_shapes (list[ndarray]) – shape (3, ).
rcnn_test_cfg (dict) – rcnn test config.
- Returns
(bboxes, scores)
- Return type
tuple
- mmdet3d.core.post_processing.merge_aug_proposals(aug_proposals, img_metas, cfg)[source]¶
Merge augmented proposals (multiscale, flip, etc.)
- Parameters
aug_proposals (list[Tensor]) – proposals from different testing schemes, shape (n, 5). Note that they are not rescaled to the original image size.
img_metas (list[dict]) – list of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmdet/datasets/pipelines/formatting.py:Collect.
cfg (dict) – rpn test config.
- Returns
shape (n, 4), proposals corresponding to original image scale.
- Return type
Tensor
- mmdet3d.core.post_processing.multiclass_nms(multi_bboxes, multi_scores, score_thr, nms_cfg, max_num=-1, score_factors=None, return_inds=False)[source]¶
NMS for multi-class bboxes.
- Parameters
multi_bboxes (Tensor) – shape (n, #class*4) or (n, 4)
multi_scores (Tensor) – shape (n, #class), where the last column contains scores of the background class, but this will be ignored.
score_thr (float) – bbox threshold, bboxes with scores lower than it will not be considered.
nms_cfg (dict) – a dict that contains the arguments of nms operations
max_num (int, optional) – if there are more than max_num bboxes after NMS, only top max_num will be kept. Default to -1.
score_factors (Tensor, optional) – The factors multiplied to scores before applying NMS. Default to None.
return_inds (bool, optional) – Whether return the indices of kept bboxes. Default to False.
- Returns
(dets, labels, indices (optional)), tensors of shape (k, 5), (k), and (k). Dets are boxes with scores. Labels are 0-based.
- Return type
tuple
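The selection logic described for multiclass_nms can be sketched in plain Python. This is an illustration only, not the library implementation: `iou_xyxy`, `greedy_nms`, and `multiclass_nms_sketch` are hypothetical helpers, boxes are shared across classes (the `(n, 4)` case), a plain IoU threshold stands in for `nms_cfg`, and `score_factors` and `return_inds` are omitted.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xyxy(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def multiclass_nms_sketch(bboxes, multi_scores, score_thr, iou_thr, max_num=-1):
    # The last score column is the background class and is ignored.
    num_classes = len(multi_scores[0]) - 1
    dets, labels = [], []
    for c in range(num_classes):
        # Keep only boxes whose score for class c exceeds the threshold.
        idx = [i for i in range(len(bboxes)) if multi_scores[i][c] > score_thr]
        cls_boxes = [bboxes[i] for i in idx]
        cls_scores = [multi_scores[i][c] for i in idx]
        for k in greedy_nms(cls_boxes, cls_scores, iou_thr):
            dets.append(list(cls_boxes[k]) + [cls_scores[k]])
            labels.append(c)  # 0-based label
    # Sort all survivors by score and keep at most max_num of them.
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    if max_num > 0:
        order = order[:max_num]
    return [dets[i] for i in order], [labels[i] for i in order]
```

Each returned detection has five values (four box coordinates plus the score), matching the documented `(k, 5)` shape.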
- mmdet3d.core.post_processing.nms_bev(boxes, scores, thresh, pre_max_size=None, post_max_size=None)[source]¶
NMS function GPU implementation (for BEV boxes). The overlap of two boxes for IoU calculation is defined as the exact overlapping area of the two boxes. In this function, one can also set pre_max_size and post_max_size.
- Parameters
boxes (torch.Tensor) – Input boxes with the shape of [N, 5] ([x1, y1, x2, y2, ry]).
scores (torch.Tensor) – Scores of boxes with the shape of [N].
thresh (float) – Overlap threshold of NMS.
pre_max_size (int, optional) – Max size of boxes before NMS. Default: None.
post_max_size (int, optional) – Max size of boxes after NMS. Default: None.
- Returns
Indexes after NMS.
- Return type
torch.Tensor
- mmdet3d.core.post_processing.nms_normal_bev(boxes, scores, thresh)[source]¶
Normal NMS function GPU implementation (for BEV boxes). The overlap of two boxes for IoU calculation is defined as the exact overlapping area of the two boxes WITH their yaw angle set to 0.
- Parameters
boxes (torch.Tensor) – Input boxes with shape (N, 5).
scores (torch.Tensor) – Scores of predicted boxes with shape (N).
thresh (float) – Overlap threshold of NMS.
- Returns
Remaining indices with scores in descending order.
- Return type
torch.Tensor
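A CPU-only sketch of the behaviour described for the two BEV NMS functions is shown here. It is an assumption-laden illustration, not the GPU kernel: `nms_axis_aligned_bev` is a hypothetical name, boxes are assumed to be `[x1, y1, x2, y2, ry]`, the yaw is ignored (as described for nms_normal_bev), and the optional `pre_max_size` / `post_max_size` truncation follows the parameter descriptions of nms_bev.

```python
def nms_axis_aligned_bev(boxes, scores, thresh, pre_max_size=None, post_max_size=None):
    """Greedy NMS on BEV boxes with yaw treated as 0.

    Returns the kept indices, ordered by descending score.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    if pre_max_size is not None:
        order = order[:pre_max_size]  # cap the candidates before NMS
    keep = []
    for i in order:
        x1, y1, x2, y2 = boxes[i][:4]  # boxes[i][4] (yaw) is ignored
        area_i = (x2 - x1) * (y2 - y1)
        suppressed = False
        for j in keep:
            u1, v1, u2, v2 = boxes[j][:4]
            iw = max(0.0, min(x2, u2) - max(x1, u1))
            ih = max(0.0, min(y2, v2) - max(y1, v1))
            inter = iw * ih
            union = area_i + (u2 - u1) * (v2 - v1) - inter
            if union > 0 and inter / union > thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    if post_max_size is not None:
        keep = keep[:post_max_size]  # cap the survivors after NMS
    return keep
```

A rotated-IoU version would replace only the overlap computation; the sorting and truncation steps stay the same.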
mmdet3d.datasets¶
- class mmdet3d.datasets.AffineResize(img_scale, down_ratio, bbox_clip_border=True)[source]¶
Get the affine transform matrices to the target size.
Different from RandomAffine in MMDetection, this class can calculate the affine transform matrices while resizing the input image to a fixed size. The affine transform matrices include: 1) matrix transforming original image to the network input image size. 2) matrix transforming original image to the network output feature map size.
- Parameters
img_scale (tuple) – Images scales for resizing.
down_ratio (int) – The down ratio of the feature map. The value should be >= 1.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
- class mmdet3d.datasets.BackgroundPointsFilter(bbox_enlarge_range)[source]¶
Filter background points near the bounding box.
- Parameters
bbox_enlarge_range (tuple[float] | float) – Bbox enlarge range.
- class mmdet3d.datasets.Custom3DDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, file_client_args={'backend': 'disk'})[source]¶
Customized 3D dataset.
This is the base dataset of the SUNRGB-D, ScanNet, nuScenes, and KITTI datasets.
An example of the annotation file structure is as follows:
[
    {'sample_idx':
     'lidar_points': {'lidar_path': velodyne_path,
                     },
     'annos': {'box_type_3d': (str) 'LiDAR/Camera/Depth'
               'gt_bboxes_3d': <np.ndarray> (n, 7)
               'gt_names': [list]
               ….
              }
     'calib': { …..}
     'images': { …..}
    }
]
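As a hedged illustration of the layout above, the snippet below builds one info dict and round-trips it through a pickle file. Everything concrete here is an assumption made for the example: the sample index, the paths, the box values, the class name, the file name `custom_infos.pkl`, and the use of pickle as the on-disk format. Nested lists stand in for the `np.ndarray` of shape (n, 7).

```python
import os
import pickle
import tempfile

# One info dict per sample; all values are made up for illustration.
infos = [
    {
        "sample_idx": 0,
        "lidar_points": {"lidar_path": "points/000000.bin"},
        "annos": {
            "box_type_3d": "LiDAR",
            # (n, 7) boxes; a nested list stands in for np.ndarray here.
            "gt_bboxes_3d": [[1.0, 2.0, 0.5, 3.9, 1.6, 1.56, 0.0]],
            "gt_names": ["Car"],
        },
    }
]

# Write the list to disk, then read it back as an annotation loader might.
ann_file = os.path.join(tempfile.mkdtemp(), "custom_infos.pkl")
with open(ann_file, "wb") as f:
    pickle.dump(infos, f)

with open(ann_file, "rb") as f:
    loaded = pickle.load(f)
```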
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’. Available options include:
’LiDAR’: Box in LiDAR coordinates.
’Depth’: Box in depth coordinates, usually for indoor dataset.
’Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
- evaluate(results, metric=None, iou_thr=(0.25, 0.5), logger=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluate.
Evaluation in indoor protocol.
- Parameters
results (list[dict]) – List of results.
metric (str | list[str], optional) – Metrics to be evaluated. Defaults to None.
iou_thr (list[float]) – AP IoU thresholds. Defaults to (0.25, 0.5).
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Evaluation results.
- Return type
dict
- format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]¶
Format the results to pkl file.
- Parameters
outputs (list[dict]) – Testing results of the dataset.
pklfile_prefix (str) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving pkl files when pklfile_prefix is not specified.
- Return type
tuple
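The prefix handling described above can be sketched as follows. This is an assumption-based illustration rather than the method itself: `format_results_sketch` is a hypothetical function, the file name `results` inside the temporary directory is made up, and `submission_prefix` is left out.

```python
import os.path as osp
import pickle
import tempfile


def format_results_sketch(outputs, pklfile_prefix=None):
    """Dump outputs to '<prefix>.pkl', using a temp dir if no prefix is given."""
    tmp_dir = None
    if pklfile_prefix is None:
        # No prefix given: write into a temporary directory instead.
        tmp_dir = tempfile.TemporaryDirectory()
        pklfile_prefix = osp.join(tmp_dir.name, "results")
    out_path = f"{pklfile_prefix}.pkl"
    with open(out_path, "wb") as f:
        pickle.dump(outputs, f)
    return outputs, tmp_dir


outputs, tmp_dir = format_results_sketch([{"scores_3d": [0.9]}])
saved_path = osp.join(tmp_dir.name, "results.pkl")
saved_exists = osp.exists(saved_path)
# The caller owns the temporary directory and should clean it up when done.
tmp_dir.cleanup()
```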
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
gt_names (list[str]): Class names of ground truths.
- Return type
dict
- classmethod get_classes(classes=None)[source]¶
Get class names of current dataset.
- Parameters
classes (Sequence[str] | str) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.
- Returns
A list of class names.
- Return type
list[str]
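The three cases described for the classes argument can be sketched as below. `resolve_classes` and `DEFAULT_CLASSES` are hypothetical names used only for this example; the placeholder defaults are not the class list of any particular dataset.

```python
DEFAULT_CLASSES = ("Car", "Pedestrian", "Cyclist")  # placeholder defaults


def resolve_classes(classes=None, default=DEFAULT_CLASSES):
    """Resolve class names from None, a file path, or a sequence."""
    if classes is None:
        # Fall back to the dataset's built-in class names.
        return list(default)
    if isinstance(classes, str):
        # Treat the string as a path to a text file, one class name per line.
        with open(classes, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    if isinstance(classes, (tuple, list)):
        # An explicit sequence overrides the defaults.
        return list(classes)
    raise ValueError(f"Unsupported type {type(classes)} of classes.")
```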
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
file_name (str): Filename of point clouds.
ann_info (dict): Annotation info.
- Return type
dict
- load_annotations(ann_file)[source]¶
Load annotations from ann_file.
- Parameters
ann_file (str) – Path of the annotation file.
- Returns
List of annotations.
- Return type
list[dict]
- pre_pipeline(results)[source]¶
Initialization before data preparation.
- Parameters
results (dict) –
Dict before data preprocessing.
img_fields (list): Image fields.
bbox3d_fields (list): 3D bounding boxes fields.
pts_mask_fields (list): Mask fields of points.
pts_seg_fields (list): Mask fields of point segments.
bbox_fields (list): Fields of bounding boxes.
mask_fields (list): Fields of masks.
seg_fields (list): Segment fields.
box_type_3d (str): 3D box type.
box_mode_3d (str): 3D box mode.
- class mmdet3d.datasets.Custom3DSegDataset(data_root, ann_file, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None, file_client_args={'backend': 'disk'})[source]¶
Customized 3D dataset for semantic segmentation task.
This is the base dataset of the ScanNet and S3DIS datasets.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES) to be consistent with PointSegClassMapping function in pipeline. Defaults to None.
scene_idxs (np.ndarray | str, optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.
- evaluate(results, metric=None, logger=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluate.
Evaluation in semantic segmentation protocol.
- Parameters
results (list[dict]) – List of results.
metric (str | list[str]) – Metrics to be evaluated.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.
show (bool, optional) – Whether to visualize. Defaults to False.
out_dir (str, optional) – Path to save the visualization results. Defaults to None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Evaluation results.
- Return type
dict
- format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]¶
Format the results to pkl file.
- Parameters
outputs (list[dict]) – Testing results of the dataset.
pklfile_prefix (str) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving pkl files when pklfile_prefix is not specified.
- Return type
tuple
- get_classes_and_palette(classes=None, palette=None)[source]¶
Get class names of current dataset.
This function is taken from MMSegmentation.
- Parameters
classes (Sequence[str] | str) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset. Defaults to None.
palette (Sequence[Sequence[int]] | np.ndarray) – The palette of segmentation map. If None is given, a random palette will be generated. Defaults to None.
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
file_name (str): Filename of point clouds.
ann_info (dict): Annotation info.
- Return type
dict
- get_scene_idxs(scene_idxs)[source]¶
Compute scene_idxs for data sampling.
We sample more times for scenes with more points.
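One way such indices could be precomputed is sketched below: each scene index is repeated roughly in proportion to its share of all points. This is a hypothetical scheme for illustration; `build_scene_idxs`, its `sample_rate` argument, and the rounding rule are assumptions, not the documented behaviour of this method.

```python
def build_scene_idxs(points_per_scene, num_points, sample_rate=1.0):
    """Repeat each scene index roughly in proportion to its point count."""
    total = sum(points_per_scene)
    # Total number of samples to draw across all scenes.
    num_iter = int(total * sample_rate / num_points)
    scene_idxs = []
    for idx, n in enumerate(points_per_scene):
        repeats = int(round(n / total * num_iter))
        scene_idxs.extend([idx] * repeats)
    return scene_idxs


# A scene with three times as many points is sampled three times as often.
example_idxs = build_scene_idxs([1000, 3000], num_points=500)
```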
- load_annotations(ann_file)[source]¶
Load annotations from ann_file.
- Parameters
ann_file (str) – Path of the annotation file.
- Returns
List of annotations.
- Return type
list[dict]
- pre_pipeline(results)[source]¶
Initialization before data preparation.
- Parameters
results (dict) –
Dict before data preprocessing.
img_fields (list): Image fields.
pts_mask_fields (list): Mask fields of points.
pts_seg_fields (list): Mask fields of point segments.
mask_fields (list): Fields of masks.
seg_fields (list): Segment fields.
- class mmdet3d.datasets.GlobalAlignment(rotation_axis)[source]¶
Apply global alignment to 3D scene points by rotation and translation.
- Parameters
rotation_axis (int) – Rotation axis for points and bboxes rotation.
Note
We do not record the applied rotation and translation as in GlobalRotScaleTrans, because we usually do not need to reverse the alignment step.
For example, the ScanNet 3D detection task uses aligned ground-truth bounding boxes for evaluation.
- class mmdet3d.datasets.GlobalRotScaleTrans(rot_range=[-0.78539816, 0.78539816], scale_ratio_range=[0.95, 1.05], translation_std=[0, 0, 0], shift_height=False)[source]¶
Apply global rotation, scaling and translation to a 3D scene.
- Parameters
rot_range (list[float], optional) – Range of rotation angle. Defaults to [-0.78539816, 0.78539816] (close to [-pi/4, pi/4]).
scale_ratio_range (list[float], optional) – Range of scale ratio. Defaults to [0.95, 1.05].
translation_std (list[float], optional) – The standard deviation of translation noise applied to a scene, which is sampled from a Gaussian distribution whose standard deviation is set by translation_std. Defaults to [0, 0, 0].
shift_height (bool, optional) – Whether to shift height (the fourth dimension of indoor points) when scaling. Defaults to False.
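The effect of these parameters on a point set can be sketched in plain Python. The sketch makes several assumptions that the text above does not state: the rotation is counter-clockwise about the z axis, the angle and scale are drawn uniformly from their ranges, and the three steps are applied in the order rotate, scale, translate. `sample_params` and `rot_scale_trans` are hypothetical helpers, and the matching transformation of bounding boxes is omitted.

```python
import math
import random


def sample_params(rot_range, scale_ratio_range, translation_std, rng):
    """Draw one rotation angle, one scale ratio, and one translation vector."""
    angle = rng.uniform(rot_range[0], rot_range[1])
    scale = rng.uniform(scale_ratio_range[0], scale_ratio_range[1])
    # Zero-mean Gaussian noise per axis; a std of 0 yields no translation.
    trans = [rng.gauss(0.0, std) for std in translation_std]
    return angle, scale, trans


def rot_scale_trans(points, angle, scale, trans):
    """Rotate (x, y, z) points about the z axis, then scale, then translate."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        xr = cos_a * x - sin_a * y
        yr = sin_a * x + cos_a * y
        out.append((xr * scale + trans[0], yr * scale + trans[1], z * scale + trans[2]))
    return out


rng = random.Random(0)
angle, scale, trans = sample_params([-0.78539816, 0.78539816], [0.95, 1.05], [0, 0, 0], rng)
augmented = rot_scale_trans([(1.0, 0.0, 2.0)], angle, scale, trans)
```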
- class mmdet3d.datasets.IndoorPatchPointSample(num_points, block_size=1.5, sample_rate=None, ignore_index=None, use_normalized_coord=False, num_try=10, enlarge_size=0.2, min_unique_num=None, eps=0.01)[source]¶
Indoor point sample within a patch. Modified from PointNet++.
Sampling data to a certain number for semantic segmentation.
- Parameters
num_points (int) – Number of points to be sampled.
block_size (float, optional) – Size of a block to sample points from. Defaults to 1.5.
sample_rate (float, optional) – Stride used in sliding patch generation. This parameter is unused in IndoorPatchPointSample and thus has been deprecated. We plan to remove it in the future. Defaults to None.
ignore_index (int, optional) – Label index that won’t be used for the segmentation task. This is set in PointSegClassMapping as neg_cls. If not None, will be used as a patch selection criterion. Defaults to None.
use_normalized_coord (bool, optional) – Whether to use normalized xyz as additional features. Defaults to False.
num_try (int, optional) – Number of times to try if the patch selected is invalid. Defaults to 10.
enlarge_size (float, optional) – Enlarge the sampled patch to [-block_size / 2 - enlarge_size, block_size / 2 + enlarge_size] as an augmentation. If None, set it as 0. Defaults to 0.2.
min_unique_num (int, optional) – Minimum number of unique points the sampled patch should contain. If None, use PointNet++’s method to judge uniqueness. Defaults to None.
eps (float, optional) – A value added to patch boundary to guarantee points coverage. Defaults to 1e-2.
Note
This transform should only be used in the training process of point cloud segmentation tasks. For the sliding patch generation and inference process in testing, please refer to the slide_inference function of EncoderDecoder3D class.
- class mmdet3d.datasets.IndoorPointSample(*args, **kwargs)[source]¶
Indoor point sample.
Sampling data to a certain number. NOTE: IndoorPointSample is deprecated in favor of PointSample.
- Parameters
num_points (int) – Number of points to be sampled.
- class mmdet3d.datasets.KittiDataset(data_root, ann_file, split, pts_prefix='velodyne', pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, pcd_limit_range=[0, -40, -3, 70.4, 40, 0.0], **kwargs)[source]¶
KITTI Dataset.
This class serves as the API for experiments on the KITTI Dataset.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
split (str) – Split of input data.
pts_prefix (str, optional) – Prefix of points files. Defaults to ‘velodyne’.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:
’LiDAR’: Box in LiDAR coordinates.
’Depth’: Box in depth coordinates, usually for indoor dataset.
’Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
pcd_limit_range (list, optional) – The range of point cloud used to filter invalid predicted boxes. Default: [0, -40, -3, 70.4, 40, 0.0].
- bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]¶
Convert 3D detection results to kitti format for evaluation and test submission.
- Parameters
net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.
class_names (list[str]) – A list of class names.
pklfile_prefix (str) – The prefix of pkl file.
submission_prefix (str) – The prefix of submission file.
- Returns
A list of dictionaries with the kitti format.
- Return type
list[dict]
- bbox2result_kitti2d(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]¶
Convert 2D detection results to kitti format for evaluation and test submission.
- Parameters
net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.
class_names (list[str]) – A list of class names.
pklfile_prefix (str) – The prefix of pkl file.
submission_prefix (str) – The prefix of submission file.
- Returns
A list of dictionaries with the kitti format.
- Return type
list[dict]
- convert_valid_bboxes(box_dict, info)[source]¶
Convert the predicted boxes into valid ones.
- Parameters
box_dict (dict) –
Box dictionaries to be converted.
boxes_3d (LiDARInstance3DBoxes): 3D bounding boxes.
scores_3d (torch.Tensor): Scores of boxes.
labels_3d (torch.Tensor): Class labels of boxes.
info (dict) – Data info.
- Returns
Valid predicted boxes.
bbox (np.ndarray): 2D bounding boxes.
box3d_camera (np.ndarray): 3D bounding boxes in camera coordinates.
box3d_lidar (np.ndarray): 3D bounding boxes in LiDAR coordinates.
scores (np.ndarray): Scores of boxes.
label_preds (np.ndarray): Class label predictions.
sample_idx (int): Sample index.
- Return type
dict
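One plausible part of this validity check is a range filter based on the pcd_limit_range parameter of the class. The sketch below assumes the check is applied to box centers with strict inequalities; `in_range_mask` is a hypothetical helper, and any check against the image bounds is omitted.

```python
def in_range_mask(centers, pcd_limit_range):
    """Return True for each (x, y, z) center strictly inside the range.

    pcd_limit_range is (x_min, y_min, z_min, x_max, y_max, z_max).
    """
    lower, upper = pcd_limit_range[:3], pcd_limit_range[3:]
    return [
        all(lo < c < hi for c, lo, hi in zip(center, lower, upper))
        for center in centers
    ]


centers = [(10.0, 0.0, -1.0), (80.0, 0.0, -1.0), (10.0, 0.0, 0.5)]
mask = in_range_mask(centers, [0, -40, -3, 70.4, 40, 0.0])
# Boxes whose mask entry is False would be discarded as invalid.
valid_centers = [c for c, ok in zip(centers, mask) if ok]
```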
- drop_arrays_by_name(gt_names, used_classes)[source]¶
Drop irrelevant ground truths by name.
- Parameters
gt_names (list[str]) – Names of ground truths.
used_classes (list[str]) – Classes of interest.
- Returns
Indices of ground truths that will be dropped.
- Return type
np.ndarray
- evaluate(results, metric=None, logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in KITTI protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Default: None.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
pklfile_prefix (str, optional) – The prefix of pkl files, including the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str, optional) – The prefix of submission data. If not specified, the submission data will not be generated. Default: None.
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Results of each evaluation metric.
- Return type
dict[str, float]
- format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]¶
Format the results to pkl file.
- Parameters
outputs (list[dict]) – Testing results of the dataset.
pklfile_prefix (str) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving files when pklfile_prefix is not specified.
- Return type
tuple
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
annotation information consists of the following keys:
gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
gt_bboxes (np.ndarray): 2D ground truth bboxes.
gt_labels (np.ndarray): Labels of ground truths.
gt_names (list[str]): Class names of ground truths.
difficulty (int): Difficulty defined by KITTI. 0, 1, 2 represent easy, moderate, and hard respectively.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
img_prefix (str): Prefix of image files.
img_info (dict): Image info.
lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.
ann_info (dict): Annotation info.
- Return type
dict
- keep_arrays_by_name(gt_names, used_classes)[source]¶
Keep useful ground truths by name.
- Parameters
gt_names (list[str]) – Names of ground truths.
used_classes (list[str]) – Classes of interest.
- Returns
Indices of ground truths that will be kept.
- Return type
np.ndarray
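The index selection performed by drop_arrays_by_name and keep_arrays_by_name can be sketched as a pair of complementary filters. The function names below are hypothetical, and plain lists are returned instead of `np.ndarray` to keep the example dependency-free.

```python
def drop_indices_by_name(gt_names, used_classes):
    """Indices of ground truths whose class is NOT of interest."""
    return [i for i, name in enumerate(gt_names) if name not in used_classes]


def keep_indices_by_name(gt_names, used_classes):
    """Indices of ground truths whose class IS of interest."""
    return [i for i, name in enumerate(gt_names) if name in used_classes]


gt_names = ["Car", "DontCare", "Pedestrian", "Van"]
used_classes = ["Car", "Pedestrian"]
dropped = drop_indices_by_name(gt_names, used_classes)
kept = keep_indices_by_name(gt_names, used_classes)
```

Together the two index lists cover every ground truth exactly once.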
- remove_dontcare(ann_info)[source]¶
Remove annotations that do not need to be considered.
- Parameters
ann_info (dict) – Dict of annotation infos. The 'DontCare' annotations will be removed according to ann_info[‘name’].
- Returns
Annotations after filtering.
- Return type
dict
- show(results, out_dir, show=True, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Whether to visualize the results online. Default: True.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.KittiMonoDataset(data_root, info_file, ann_file, pipeline, load_interval=1, with_velocity=False, eval_version=None, version=None, **kwargs)[source]¶
Monocular 3D detection on KITTI Dataset.
- Parameters
data_root (str) – Path of dataset root.
info_file (str) – Path of info file.
load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.
with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to False.
eval_version (str, optional) – Configuration version of evaluation. Defaults to None.
version (str, optional) – Dataset version. Defaults to None.
kwargs (dict) – Other arguments are the same as those of NuScenesMonoDataset.
- bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]¶
Convert 3D detection results to kitti format for evaluation and test submission.
- Parameters
net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.
class_names (list[str]) – A list of class names.
pklfile_prefix (str) – The prefix of pkl file.
submission_prefix (str) – The prefix of submission file.
- Returns
A list of dictionaries with the kitti format.
- Return type
list[dict]
- bbox2result_kitti2d(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]¶
Convert 2D detection results to kitti format for evaluation and test submission.
- Parameters
net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.
class_names (list[str]) – A list of class names.
pklfile_prefix (str) – The prefix of pkl file.
submission_prefix (str) – The prefix of submission file.
- Returns
A list of dictionaries with the kitti format.
- Return type
list[dict]
- convert_valid_bboxes(box_dict, info)[source]¶
Convert the predicted boxes into valid ones.
- Parameters
box_dict (dict) –
Box dictionaries to be converted.
boxes_3d (CameraInstance3DBoxes): 3D bounding boxes.
scores_3d (torch.Tensor): Scores of boxes.
labels_3d (torch.Tensor): Class labels of boxes.
info (dict) – Data info.
- Returns
Valid predicted boxes.
bbox (np.ndarray): 2D bounding boxes.
box3d_camera (np.ndarray): 3D bounding boxes in camera coordinates.
scores (np.ndarray): Scores of boxes.
label_preds (np.ndarray): Class label predictions.
sample_idx (int): Sample index.
- Return type
dict
- evaluate(results, metric=None, logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in KITTI protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Defaults to None.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
pklfile_prefix (str, optional) – The prefix of pkl files, including the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str, optional) – The prefix of submission data. If not specified, the submission data will not be generated.
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Results of each evaluation metric.
- Return type
dict[str, float]
- format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]¶
Format the results to pkl file.
- Parameters
outputs (list[dict]) – Testing results of the dataset.
pklfile_prefix (str) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving files when pklfile_prefix is not specified.
- Return type
tuple
- class mmdet3d.datasets.LoadAnnotations3D(with_bbox_3d=True, with_label_3d=True, with_attr_label=False, with_mask_3d=False, with_seg_3d=False, with_bbox=False, with_label=False, with_mask=False, with_seg=False, with_bbox_depth=False, poly2mask=True, seg_3d_dtype=<class 'numpy.int64'>, file_client_args={'backend': 'disk'})[source]¶
Load Annotations3D.
Load instance mask and semantic mask of points and encapsulate the items into related fields.
- Parameters
with_bbox_3d (bool, optional) – Whether to load 3D boxes. Defaults to True.
with_label_3d (bool, optional) – Whether to load 3D labels. Defaults to True.
with_attr_label (bool, optional) – Whether to load attribute label. Defaults to False.
with_mask_3d (bool, optional) – Whether to load 3D instance masks for points. Defaults to False.
with_seg_3d (bool, optional) – Whether to load 3D semantic masks for points. Defaults to False.
with_bbox (bool, optional) – Whether to load 2D boxes. Defaults to False.
with_label (bool, optional) – Whether to load 2D labels. Defaults to False.
with_mask (bool, optional) – Whether to load 2D instance masks. Defaults to False.
with_seg (bool, optional) – Whether to load 2D semantic masks. Defaults to False.
with_bbox_depth (bool, optional) – Whether to load 2.5D boxes. Defaults to False.
poly2mask (bool, optional) – Whether to convert polygon annotations to bitmasks. Defaults to True.
seg_3d_dtype (dtype, optional) – Dtype of 3D semantic masks. Defaults to int64.
file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details.
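As a hedged configuration sketch, the flags above might be combined as follows for a detection pipeline and for a point cloud segmentation pipeline. It assumes the OpenMMLab convention of describing a transform as a dict whose `type` key names the class; the variable names are made up for the example.

```python
# Detection: load 3D boxes and their labels (the documented defaults).
load_det_anns = dict(
    type="LoadAnnotations3D",
    with_bbox_3d=True,
    with_label_3d=True,
)

# Segmentation: load per-point instance and semantic masks instead.
load_seg_anns = dict(
    type="LoadAnnotations3D",
    with_bbox_3d=False,
    with_label_3d=False,
    with_mask_3d=True,
    with_seg_3d=True,
)
```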
- class mmdet3d.datasets.LoadPointsFromDict(coord_type, load_dim=6, use_dim=[0, 1, 2], shift_height=False, use_color=False, file_client_args={'backend': 'disk'})[source]¶
Load Points From Dict.
- class mmdet3d.datasets.LoadPointsFromFile(coord_type, load_dim=6, use_dim=[0, 1, 2], shift_height=False, use_color=False, file_client_args={'backend': 'disk'})[source]¶
Load Points From File.
Load points from file.
- Parameters
coord_type (str) – The type of coordinates of the point cloud. Available options include:
‘LIDAR’: Points in LiDAR coordinates.
‘DEPTH’: Points in depth coordinates, usually for indoor datasets.
‘CAMERA’: Points in camera coordinates.
load_dim (int, optional) – The dimension of the loaded points. Defaults to 6.
use_dim (list[int], optional) – Which dimensions of the points to use. Defaults to [0, 1, 2]. For KITTI dataset, set use_dim=4 or use_dim=[0, 1, 2, 3] to use the intensity dimension.
shift_height (bool, optional) – Whether to use shifted height. Defaults to False.
use_color (bool, optional) – Whether to use color features. Defaults to False.
file_client_args (dict, optional) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).
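The interplay of load_dim and use_dim can be sketched without any third-party dependency. The sketch assumes the file holds a flat sequence of little-endian float32 values and that an integer use_dim of 4 selects columns 0 to 3, as the parameter description suggests; `load_points` is a hypothetical helper, and shift_height and use_color are not modelled.

```python
import struct


def load_points(raw, load_dim=6, use_dim=(0, 1, 2)):
    """Decode float32 bytes into rows of load_dim values, keep use_dim columns."""
    if isinstance(use_dim, int):
        use_dim = list(range(use_dim))  # use_dim=4 means columns 0..3
    if len(raw) % (4 * load_dim) != 0:
        raise ValueError("buffer size does not match load_dim")
    count = len(raw) // 4
    values = struct.unpack(f"<{count}f", raw)
    # Reshape the flat values into rows of load_dim entries each.
    rows = [values[i:i + load_dim] for i in range(0, count, load_dim)]
    return [[row[d] for d in use_dim] for row in rows]


# Two KITTI-like points, each stored as (x, y, z, intensity).
raw = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, 4.0, 5.0, 6.0, 0.25)
xyz_only = load_points(raw, load_dim=4, use_dim=[0, 1, 2])
with_intensity = load_points(raw, load_dim=4, use_dim=4)
```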
- class mmdet3d.datasets.LoadPointsFromMultiSweeps(sweeps_num=10, load_dim=5, use_dim=[0, 1, 2, 4], time_dim=4, file_client_args={'backend': 'disk'}, pad_empty_sweeps=False, remove_close=False, test_mode=False)[source]¶
Load points from multiple sweeps.
This is usually used for nuScenes dataset to utilize previous sweeps.
- Parameters
sweeps_num (int, optional) – Number of sweeps. Defaults to 10.
load_dim (int, optional) – Dimension number of the loaded points. Defaults to 5.
use_dim (list[int], optional) – Which dimension to use. Defaults to [0, 1, 2, 4].
time_dim (int, optional) – Which dimension represents the timestamp of each point. Defaults to 4.
file_client_args (dict, optional) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).
pad_empty_sweeps (bool, optional) – Whether to repeat the keyframe when sweeps are empty. Defaults to False.
remove_close (bool, optional) – Whether to remove close points. Defaults to False.
test_mode (bool, optional) – If test_mode=True, it will not randomly sample sweeps but select the nearest N frames. Defaults to False.
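The sweep selection implied by sweeps_num and test_mode can be sketched as below. The sketch assumes the sweep list is ordered nearest-first and that all sweeps are used when fewer than sweeps_num are available; `choose_sweeps` is a hypothetical helper that returns indices only and does not load any points.

```python
import random


def choose_sweeps(num_available, sweeps_num=10, test_mode=False, rng=None):
    """Return indices into a sweep list assumed to be ordered nearest-first."""
    if num_available <= sweeps_num:
        return list(range(num_available))  # use every sweep there is
    if test_mode:
        return list(range(sweeps_num))  # deterministic: the nearest N frames
    rng = rng or random.Random()
    # Training: random subset without repeats.
    return rng.sample(range(num_available), sweeps_num)


nearest = choose_sweeps(20, sweeps_num=5, test_mode=True)
sampled = choose_sweeps(20, sweeps_num=5, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the training-time choice reproducible, which can help when debugging a data pipeline.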
- class mmdet3d.datasets.LyftDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, **kwargs)[source]¶
Lyft Dataset.
This class serves as the API for experiments on the Lyft Dataset.
Please refer to https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data for data downloading.
- Parameters
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
data_root (str) – Path of dataset root.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset will encapsulate the boxes in their original format and then convert them to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:
’LiDAR’: Box in LiDAR coordinates.
’Depth’: Box in depth coordinates, usually for indoor dataset.
’Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
- evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, csv_savepath=None, result_names=['pts_bbox'], show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in Lyft protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Default: ‘bbox’.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
jsonfile_prefix (str, optional) – The prefix of json files including the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
csv_savepath (str, optional) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.
result_names (list[str], optional) – Result names in the metric prefix. Default: [‘pts_bbox’].
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Evaluation results.
- Return type
dict[str, float]
- format_results(results, jsonfile_prefix=None, csv_savepath=None)[source]¶
Format the results to json (standard format for COCO evaluation).
- Parameters
results (list[dict]) – Testing results of the dataset.
jsonfile_prefix (str) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
csv_savepath (str) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
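The (result_files, tmp_dir) contract described above can be sketched with the standard library alone. The helper name resolve_jsonfile_prefix and the "results" file stem are assumptions made for illustration; the actual method may organise this differently.

```python
import os.path as osp
import tempfile


def resolve_jsonfile_prefix(jsonfile_prefix=None):
    """Return (prefix, tmp_dir); tmp_dir is None when a prefix was given."""
    if jsonfile_prefix is None:
        # No prefix supplied: create a temporary directory to hold the files.
        tmp_dir = tempfile.TemporaryDirectory()
        jsonfile_prefix = osp.join(tmp_dir.name, "results")
    else:
        tmp_dir = None
    return jsonfile_prefix, tmp_dir


prefix, tmp_dir = resolve_jsonfile_prefix()
try:
    print(prefix)
finally:
    # The caller owns the temporary directory and should clean it up.
    if tmp_dir is not None:
        tmp_dir.cleanup()
```

Returning the directory object rather than just its path lets the caller decide when the temporary files are removed.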
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
gt_names (list[str]): Class names of ground truths.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
sweeps (list[dict]): Infos of sweeps.
timestamp (float): Sample timestamp.
img_filename (str, optional): Image filename.
lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.
ann_info (dict): Annotation info.
- Return type
dict
- json2csv(json_path, csv_savepath)[source]¶
Convert the json file to csv format for submission.
- Parameters
json_path (str) – Path of the result json file.
csv_savepath (str) – Path to save the csv file.
- load_annotations(ann_file)[source]¶
Load annotations from ann_file.
- Parameters
ann_file (str) – Path of the annotation file.
- Returns
List of annotations sorted by timestamps.
- Return type
list[dict]
- show(results, out_dir, show=False, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Whether to visualize the results online. Default: False.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.NormalizePointsColor(color_mean)[source]¶
Normalize color of points.
- Parameters
color_mean (list[float]) – Mean color of the point cloud.
- class mmdet3d.datasets.NuScenesDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, with_velocity=True, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, eval_version='detection_cvpr_2019', use_valid_flag=False)[source]¶
NuScenes Dataset.
This class serves as the API for experiments on the NuScenes Dataset.
Please refer to NuScenes Dataset for data downloading.
- Parameters
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
data_root (str) – Path of dataset root.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.
with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to True.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset wraps each box in its original format and then converts it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:
‘LiDAR’: Box in LiDAR coordinates.
‘Depth’: Box in depth coordinates, usually for indoor datasets.
‘Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
eval_version (str, optional) – Configuration version of evaluation. Defaults to ‘detection_cvpr_2019’.
use_valid_flag (bool, optional) – Whether to use use_valid_flag key in the info file as mask to filter gt_boxes and gt_names. Defaults to False.
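As a rough illustration of how these arguments fit together, the sketch below collects them into a config-style dict. The type key follows the registry convention commonly used with these datasets and is an assumption here, as are the paths, the file name, and the two-class subset.

```python
class_names = ("car", "pedestrian")  # hypothetical subset of classes

dataset_cfg = dict(
    type="NuScenesDataset",                    # assumed registry key
    data_root="data/nuscenes/",                # hypothetical path
    ann_file="data/nuscenes/infos_train.pkl",  # hypothetical file name
    pipeline=[],                               # fill in with transform configs
    classes=class_names,
    load_interval=1,
    with_velocity=True,
    modality=dict(use_lidar=True, use_camera=False),
    box_type_3d="LiDAR",
    filter_empty_gt=True,
    test_mode=False,
    eval_version="detection_cvpr_2019",
    use_valid_flag=False,
)
print(sorted(dataset_cfg))
```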
- evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, result_names=['pts_bbox'], show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in nuScenes protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Default: ‘bbox’.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
jsonfile_prefix (str, optional) – The prefix of json files including the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Results of each evaluation metric.
- Return type
dict[str, float]
- format_results(results, jsonfile_prefix=None)[source]¶
Format the results to json (standard format for COCO evaluation).
- Parameters
results (list[dict]) – Testing results of the dataset.
jsonfile_prefix (str) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
gt_names (list[str]): Class names of ground truths.
- Return type
dict
- get_cat_ids(idx)[source]¶
Get category distribution of single scene.
- Parameters
idx (int) – Index of the data_info.
- Returns
For each category, if the current scene contains such boxes, store a list containing idx; otherwise, store an empty list.
- Return type
dict[list]
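The return value described above can be sketched as follows. This follows the docstring wording only; the helper category_distribution, its arguments, and the class list are hypothetical.

```python
def category_distribution(idx, gt_names, classes):
    """Map each class name to [idx] if the scene contains it, else []."""
    present = set(gt_names)
    return {name: ([idx] if name in present else []) for name in classes}


classes = ("car", "pedestrian", "bicycle")  # hypothetical class list
print(category_distribution(7, ["car", "car", "bicycle"], classes))
```

A per-category mapping like this makes it straightforward to collect, for each class, the indices of all scenes that contain it.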
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
sweeps (list[dict]): Infos of sweeps.
timestamp (float): Sample timestamp.
img_filename (str, optional): Image filename.
lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.
ann_info (dict): Annotation info.
- Return type
dict
- load_annotations(ann_file)[source]¶
Load annotations from ann_file.
- Parameters
ann_file (str) – Path of the annotation file.
- Returns
List of annotations sorted by timestamps.
- Return type
list[dict]
- show(results, out_dir, show=False, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Whether to visualize the results online. Default: False.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.NuScenesMonoDataset(data_root, ann_file, pipeline, load_interval=1, with_velocity=True, modality=None, box_type_3d='Camera', eval_version='detection_cvpr_2019', use_valid_flag=False, version='v1.0-trainval', classes=None, img_prefix='', seg_prefix=None, proposal_file=None, test_mode=False, filter_empty_gt=True, file_client_args={'backend': 'disk'})[source]¶
Monocular 3D detection on NuScenes Dataset.
This class serves as the API for experiments on the NuScenes Dataset.
Please refer to NuScenes Dataset for data downloading.
- Parameters
ann_file (str) – Path of annotation file.
data_root (str) – Path of dataset root.
load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.
with_velocity (bool, optional) – Whether to include velocity prediction in the experiments. Defaults to True.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset wraps each box in its original format and then converts it to box_type_3d. Defaults to ‘Camera’ in this class. Available options include:
‘LiDAR’: Box in LiDAR coordinates.
‘Depth’: Box in depth coordinates, usually for indoor datasets.
‘Camera’: Box in camera coordinates.
eval_version (str, optional) – Configuration version of evaluation. Defaults to ‘detection_cvpr_2019’.
use_valid_flag (bool, optional) – Whether to use use_valid_flag key in the info file as mask to filter gt_boxes and gt_names. Defaults to False.
version (str, optional) – Dataset version. Defaults to ‘v1.0-trainval’.
- evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, result_names=['img_bbox'], show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in nuScenes protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Default: ‘bbox’.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
jsonfile_prefix (str) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
result_names (list[str], optional) – Result names in the metric prefix. Default: [‘img_bbox’].
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Results of each evaluation metric.
- Return type
dict[str, float]
- format_results(results, jsonfile_prefix=None, **kwargs)[source]¶
Format the results to json (standard format for COCO evaluation).
- Parameters
results (list[tuple | numpy.ndarray]) – Testing results of the dataset.
jsonfile_prefix (str) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
- get_attr_name(attr_idx, label_name)[source]¶
Get attribute from predicted index.
This is a workaround to predict attribute when the predicted velocity is not reliable. We map the predicted attribute index to the one in the attribute set. If it is consistent with the category, we will keep it. Otherwise, we will use the default attribute.
- Parameters
attr_idx (int) – Attribute index.
label_name (str) – Predicted category name.
- Returns
Predicted attribute name.
- Return type
str
- pre_pipeline(results)[source]¶
Initialization before data preparation.
- Parameters
results (dict) –
Dict before data preprocessing.
img_fields (list): Image fields.
bbox3d_fields (list): 3D bounding boxes fields.
pts_mask_fields (list): Mask fields of points.
pts_seg_fields (list): Mask fields of point segments.
bbox_fields (list): Fields of bounding boxes.
mask_fields (list): Fields of masks.
seg_fields (list): Segment fields.
box_type_3d (str): 3D box type.
box_mode_3d (str): 3D box mode.
- show(results, out_dir, show=False, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Whether to visualize the results online. Default: False.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.ObjectNameFilter(classes)[source]¶
Filter GT objects by their names.
- Parameters
classes (list[str]) – List of class names to be kept for training.
- class mmdet3d.datasets.ObjectNoise(translation_std=[0.25, 0.25, 0.25], global_rot_range=[0.0, 0.0], rot_range=[- 0.15707963267, 0.15707963267], num_try=100)[source]¶
Apply noise to each GT object in the scene.
- Parameters
translation_std (list[float], optional) – Standard deviation of the distribution from which translation noise is sampled. Defaults to [0.25, 0.25, 0.25].
global_rot_range (list[float], optional) – Global rotation to the scene. Defaults to [0.0, 0.0].
rot_range (list[float], optional) – Object rotation range. Defaults to [-0.15707963267, 0.15707963267].
num_try (int, optional) – Number of times to try if the noise applied is invalid. Defaults to 100.
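To make the roles of translation_std and rot_range concrete, here is a simplified sketch that draws one noise sample per object. It assumes zero-mean Gaussian translation noise and uniformly distributed rotation noise, and it omits the validity check that num_try retries as well as global_rot_range. The function name and return layout are hypothetical.

```python
import random


def sample_object_noise(num_objects, translation_std, rot_range, seed=None):
    """Draw (shift, angle) noise per object; collision checks are omitted."""
    rng = random.Random(seed)
    noises = []
    for _ in range(num_objects):
        # One Gaussian offset per axis (assumed zero mean).
        shift = [rng.gauss(0.0, std) for std in translation_std]
        # One rotation angle drawn uniformly from rot_range (assumption).
        angle = rng.uniform(rot_range[0], rot_range[1])
        noises.append((shift, angle))
    return noises


for shift, angle in sample_object_noise(
        2, [0.25, 0.25, 0.25], [-0.15707963267, 0.15707963267], seed=0):
    print([round(s, 3) for s in shift], round(angle, 3))
```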
- class mmdet3d.datasets.ObjectRangeFilter(point_cloud_range)[source]¶
Filter objects by the range.
- Parameters
point_cloud_range (list[float]) – Point cloud range.
- class mmdet3d.datasets.ObjectSample(db_sampler, sample_2d=False, use_ground_plane=False)[source]¶
Sample GT objects to the data.
- Parameters
db_sampler (dict) – Config dict of the database sampler.
sample_2d (bool) – Whether to also paste 2D image patches onto the images. This should be True when applying multi-modality cut-and-paste. Defaults to False.
use_ground_plane (bool) – Whether to use the ground plane to adjust the 3D labels. Defaults to False.
- class mmdet3d.datasets.PointSample(num_points, sample_range=None, replace=False)[source]¶
Point sample.
Sampling data to a certain number.
- Parameters
num_points (int) – Number of points to be sampled.
sample_range (float, optional) – The range within which to sample points. If not None, points with depth larger than sample_range are given priority when sampling. Defaults to None.
replace (bool, optional) – Whether the sampling is with or without replacement. Defaults to False.
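The effect of num_points and replace can be sketched with the standard library. This sketch ignores sample_range, and it assumes a fallback to sampling with replacement when fewer points than num_points are available; the function sample_points is hypothetical.

```python
import random


def sample_points(points, num_points, replace=False, seed=None):
    """Return exactly `num_points` points drawn from `points`."""
    rng = random.Random(seed)
    # Assumed fallback: too few points means repeats are unavoidable.
    if not replace and len(points) < num_points:
        replace = True
    if replace:
        return rng.choices(points, k=num_points)
    return rng.sample(points, num_points)


cloud = [(float(i), 0.0, 0.0) for i in range(10)]
print(len(sample_points(cloud, 4, seed=0)))
print(len(sample_points(cloud[:3], 8, seed=0)))
```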
- class mmdet3d.datasets.PointsRangeFilter(point_cloud_range)[source]¶
Filter points by the range.
- Parameters
point_cloud_range (list[float]) – Point cloud range.
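A minimal sketch of range filtering, assuming point_cloud_range is ordered as (x_min, y_min, z_min, x_max, y_max, z_max) and that points on the boundary are dropped. Both the ordering and the boundary handling are assumptions, and filter_points_by_range is a hypothetical helper.

```python
def filter_points_by_range(points, point_cloud_range):
    """Keep points strictly inside the axis-aligned range."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    return [
        p for p in points
        if x_min < p[0] < x_max and y_min < p[1] < y_max and z_min < p[2] < z_max
    ]


points = [(0, 0, 0), (10, 0, 0), (1, 1, -5), (2, -3, 0.5)]
print(filter_points_by_range(points, [-5, -5, -3, 5, 5, 3]))
```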
- class mmdet3d.datasets.RandomDropPointsColor(drop_ratio=0.2)[source]¶
Randomly set the color of points to all zeros.
Once this transform is executed, all the points’ color will be dropped. Refer to PAConv for more details.
- Parameters
drop_ratio (float, optional) – The probability of dropping point colors. Defaults to 0.2.
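The all-or-nothing behaviour can be sketched as below: a single random draw decides whether every point in the scene loses its color. The (x, y, z, r, g, b) tuple layout and the function random_drop_color are assumptions for illustration.

```python
import random


def random_drop_color(points, drop_ratio=0.2, rng=random):
    """With probability `drop_ratio`, zero the color of every point."""
    if rng.random() < drop_ratio:
        return [(x, y, z, 0.0, 0.0, 0.0) for x, y, z, *_ in points]
    return list(points)


scene = [(0.0, 1.0, 2.0, 0.5, 0.4, 0.3), (1.0, 1.0, 1.0, 0.9, 0.8, 0.7)]
print(random_drop_color(scene, drop_ratio=1.0))
```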
- class mmdet3d.datasets.RandomFlip3D(sync_2d=True, flip_ratio_bev_horizontal=0.0, flip_ratio_bev_vertical=0.0, **kwargs)[source]¶
Flip the points & bbox.
If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.
- Parameters
sync_2d (bool, optional) – Whether to apply flip according to the 2D images. If True, it will apply the same flip as that applied to the 2D images. If False, it will decide whether to flip randomly and independently of the 2D images. Defaults to True.
flip_ratio_bev_horizontal (float, optional) – The flipping probability in horizontal direction. Defaults to 0.0.
flip_ratio_bev_vertical (float, optional) – The flipping probability in vertical direction. Defaults to 0.0.
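The rule stated in the class description (reuse an existing "flip" flag, otherwise decide at random by the configured ratio) can be sketched as follows. The helper decide_flip is hypothetical and covers a single direction only.

```python
import random


def decide_flip(input_dict, flip_ratio, rng=random):
    """Reuse an existing 'flip' flag, otherwise draw one at random."""
    if "flip" in input_dict:
        return bool(input_dict["flip"])
    return rng.random() < flip_ratio


print(decide_flip({"flip": True}, flip_ratio=0.0))  # flag wins over the ratio
print(decide_flip({}, flip_ratio=0.0))              # ratio 0.0 never flips
```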
- random_flip_data_3d(input_dict, direction='horizontal')[source]¶
Flip 3D data randomly.
- Parameters
input_dict (dict) – Result dict from loading pipeline.
direction (str, optional) – Flip direction. Default: ‘horizontal’.
- Returns
Flipped results, ‘points’, ‘bbox3d_fields’ keys are updated in the result dict.
- Return type
dict
- class mmdet3d.datasets.RandomJitterPoints(jitter_std=[0.01, 0.01, 0.01], clip_range=[- 0.05, 0.05])[source]¶
Randomly jitter point coordinates.
Different from the global translation in GlobalRotScaleTrans, here we apply different noises to each point in a scene.
- Parameters
jitter_std (list[float]) – The standard deviation of jittering noise. This applies random noise to all points in a 3D scene, which is sampled from a Gaussian distribution whose standard deviation is set by jitter_std. Defaults to [0.01, 0.01, 0.01].
clip_range (list[float]) – Clip the randomly generated jitter noise into this range. If None is given, don’t perform clipping. Defaults to [-0.05, 0.05].
Note
This transform should only be used in point cloud segmentation tasks because we don’t transform ground-truth bboxes accordingly. For a similar transform in detection tasks, please refer to ObjectNoise.
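A minimal sketch of per-point jittering as described for RandomJitterPoints: each coordinate receives its own Gaussian noise, which is then clipped. Zero-mean noise is an assumption, and jitter_points is a hypothetical helper operating on plain (x, y, z) tuples.

```python
import random


def jitter_points(points, jitter_std=(0.01, 0.01, 0.01),
                  clip_range=(-0.05, 0.05), seed=None):
    """Add independent, clipped Gaussian noise to every point."""
    rng = random.Random(seed)
    jittered = []
    for x, y, z in points:
        noise = [rng.gauss(0.0, std) for std in jitter_std]
        if clip_range is not None:
            lo, hi = clip_range
            noise = [min(max(n, lo), hi) for n in noise]
        jittered.append((x + noise[0], y + noise[1], z + noise[2]))
    return jittered


print(jitter_points([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], seed=0))
```

Passing clip_range=None skips the clipping step, matching the parameter description.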
- class mmdet3d.datasets.RandomShiftScale(shift_scale, aug_prob)[source]¶
Random shift scale.
Unlike a normal shift-and-scale function, this transform does not shift or scale the image directly. Instead, it records the shift and scale information in the loading pipeline. It is designed to be used together with AffineResize.
- Parameters
shift_scale (tuple[float]) – Shift and scale range.
aug_prob (float) – The shifting and scaling probability.
- class mmdet3d.datasets.S3DISDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Depth', filter_empty_gt=True, test_mode=False, *kwargs)[source]¶
S3DIS Dataset for Detection Task.
This class is the inner dataset for S3DIS. Since S3DIS has 6 areas, we often train on 5 of them and test on the remaining one. The test area is Area_5, as suggested in GSDN. To concatenate the 5 training areas, mmdet.datasets.dataset_wrappers.ConcatDataset should be used.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset wraps each box in its original format and then converts it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:
‘LiDAR’: Box in LiDAR coordinates.
‘Depth’: Box in depth coordinates, usually for indoor datasets.
‘Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
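The five-area training setup mentioned in the class description might be expressed as a config like the one below. This is a sketch only: the type keys follow the usual registry convention, and the data root and annotation file names are hypothetical.

```python
train_areas = [1, 2, 3, 4, 6]  # Area_5 is held out for testing
data_root = "data/s3dis/"      # hypothetical path

train_cfg = dict(
    type="ConcatDataset",
    datasets=[
        dict(
            type="S3DISDataset",
            data_root=data_root,
            # Hypothetical file naming scheme, one info file per area.
            ann_file=data_root + f"s3dis_infos_Area_{i}.pkl",
            pipeline=[],
            box_type_3d="Depth",
            filter_empty_gt=True,
        )
        for i in train_areas
    ],
)
print([cfg["ann_file"] for cfg in train_cfg["datasets"]])
```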
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
pts_instance_mask_path (str): Path of instance masks.
pts_semantic_mask_path (str): Path of semantic masks.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
pts_filename (str): Filename of point clouds.
file_name (str): Filename of point clouds.
ann_info (dict): Annotation info.
- Return type
dict
- class mmdet3d.datasets.S3DISSegDataset(data_root, ann_files, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None, **kwargs)[source]¶
S3DIS Dataset for Semantic Segmentation Task.
This class serves as the API for experiments on the S3DIS Dataset. It wraps the provided datasets of different areas. We don’t use mmdet.datasets.dataset_wrappers.ConcatDataset because we need to concat the scene_idxs of different areas.
Please refer to the google form for data downloading.
- Parameters
data_root (str) – Path of dataset root.
ann_files (list[str]) – Path of several annotation files.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES). Defaults to None.
scene_idxs (list[np.ndarray] | list[str], optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.
- class mmdet3d.datasets.SUNRGBDDataset(data_root, ann_file, pipeline=None, classes=None, modality={'use_camera': True, 'use_lidar': True}, box_type_3d='Depth', filter_empty_gt=True, test_mode=False, **kwargs)[source]¶
SUNRGBD Dataset.
This class serves as the API for experiments on the SUNRGBD Dataset.
See the download page for data downloading.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset wraps each box in its original format and then converts it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:
‘LiDAR’: Box in LiDAR coordinates.
‘Depth’: Box in depth coordinates, usually for indoor datasets.
‘Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
- evaluate(results, metric=None, iou_thr=(0.25, 0.5), iou_thr_2d=(0.5), logger=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluate.
Evaluation in indoor protocol.
- Parameters
results (list[dict]) – List of results.
metric (str | list[str], optional) – Metrics to be evaluated. Default: None.
iou_thr (list[float], optional) – AP IoU thresholds for 3D evaluation. Default: (0.25, 0.5).
iou_thr_2d (list[float], optional) – AP IoU thresholds for 2D evaluation. Default: (0.5, ).
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Evaluation results.
- Return type
dict
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
pts_instance_mask_path (str): Path of instance masks.
pts_semantic_mask_path (str): Path of semantic masks.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str, optional): Filename of point clouds.
file_name (str, optional): Filename of point clouds.
img_prefix (str, optional): Prefix of image files.
img_info (dict, optional): Image info.
calib (dict, optional): Camera calibration info.
ann_info (dict): Annotation info.
- Return type
dict
- show(results, out_dir, show=True, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Visualize the results online.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.ScanNetDataset(data_root, ann_file, pipeline=None, classes=None, modality={'use_camera': False, 'use_depth': True}, box_type_3d='Depth', filter_empty_gt=True, test_mode=False, **kwargs)[source]¶
ScanNet Dataset for Detection Task.
This class serves as the API for experiments on the ScanNet Dataset.
Please refer to the github repo for data downloading.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on box_type_3d, the dataset wraps each box in its original format and then converts it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include:
‘LiDAR’: Box in LiDAR coordinates.
‘Depth’: Box in depth coordinates, usually for indoor datasets.
‘Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.
gt_labels_3d (np.ndarray): Labels of ground truths.
pts_instance_mask_path (str): Path of instance masks.
pts_semantic_mask_path (str): Path of semantic masks.
axis_align_matrix (np.ndarray): Transformation matrix for global scene alignment.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
file_name (str): Filename of point clouds.
img_prefix (str, optional): Prefix of image files.
img_info (dict, optional): Image info.
ann_info (dict): Annotation info.
- Return type
dict
- prepare_test_data(index)[source]¶
Prepare data for testing.
We should take axis_align_matrix from self.data_infos since we need to align point clouds.
- Parameters
index (int) – Index for accessing the target data.
- Returns
Testing data dict of the corresponding index.
- Return type
dict
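To illustrate what aligning a point cloud with axis_align_matrix involves, the sketch below applies a transformation matrix to a list of points. It assumes the matrix is a 4x4 homogeneous transform; the helper align_points and the example matrix are hypothetical.

```python
def align_points(points, axis_align_matrix):
    """Apply a 4x4 homogeneous transform to (x, y, z) points."""
    aligned = []
    for x, y, z in points:
        homo = (x, y, z, 1.0)
        # Only the first three rows contribute to the output coordinates.
        aligned.append(tuple(
            sum(m * v for m, v in zip(row, homo))
            for row in axis_align_matrix[:3]
        ))
    return aligned


shift = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(align_points([(0.0, 0.0, 0.0)], shift))
```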
- show(results, out_dir, show=True, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Visualize the results online.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.ScanNetInstanceSegDataset(data_root, ann_file, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None, file_client_args={'backend': 'disk'})[source]¶
- evaluate(results, metric=None, options=None, logger=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in instance segmentation protocol.
- Parameters
results (list[dict]) – List of results.
metric (str | list[str]) – Metrics to be evaluated.
options (dict, optional) – options for instance_seg_eval.
logger (logging.Logger | None | str) – Logger used for printing related information during evaluation. Defaults to None.
show (bool, optional) – Whether to visualize. Defaults to False.
out_dir (str, optional) – Path to save the visualization results. Defaults to None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Evaluation results.
- Return type
dict
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
Annotation information consists of the following keys:
pts_semantic_mask_path (str): Path of semantic masks.
pts_instance_mask_path (str): Path of instance masks.
- Return type
dict
- get_classes_and_palette(classes=None, palette=None)[source]¶
Get class names of current dataset. Palette is simply ignored for instance segmentation.
- Parameters
classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset. Defaults to None.
palette (Sequence[Sequence[int]] | np.ndarray | None) – The palette of segmentation map. If None is given, random palette will be generated. Defaults to None.
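The three accepted forms of classes (None, a file name, or a sequence) can be sketched with plain Python. DEFAULT_CLASSES and resolve_classes are hypothetical names, and the file reading shown here is a simple stand-in for whatever the dataset uses internally.

```python
import os
import tempfile

DEFAULT_CLASSES = ("cabinet", "bed", "chair")  # hypothetical defaults


def resolve_classes(classes=None):
    """Resolve `classes` given as None, a file path, or a sequence."""
    if classes is None:
        return list(DEFAULT_CLASSES)
    if isinstance(classes, str):
        # Treat the string as a file with one class name per line.
        with open(classes, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    if isinstance(classes, (tuple, list)):
        return list(classes)
    raise ValueError(f"Unsupported type {type(classes)} of classes.")


# Usage: write a class file with one name per line, then load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("chair\ntable\n")
    print(resolve_classes(path))
```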
- class mmdet3d.datasets.ScanNetSegDataset(data_root, ann_file, pipeline=None, classes=None, palette=None, modality=None, test_mode=False, ignore_index=None, scene_idxs=None, **kwargs)[source]¶
ScanNet Dataset for Semantic Segmentation Task.
This class serves as the API for experiments on the ScanNet Dataset.
Please refer to the github repo for data downloading.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
palette (list[list[int]], optional) – The palette of segmentation map. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
ignore_index (int, optional) – The label index to be ignored, e.g. unannotated points. If None is given, set to len(self.CLASSES). Defaults to None.
scene_idxs (np.ndarray | str, optional) – Precomputed index to load data. For scenes with many points, we may sample it several times. Defaults to None.
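The ignore_index default described above amounts to a one-line rule, sketched here with a hypothetical class tuple and helper name.

```python
CLASSES = ("wall", "floor", "chair")  # hypothetical class names


def resolve_ignore_index(ignore_index=None, classes=CLASSES):
    """Fall back to len(classes) when no ignore index is given."""
    return len(classes) if ignore_index is None else ignore_index


print(resolve_ignore_index())     # uses len(CLASSES)
print(resolve_ignore_index(255))  # explicit value is kept
```

Checking against None rather than truthiness keeps an explicit ignore_index of 0 intact.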
- format_results(results, txtfile_prefix=None)[source]¶
Format the results to txt file. Refer to ScanNet documentation.
- Parameters
results (list[dict]) – Testing results of the dataset.
txtfile_prefix (str) – The prefix of saved files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
- Returns
(outputs, tmp_dir), where outputs is the detection results and tmp_dir is the temporary directory created for saving submission files when txtfile_prefix is not specified.
- Return type
tuple
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
annotation information consists of the following keys:
pts_semantic_mask_path (str): Path of semantic masks.
- Return type
dict
- get_scene_idxs(scene_idxs)[source]¶
Compute scene_idxs for data sampling.
We sample more times for scenes with more points.
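The idea of sampling larger scenes more often can be sketched with NumPy. The function build_scene_idxs and the points_per_sample argument are assumptions made for illustration; the library may instead load precomputed indices, so treat this as one possible way to build them.

```python
import numpy as np


def build_scene_idxs(num_points_per_scene, points_per_sample):
    """Repeat each scene index roughly in proportion to its point count."""
    num_points = np.asarray(num_points_per_scene)
    # Every scene is visited at least once; larger scenes repeat more often.
    repeats = np.maximum(1, np.round(num_points / points_per_sample))
    return np.repeat(np.arange(len(num_points)), repeats.astype(np.int64))
```

A scene with twice as many points as points_per_sample then appears twice in the returned index array, while a small scene still appears once.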
- show(results, out_dir, show=True, pipeline=None)[source]¶
Results visualization.
- Parameters
results (list[dict]) – List of bounding boxes results.
out_dir (str) – Output directory of visualization result.
show (bool) – Visualize the results online.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- class mmdet3d.datasets.SemanticKITTIDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Lidar', filter_empty_gt=False, test_mode=False)[source]¶
SemanticKITTI Dataset.
This class serves as the API for experiments on the SemanticKITTI Dataset. Please refer to http://www.semantic-kitti.org/dataset.html for data downloading.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
There are no 3D boxes in this dataset, so any type can be chosen. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:
’LiDAR’: Box in LiDAR coordinates.
’Depth’: Box in depth coordinates, usually for indoor dataset.
’Camera’: Box in camera coordinates.
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to False.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
- get_ann_info(index)[source]¶
Get annotation info according to the given index.
- Parameters
index (int) – Index of the annotation data to get.
- Returns
annotation information consists of the following keys:
pts_semantic_mask_path (str): Path of semantic masks.
- Return type
dict
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Data information that will be passed to the data preprocessing pipelines. It includes the following keys:
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
file_name (str): Filename of point clouds.
ann_info (dict): Annotation info.
- Return type
dict
- class mmdet3d.datasets.VoxelBasedPointSampler(cur_sweep_cfg, prev_sweep_cfg=None, time_dim=3)[source]¶
Voxel based point sampler.
Apply voxel sampling to multiple sweep points.
- Parameters
cur_sweep_cfg (dict) – Config for sampling current points.
prev_sweep_cfg (dict) – Config for sampling previous points.
time_dim (int) – Index that indicates the time dimension for input points.
- class mmdet3d.datasets.WaymoDataset(data_root, ann_file, split, pts_prefix='velodyne', pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, load_interval=1, pcd_limit_range=[- 85, - 85, - 5, 85, 85, 5], **kwargs)[source]¶
Waymo Dataset.
This class serves as the API for experiments on the Waymo Dataset.
Please refer to https://waymo.com/open/download/ for data downloading. It is recommended to symlink the dataset root to $MMDETECTION3D/data and organize them as the doc shows.
- Parameters
data_root (str) – Path of dataset root.
ann_file (str) – Path of annotation file.
split (str) – Split of input data.
pts_prefix (str, optional) – Prefix of points files. Defaults to ‘velodyne’.
pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.
classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.
modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.
box_type_3d (str, optional) –
Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include:
’LiDAR’: box in LiDAR coordinates
’Depth’: box in depth coordinates, usually for indoor dataset
’Camera’: box in camera coordinates
filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.
test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
pcd_limit_range (list[float], optional) – The range of point cloud used to filter invalid predicted boxes. Default: [-85, -85, -5, 85, 85, 5].
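How a range such as pcd_limit_range can be used to drop out-of-range predictions is sketched below. The helper centers_in_range is hypothetical and only checks box centers against (x_min, y_min, z_min, x_max, y_max, z_max); the exact validity test used by the dataset may differ.

```python
import numpy as np


def centers_in_range(centers, pcd_limit_range):
    """Return a boolean mask of box centers strictly inside the range."""
    centers = np.asarray(centers, dtype=np.float64)
    limit = np.asarray(pcd_limit_range, dtype=np.float64)
    lower, upper = limit[:3], limit[3:]
    # A center is kept only if all three coordinates lie inside the limits.
    return np.all((centers > lower) & (centers < upper), axis=1)
```

The resulting mask can index the boxes, scores and labels together so that all three stay aligned.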
- bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]¶
Convert results to kitti format for evaluation and test submission.
- Parameters
net_outputs (List[np.ndarray]) – List of arrays storing the bboxes and scores.
class_names (List[String]) – A list of class names.
pklfile_prefix (str) – The prefix of pkl file.
submission_prefix (str) – The prefix of submission file.
- Returns
A list of dicts in the KITTI 3D format.
- Return type
List[dict]
- convert_valid_bboxes(box_dict, info)[source]¶
Convert the boxes into valid format.
- Parameters
box_dict (dict) –
Bounding boxes to be converted.
boxes_3d (LiDARInstance3DBoxes): 3D bounding boxes.
scores_3d (np.ndarray): Scores of predicted boxes.
labels_3d (np.ndarray): Class labels of predicted boxes.
info (dict) – Dataset information dictionary.
- Returns
Valid boxes after conversion.
bbox (np.ndarray): 2D bounding boxes (in camera 0).
box3d_camera (np.ndarray): 3D boxes in camera coordinates.
box3d_lidar (np.ndarray): 3D boxes in lidar coordinates.
scores (np.ndarray): Scores of predicted boxes.
label_preds (np.ndarray): Class labels of predicted boxes.
sample_idx (np.ndarray): Sample index.
- Return type
dict
- evaluate(results, metric='waymo', logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None, pipeline=None)[source]¶
Evaluation in KITTI protocol.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str], optional) – Metrics to be evaluated. Default: ‘waymo’. Another supported metric is ‘kitti’.
logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Default: None.
pklfile_prefix (str, optional) – The prefix of pkl files including the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str, optional) – The prefix of submission data. If not specified, the submission data will not be generated.
show (bool, optional) – Whether to visualize. Default: False.
out_dir (str, optional) – Path to save the visualization results. Default: None.
pipeline (list[dict], optional) – raw data loading for showing. Default: None.
- Returns
Results of each evaluation metric.
- Return type
dict[str, float]
- format_results(outputs, pklfile_prefix=None, submission_prefix=None, data_format='waymo')[source]¶
Format the results to pkl file.
- Parameters
outputs (list[dict]) – Testing results of the dataset.
pklfile_prefix (str) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
submission_prefix (str) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.
data_format (str, optional) – Output data format. Default: ‘waymo’. Another supported choice is ‘kitti’.
- Returns
(result_files, tmp_dir), where result_files is a dict containing the json filepaths and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.
- Return type
tuple
- get_data_info(index)[source]¶
Get data info according to the given index.
- Parameters
index (int) – Index of the sample data to get.
- Returns
Standard input_dict consists of the data information.
sample_idx (str): Sample index.
pts_filename (str): Filename of point clouds.
img_prefix (str): Prefix of image files.
img_info (dict): Image info.
lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.
ann_info (dict): Annotation info.
- Return type
dict
- mmdet3d.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, runner_type='EpochBasedRunner', persistent_workers=False, class_aware_sampler=None, **kwargs)[source]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int, optional) – Seed to be used. Default: None.
runner_type (str) – Type of runner. Default: ‘EpochBasedRunner’.
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers’ Dataset instances alive. This argument is only valid when PyTorch>=1.7.0. Default: False.
class_aware_sampler (dict) – Whether to use ClassAwareSampler during training. Default: None.
kwargs – any keyword argument to be used to initialize DataLoader
- Returns
A PyTorch dataloader.
- Return type
DataLoader
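The difference between the distributed and non-distributed cases can be made concrete with a small sketch. The helper loader_sizes is hypothetical; it assumes that the single non-distributed loader serves all GPUs by scaling the per-GPU values with num_gpus, which is consistent with the parameter descriptions above but is not confirmed by them.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one DataLoader (assumed rule)."""
    if dist:
        # One loader per GPU/process: the per-GPU values are used directly.
        return samples_per_gpu, workers_per_gpu
    # One loader for all GPUs: scale both values by the number of GPUs.
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu
```

Under this assumption, samples_per_gpu=2 on 8 GPUs gives a batch size of 2 per process when dist=True and a single batch of 16 when dist=False.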
- mmdet3d.datasets.get_loading_pipeline(pipeline)[source]¶
Only keep loading image, points and annotations related configuration.
- Parameters
pipeline (list[dict] | list[Pipeline]) – Data pipeline configs or list of pipeline functions.
- Returns
The new pipeline list that keeps only the configuration related to loading images, points and annotations.
- Return type
list[dict] | list[Pipeline]
Examples
>>> pipelines = [
...     dict(type='LoadPointsFromFile',
...          coord_type='LIDAR', load_dim=4, use_dim=4),
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations3D',
...          with_bbox=True, with_label_3d=True),
...     dict(type='Resize',
...          img_scale=[(640, 192), (2560, 768)], keep_ratio=True),
...     dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
...     dict(type='PointsRangeFilter',
...          point_cloud_range=point_cloud_range),
...     dict(type='ObjectRangeFilter',
...          point_cloud_range=point_cloud_range),
...     dict(type='PointShuffle'),
...     dict(type='Normalize', **img_norm_cfg),
...     dict(type='Pad', size_divisor=32),
...     dict(type='DefaultFormatBundle3D', class_names=class_names),
...     dict(type='Collect3D',
...          keys=['points', 'img', 'gt_bboxes_3d', 'gt_labels_3d'])
... ]
>>> expected_pipelines = [
...     dict(type='LoadPointsFromFile',
...          coord_type='LIDAR', load_dim=4, use_dim=4),
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations3D',
...          with_bbox=True, with_label_3d=True),
...     dict(type='DefaultFormatBundle3D', class_names=class_names),
...     dict(type='Collect3D',
...          keys=['points', 'img', 'gt_bboxes_3d', 'gt_labels_3d'])
... ]
>>> assert expected_pipelines == get_loading_pipeline(pipelines)
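The filtering shown in this example can be approximated by a name-based sketch. The set KEEP_TYPES and the function keep_loading_steps are hypothetical and are derived only from the types that survive in the example; the library decides what counts as a loading step by its own rules, which may not be a fixed name list.

```python
# Hypothetical allow-list taken from the steps kept in the example.
KEEP_TYPES = frozenset({
    "LoadPointsFromFile",
    "LoadImageFromFile",
    "LoadAnnotations3D",
    "DefaultFormatBundle3D",
    "Collect3D",
})


def keep_loading_steps(pipeline, keep_types=KEEP_TYPES):
    """Keep only the config dicts whose 'type' is in keep_types."""
    return [cfg for cfg in pipeline if cfg["type"] in keep_types]
```

Augmentation steps such as Resize or RandomFlip3D are dropped, and the order of the remaining steps is preserved.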
mmdet3d.models¶
detectors¶
- class mmdet3d.models.detectors.Base3DDetector(init_cfg=None)[source]¶
Base class for detectors.
- forward(return_loss=True, **kwargs)[source]¶
Calls either forward_train or forward_test depending on whether return_loss=True.
Note that this setting changes the expected inputs. When return_loss=True, img and img_metas are single-nested (i.e. torch.Tensor and list[dict]), and when return_loss=False, img and img_metas should be double-nested (i.e. list[torch.Tensor], list[list[dict]]), with the outer list indicating test-time augmentations.
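The dispatch on return_loss can be sketched with a toy class. ToyDetector is hypothetical and returns plain dicts instead of losses or predictions; it only shows the control flow described above.

```python
class ToyDetector:
    """Toy stand-in that shows the return_loss dispatch only."""

    def forward_train(self, **kwargs):
        # A real detector would return a dict of losses here.
        return {"mode": "train", "inputs": sorted(kwargs)}

    def forward_test(self, **kwargs):
        # A real detector would return predictions here.
        return {"mode": "test", "inputs": sorted(kwargs)}

    def forward(self, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(**kwargs)
        return self.forward_test(**kwargs)
```

Calling ToyDetector().forward(return_loss=False, points=[...], img_metas=[...]) therefore takes the test path with the same keyword arguments.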
- forward_test(points, img_metas, img=None, **kwargs)[source]¶
- Parameters
points (list[torch.Tensor]) – the outer list indicates test-time augmentations and inner torch.Tensor should have a shape NxC, which contains all points in the batch.
img_metas (list[list[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch
img (list[torch.Tensor], optional) – the outer list indicates test-time augmentations and inner torch.Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.
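The double nesting expected at test time can be illustrated with plain Python lists standing in for tensors. The function dispatch_test is hypothetical; choosing simple_test for a single augmentation and aug_test otherwise is an assumption about how the outer list is consumed.

```python
def dispatch_test(points, img_metas, img=None):
    """Pick a test path from the number of test-time augmentations."""
    for name, value in (("points", points), ("img_metas", img_metas)):
        if not isinstance(value, list):
            raise TypeError(f"{name} must be a list, got {type(value)}")
    num_augs = len(points)
    if num_augs != len(img_metas):
        raise ValueError("points and img_metas disagree on the number of augs")
    if num_augs == 1:
        # Strip the outer (augmentation) level for the single-aug case.
        single_img = None if img is None else img[0]
        return "simple_test", points[0], img_metas[0], single_img
    return "aug_test", points, img_metas, img
```

A single un-augmented batch is thus passed as points=[batch_points] and img_metas=[[meta_0, meta_1, ...]].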
- show_results(data, result, out_dir, show=False, score_thr=None)[source]¶
Results visualization.
- Parameters
data (list[dict]) – Input points and the information of the sample.
result (list[dict]) – Prediction results.
out_dir (str) – Output directory of visualization result.
show (bool, optional) – Determines whether you are going to show result by open3d. Defaults to False.
score_thr (float, optional) – Score threshold of bounding boxes. Defaults to None.
- class mmdet3d.models.detectors.CenterPoint(pts_voxel_layer=None, pts_voxel_encoder=None, pts_middle_encoder=None, pts_fusion_layer=None, img_backbone=None, pts_backbone=None, img_neck=None, pts_neck=None, pts_bbox_head=None, img_roi_head=None, img_rpn_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Base class of Multi-modality VoxelNet.
- aug_test_pts(feats, img_metas, rescale=False)[source]¶
Test function of point cloud branch with augmentation.
The function implementation process is as follows:
step 1: map features back for double-flip augmentation.
step 2: merge all features and generate boxes.
step 3: map boxes back for scale augmentation.
step 4: merge results.
- Parameters
feats (list[torch.Tensor]) – Feature of point cloud.
img_metas (list[dict]) – Meta information of samples.
rescale (bool, optional) – Whether to rescale bboxes. Default: False.
- Returns
Returned bboxes consist of the following keys:
boxes_3d (LiDARInstance3DBoxes): Predicted bboxes.
scores_3d (torch.Tensor): Scores of predicted boxes.
labels_3d (torch.Tensor): Labels of predicted boxes.
- Return type
dict
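Steps 1 and 2 above (mapping features back for double-flip augmentation and merging them) can be sketched with NumPy. The function merge_double_flip, the (C, H, W) layout, and the choice of which axis each flip uses are assumptions for illustration; the detector operates on batched torch tensors and its exact axes and merge rule are not stated here.

```python
import numpy as np


def merge_double_flip(feats, flips):
    """Undo each flip on (C, H, W) maps, then average the aligned maps.

    flips holds one (flip_h, flip_w) pair of booleans per feature map.
    """
    restored = []
    for feat, (flip_h, flip_w) in zip(feats, flips):
        if flip_h:
            feat = np.flip(feat, axis=1)  # undo the flip along H
        if flip_w:
            feat = np.flip(feat, axis=2)  # undo the flip along W
        restored.append(feat)
    return np.mean(restored, axis=0)
```

Because a flip is its own inverse, applying the same flip again brings every augmented map back to the original orientation before averaging.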
- forward_pts_train(pts_feats, gt_bboxes_3d, gt_labels_3d, img_metas, gt_bboxes_ignore=None)[source]¶
Forward function for point cloud branch.
- Parameters
pts_feats (list[torch.Tensor]) – Features of point cloud branch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.
img_metas (list[dict]) – Meta information of samples.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
- property with_velocity¶
Whether the head predicts velocity
- Type
bool
- class mmdet3d.models.detectors.DynamicMVXFasterRCNN(**kwargs)[source]¶
Multi-modality VoxelNet using Faster R-CNN and dynamic voxelization.
- class mmdet3d.models.detectors.DynamicVoxelNet(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
VoxelNet using dynamic voxelization.
- class mmdet3d.models.detectors.FCOSMono3D(backbone, neck, bbox_head, train_cfg=None, test_cfg=None, pretrained=None)[source]¶
FCOS3D for monocular 3D object detection.
Currently please refer to our entry on the leaderboard.
- class mmdet3d.models.detectors.GroupFree3DNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶
-
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]¶
Forward of training.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
img_metas (list) – Image metas.
gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – point-wise instance label of each batch.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding.
- Returns
Losses.
- Return type
dict[str, torch.Tensor]
- class mmdet3d.models.detectors.H3DNet(backbone, neck=None, rpn_head=None, roi_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
H3DNet model.
Please refer to the paper
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]¶
Forward of training.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
img_metas (list) – Image metas.
gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – point-wise instance label of each batch.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding.
- Returns
Losses.
- Return type
dict
- class mmdet3d.models.detectors.ImVoteNet(pts_backbone=None, pts_bbox_heads=None, pts_neck=None, img_backbone=None, img_neck=None, img_roi_head=None, img_rpn_head=None, img_mlp=None, freeze_img_branch=False, fusion_layer=None, num_sampled_seed=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
ImVoteNet for 3D detection.
- aug_test(points=None, img_metas=None, imgs=None, bboxes_2d=None, rescale=False, **kwargs)[source]¶
Test function with augmentation, stage 2.
- Parameters
points (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and the inner list contains all points in the batch, where each Tensor should have a shape NxC. Defaults to None.
img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.
imgs (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.
bboxes_2d (list[list[torch.Tensor]], optional) – Provided 2d bboxes, not supported yet. Defaults to None.
rescale (bool, optional) – Whether or not rescale bboxes. Defaults to False.
- Returns
Predicted 3d boxes.
- Return type
list[dict]
- aug_test_img_only(img, img_metas, rescale=False)[source]¶
Test function with augmentation, image network pretrain. May refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/two_stage.py.
- Parameters
img (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.
img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.
rescale (bool, optional) – Whether to rescale bboxes to the original shape of the input image. If rescale is False, the returned bboxes and masks will fit the scale of imgs[0]. Defaults to False.
- Returns
Predicted 2d boxes.
- Return type
list[list[torch.Tensor]]
- extract_bboxes_2d(img, img_metas, train=True, bboxes_2d=None, **kwargs)[source]¶
Extract bounding boxes from 2d detector.
- Parameters
img (torch.Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.
img_metas (list[dict]) – Image meta info.
train (bool) – train-time or not.
bboxes_2d (list[torch.Tensor]) – provided 2d bboxes, not supported yet.
- Returns
a list of processed 2d bounding boxes.
- Return type
list[torch.Tensor]
- extract_img_feats(imgs)[source]¶
Extract features from multiple images.
- Parameters
imgs (list[torch.Tensor]) – A list of images. The images are augmented from the same image but in different ways.
- Returns
Features of different images
- Return type
list[torch.Tensor]
- forward_test(points=None, img_metas=None, img=None, bboxes_2d=None, **kwargs)[source]¶
Forwarding of test for image branch pretrain or stage 2 train.
- Parameters
points (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and the inner list contains all points in the batch, where each Tensor should have a shape NxC. Defaults to None.
img_metas (list[list[dict]], optional) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. Defaults to None.
img (list[list[torch.Tensor]], optional) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.
bboxes_2d (list[list[torch.Tensor]], optional) – Provided 2d bboxes, not supported yet. Defaults to None.
- Returns
Predicted 2d or 3d boxes.
- Return type
list[list[torch.Tensor]]|list[dict]
- forward_train(points=None, img=None, img_metas=None, gt_bboxes=None, gt_labels=None, gt_bboxes_ignore=None, gt_masks=None, proposals=None, bboxes_2d=None, gt_bboxes_3d=None, gt_labels_3d=None, pts_semantic_mask=None, pts_instance_mask=None, **kwargs)[source]¶
Forwarding of train for image branch pretrain or stage 2 train.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
img (torch.Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.
img_metas (list[dict]) – list of image and point cloud meta info dict. For example, keys include ‘ori_shape’, ‘img_norm_cfg’, and ‘transformation_3d_flow’. For details on the values of the keys see mmdet/datasets/pipelines/formatting.py:Collect.
gt_bboxes (list[torch.Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[torch.Tensor]) – class indices for each 2d bounding box.
gt_bboxes_ignore (list[torch.Tensor]) – specify which 2d bounding boxes can be ignored when computing the loss.
gt_masks (torch.Tensor) – true segmentation masks for each 2d bbox, used if the architecture supports a segmentation task.
proposals – override rpn proposals (2d) with custom proposals. Use when with_rpn is False.
bboxes_2d (list[torch.Tensor]) – provided 2d bboxes, not supported yet.
gt_bboxes_3d (BaseInstance3DBoxes) – 3d gt bboxes.
gt_labels_3d (list[torch.Tensor]) – gt class labels for 3d bboxes.
pts_semantic_mask (list[torch.Tensor]) – point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – point-wise instance label of each batch.
- Returns
a dictionary of loss components.
- Return type
dict[str, torch.Tensor]
- simple_test(points=None, img_metas=None, img=None, bboxes_2d=None, rescale=False, **kwargs)[source]¶
Test without augmentation, stage 2.
- Parameters
points (list[torch.Tensor], optional) – Elements in the list should have a shape NxC, the list indicates all point-clouds in the batch. Defaults to None.
img_metas (list[dict], optional) – List indicates images in a batch. Defaults to None.
img (torch.Tensor, optional) – Should have a shape NxCxHxW, which contains all images in the batch. Defaults to None.
bboxes_2d (list[torch.Tensor], optional) – Provided 2d bboxes, not supported yet. Defaults to None.
rescale (bool, optional) – Whether or not rescale bboxes. Defaults to False.
- Returns
Predicted 3d boxes.
- Return type
list[dict]
- simple_test_img_only(img, img_metas, proposals=None, rescale=False)[source]¶
Test without augmentation, image network pretrain. May refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/detectors/two_stage.py.
- Parameters
img (torch.Tensor) – Should have a shape NxCxHxW, which contains all images in the batch.
img_metas (list[dict]) –
proposals (list[Tensor], optional) – override rpn proposals with custom proposals. Defaults to None.
rescale (bool, optional) – Whether or not rescale bboxes to the original shape of input image. Defaults to False.
- Returns
Predicted 2d boxes.
- Return type
list[list[torch.Tensor]]
- property with_img_backbone¶
Whether the detector has a 2D image backbone.
- Type
bool
- property with_img_bbox¶
Whether the detector has a 2D image box head.
- Type
bool
- property with_img_bbox_head¶
Whether the detector has a 2D image box head (not roi).
- Type
bool
- property with_img_neck¶
Whether the detector has a neck in image branch.
- Type
bool
- property with_img_roi_head¶
Whether the detector has a RoI Head in image branch.
- Type
bool
- property with_img_rpn¶
Whether the detector has a 2D RPN in image detector branch.
- Type
bool
- property with_pts_backbone¶
Whether the detector has a 3D backbone.
- Type
bool
- property with_pts_bbox¶
Whether the detector has a 3D box head.
- Type
bool
- property with_pts_neck¶
Whether the detector has a neck in 3D detector branch.
- Type
bool
- class mmdet3d.models.detectors.ImVoxelNet(backbone, neck, neck_3d, bbox_head, prior_generator, n_voxels, coord_type, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
-
- Parameters
backbone (dict) – Config of the backbone.
neck (dict) – Config of the 2d neck.
neck_3d (dict) – Config of the 3d neck.
bbox_head (dict) – Config of the head.
prior_generator (dict) – Config of the prior generator.
n_voxels (tuple[int]) – Number of voxels for x, y, and z axis.
coord_type (str) – The type of coordinates of points cloud: ‘DEPTH’, ‘LIDAR’, or ‘CAMERA’.
train_cfg (dict, optional) – Config for train stage. Defaults to None.
test_cfg (dict, optional) – Config for test stage. Defaults to None.
init_cfg (dict, optional) – Config for weight initialization. Defaults to None.
pretrained (str, optional) – Deprecated initialization parameter. Defaults to None.
- aug_test(imgs, img_metas, **kwargs)[source]¶
Test with augmentations.
- Parameters
imgs (list[torch.Tensor]) – Input images of shape (N, C_in, H, W).
img_metas (list) – Image metas.
- Returns
Predicted 3d boxes.
- Return type
list[dict]
- extract_feat(img, img_metas)[source]¶
Extract 3d features from the backbone -> fpn -> 3d projection -> 3d neck -> bbox_head.
- Parameters
img (torch.Tensor) – Input images of shape (N, C_in, H, W).
img_metas (list) – Image metas.
- Returns
torch.Tensor: Features of shape (N, C_out, N_x, N_y, N_z).
torch.Tensor: Valid mask of shape (N, 1, N_x, N_y, N_z).
- Return type
Tuple
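The role of the valid mask returned alongside the features can be illustrated with a small projection sketch. The function project_points and the 3x4 matrix proj are hypothetical; the sketch only shows the general idea that a voxel center is valid when it projects in front of the camera and inside the image, and it does not reproduce the detector's own projection code.

```python
import numpy as np


def project_points(points_xyz, proj, height, width):
    """Project (n, 3) points with a 3x4 matrix and flag the visible ones."""
    points_xyz = np.asarray(points_xyz, dtype=np.float64)
    proj = np.asarray(proj, dtype=np.float64)
    ones = np.ones((points_xyz.shape[0], 1))
    homo = np.concatenate([points_xyz, ones], axis=1)  # (n, 4)
    cam = homo @ proj.T  # (n, 3): u * depth, v * depth, depth
    depth = cam[:, 2]
    # Avoid dividing by non-positive depth; those points are invalid anyway.
    safe_depth = np.where(depth > 0, depth, 1.0)
    u = cam[:, 0] / safe_depth
    v = cam[:, 1] / safe_depth
    valid = (depth > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return np.stack([u, v], axis=1), valid
```

Image features would then be sampled at the (u, v) positions of valid points only, with the mask marking voxels that received no image evidence.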
- forward_test(img, img_metas, **kwargs)[source]¶
Forward of testing.
- Parameters
img (torch.Tensor) – Input images of shape (N, C_in, H, W).
img_metas (list) – Image metas.
- Returns
Predicted 3d boxes.
- Return type
list[dict]
- forward_train(img, img_metas, gt_bboxes_3d, gt_labels_3d, **kwargs)[source]¶
Forward of training.
- Parameters
img (torch.Tensor) – Input images of shape (N, C_in, H, W).
img_metas (list) – Image metas.
gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.
- Returns
A dictionary of loss components.
- Return type
dict[str, torch.Tensor]
- class mmdet3d.models.detectors.MVXFasterRCNN(**kwargs)[source]¶
Multi-modality VoxelNet using Faster R-CNN.
- class mmdet3d.models.detectors.MVXTwoStageDetector(pts_voxel_layer=None, pts_voxel_encoder=None, pts_middle_encoder=None, pts_fusion_layer=None, img_backbone=None, pts_backbone=None, img_neck=None, pts_neck=None, pts_bbox_head=None, img_roi_head=None, img_rpn_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Base class of Multi-modality VoxelNet.
- aug_test_pts(feats, img_metas, rescale=False)[source]¶
Test function of point cloud branch with augmentation.
- extract_feats(points, img_metas, imgs=None)[source]¶
Extract point and image features of multiple samples.
- forward_img_train(x, img_metas, gt_bboxes, gt_labels, gt_bboxes_ignore=None, proposals=None, **kwargs)[source]¶
Forward function for image branch.
This function works similar to the forward function of Faster R-CNN.
- Parameters
x (list[torch.Tensor]) – Image features of shape (B, C, H, W) of multiple levels.
img_metas (list[dict]) – Meta information of images.
gt_bboxes (list[torch.Tensor]) – Ground truth boxes of each image sample.
gt_labels (list[torch.Tensor]) – Ground truth labels of boxes.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
proposals (list[torch.Tensor], optional) – Proposals of each sample. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
- forward_pts_train(pts_feats, gt_bboxes_3d, gt_labels_3d, img_metas, gt_bboxes_ignore=None)[source]¶
Forward function for point cloud branch.
- Parameters
pts_feats (list[torch.Tensor]) – Features of point cloud branch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.
img_metas (list[dict]) – Meta information of samples.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
- forward_train(points=None, img_metas=None, gt_bboxes_3d=None, gt_labels_3d=None, gt_labels=None, gt_bboxes=None, img=None, proposals=None, gt_bboxes_ignore=None)[source]¶
Forward training function.
- Parameters
points (list[torch.Tensor], optional) – Points of each sample. Defaults to None.
img_metas (list[dict], optional) – Meta information of each sample. Defaults to None.
gt_bboxes_3d (list[BaseInstance3DBoxes], optional) – Ground truth 3D boxes. Defaults to None.
gt_labels_3d (list[torch.Tensor], optional) – Ground truth labels of 3D boxes. Defaults to None.
gt_labels (list[torch.Tensor], optional) – Ground truth labels of 2D boxes in images. Defaults to None.
gt_bboxes (list[torch.Tensor], optional) – Ground truth 2D boxes in images. Defaults to None.
img (torch.Tensor, optional) – Images of each sample with shape (N, C, H, W). Defaults to None.
proposals (list[torch.Tensor], optional) – Predicted proposals used for training Fast RCNN. Defaults to None.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth 2D boxes in images to be ignored. Defaults to None.
- Returns
Losses of different branches.
- Return type
dict
- show_results(data, result, out_dir)[source]¶
Results visualization.
- Parameters
data (dict) – Input points and the information of the sample.
result (dict) – Prediction results.
out_dir (str) – Output directory of visualization result.
- simple_test(points, img_metas, img=None, rescale=False)[source]¶
Test function without augmentation.
- voxelize(points)[source]¶
Apply dynamic voxelization to points.
- Parameters
points (list[torch.Tensor]) – Points of each sample.
- Returns
Concatenated points, number of points per voxel, and coordinates.
- Return type
tuple[torch.Tensor]
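What voxel coordinates look like can be sketched with NumPy. The function voxel_coords, its voxel_size and point_cloud_range arguments, and the (batch_idx, z, y, x) column order are assumptions for illustration; the detector delegates this step to its configured voxel layer, whose exact outputs are described above only in summary.

```python
import numpy as np


def voxel_coords(points, voxel_size, point_cloud_range, batch_idx=0):
    """Return in-range points and their integer voxel coordinates."""
    points = np.asarray(points, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    pc_range = np.asarray(point_cloud_range, dtype=np.float64)
    xyz = points[:, :3]
    # Drop points that fall outside the configured range.
    inside = np.all((xyz >= pc_range[:3]) & (xyz < pc_range[3:]), axis=1)
    idx = np.floor((xyz[inside] - pc_range[:3]) / voxel_size).astype(np.int64)
    # Prepend the sample index; the z, y, x order is an assumed convention.
    batch = np.full((idx.shape[0], 1), batch_idx, dtype=np.int64)
    coors = np.concatenate([batch, idx[:, ::-1]], axis=1)
    return points[inside], coors
```

Running this per sample with increasing batch_idx and concatenating the results gives one coordinate table for the whole batch.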
- property with_fusion¶
Whether the detector has a fusion layer.
- Type
bool
- property with_img_backbone¶
Whether the detector has a 2D image backbone.
- Type
bool
- property with_img_bbox¶
Whether the detector has a 2D image box head.
- Type
bool
- property with_img_neck¶
Whether the detector has a neck in image branch.
- Type
bool
- property with_img_roi_head¶
Whether the detector has a RoI Head in image branch.
- Type
bool
- property with_img_rpn¶
Whether the detector has a 2D RPN in image detector branch.
- Type
bool
Whether the detector has a shared head in image branch.
- Type
bool
- property with_middle_encoder¶
Whether the detector has a middle encoder.
- Type
bool
- property with_pts_backbone¶
Whether the detector has a 3D backbone.
- Type
bool
- property with_pts_bbox¶
Whether the detector has a 3D box head.
- Type
bool
- property with_pts_neck¶
Whether the detector has a neck in 3D detector branch.
- Type
bool
- property with_voxel_encoder¶
Whether the detector has a voxel encoder.
- Type
bool
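The with_* properties above all answer the same question: is an optional component configured? A minimal sketch of that pattern follows. ToyDetector and its attributes are hypothetical stand-ins, not the mmdet3d class, and the real properties may be implemented differently.

```python
class ToyDetector:
    """Hypothetical detector with optional components."""

    def __init__(self, pts_backbone=None, img_backbone=None):
        self.pts_backbone = pts_backbone
        self.img_backbone = img_backbone

    @property
    def with_pts_backbone(self):
        """bool: Whether a 3D backbone is configured."""
        return getattr(self, 'pts_backbone', None) is not None

    @property
    def with_img_backbone(self):
        """bool: Whether a 2D image backbone is configured."""
        return getattr(self, 'img_backbone', None) is not None


det = ToyDetector(pts_backbone=object())
# Guard optional branches on the flag instead of touching the attribute.
branches = [name for name in ('pts_backbone', 'img_backbone')
            if getattr(det, 'with_' + name)]
```

Guarding on the property keeps calling code free of repeated None checks.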
- class mmdet3d.models.detectors.MinkSingleStage3DDetector(backbone, head, voxel_size, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
Single stage detector based on MinkowskiEngine GSDN.
- Parameters
backbone (dict) – Config of the backbone.
head (dict) – Config of the head.
voxel_size (float) – Voxel size in meters.
train_cfg (dict, optional) – Config for train stage. Defaults to None.
test_cfg (dict, optional) – Config for test stage. Defaults to None.
init_cfg (dict, optional) – Config for weight initialization. Defaults to None.
pretrained (str, optional) – Deprecated initialization parameter. Defaults to None.
- aug_test(points, img_metas, **kwargs)[source]¶
Test with augmentations.
- Parameters
points (list[list[torch.Tensor]]) – Points of each sample.
img_metas (list[dict]) – Contains scene meta infos.
- Returns
Predicted 3d boxes.
- Return type
list[dict]
- extract_feat(points)[source]¶
Extract features from points.
- Parameters
points (list[Tensor]) – Raw point clouds.
- Returns
Voxelized point clouds.
- Return type
SparseTensor
- forward_train(points, gt_bboxes_3d, gt_labels_3d, img_metas)[source]¶
Forward of training.
- Parameters
points (list[Tensor]) – Raw point clouds.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
img_metas (list[dict]) – Contains scene meta infos.
- Returns
Centerness, bbox and classification loss values.
- Return type
dict
- class mmdet3d.models.detectors.PartA2(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, rpn_head=None, roi_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Part-A2 detector.
Please refer to the paper
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, gt_bboxes_ignore=None, proposals=None)[source]¶
Training forward function.
- Parameters
points (list[torch.Tensor]) – Point cloud of each sample.
img_metas (list[dict]) – Meta information of each sample.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
- class mmdet3d.models.detectors.PointRCNN(backbone, neck=None, rpn_head=None, roi_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
PointRCNN detector.
Please refer to the PointRCNN
- Parameters
backbone (dict) – Config dict of detector’s backbone.
neck (dict, optional) – Config dict of neck. Defaults to None.
rpn_head (dict, optional) – Config of RPN head. Defaults to None.
roi_head (dict, optional) – Config of ROI head. Defaults to None.
train_cfg (dict, optional) – Train configs. Defaults to None.
test_cfg (dict, optional) – Test configs. Defaults to None.
pretrained (str, optional) – Model pretrained path. Defaults to None.
init_cfg (dict, optional) – Config of initialization. Defaults to None.
- extract_feat(points)[source]¶
Directly extract features from the backbone+neck.
- Parameters
points (torch.Tensor) – Input points.
- Returns
Features from the backbone+neck
- Return type
dict
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d)[source]¶
Forward of training.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
img_metas (list[dict]) – Meta information of each sample.
gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.
- Returns
Losses.
- Return type
dict
- simple_test(points, img_metas, imgs=None, rescale=False)[source]¶
Forward of testing.
- Parameters
points (list[torch.Tensor]) – Points of each sample.
img_metas (list[dict]) – Image metas.
imgs (list[torch.Tensor], optional) – Images of each sample. Defaults to None.
rescale (bool, optional) – Whether to rescale results. Defaults to False.
- Returns
Predicted 3d boxes.
- Return type
list
- class mmdet3d.models.detectors.SASSD(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
SASSD (https://github.com/skyhehe123/SA-SSD) for 3D detection.
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, gt_bboxes_ignore=None)[source]¶
Training forward function.
- Parameters
points (list[torch.Tensor]) – Point cloud of each sample.
img_metas (list[dict]) – Meta information of each sample.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
- class mmdet3d.models.detectors.SMOKEMono3D(backbone, neck, bbox_head, train_cfg=None, test_cfg=None, pretrained=None)[source]¶
SMOKE (https://arxiv.org/abs/2002.10111) for monocular 3D object detection.
- class mmdet3d.models.detectors.SSD3DNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
3DSSDNet model.
- class mmdet3d.models.detectors.SingleStageMono3DDetector(backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Base class for monocular 3D single-stage detectors.
Single-stage detectors directly and densely predict bounding boxes on the output features of the backbone+neck.
- forward_train(img, img_metas, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels=None, gt_bboxes_ignore=None)[source]¶
- Parameters
img (Tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.
img_metas (list[dict]) – A list of image info dicts where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmdet.datasets.pipelines.Collect.
gt_bboxes (list[Tensor]) – Each item contains the ground truth boxes for one image in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – Class indices corresponding to each box.
gt_bboxes_3d (list[Tensor]) – Each item contains the 3D ground truth boxes for one image in [x, y, z, x_size, y_size, z_size, yaw, vx, vy] format.
gt_labels_3d (list[Tensor]) – 3D class indices corresponding to each box.
centers2d (list[Tensor]) – Projected 3D centers onto 2D images.
depths (list[Tensor]) – Depth of projected centers on 2D images.
attr_labels (list[Tensor], optional) – Attribute indices corresponding to each box
gt_bboxes_ignore (list[Tensor]) – Specify which bounding boxes can be ignored when computing the loss.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
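The two box layouts named in the parameters above can be made concrete with a small sketch. The helper names are hypothetical and plain tuples stand in for tensors; only the field orders come from the parameter descriptions.

```python
# Field order of a 3D box as listed for gt_bboxes_3d above.
FIELDS_3D = ('x', 'y', 'z', 'x_size', 'y_size', 'z_size', 'yaw', 'vx', 'vy')


def xyxy_to_xywh(box):
    """Convert [tl_x, tl_y, br_x, br_y] to (tl_x, tl_y, width, height)."""
    tl_x, tl_y, br_x, br_y = box
    return (tl_x, tl_y, br_x - tl_x, br_y - tl_y)


def name_box3d(box):
    """Attach field names to a 9-value 3D box (hypothetical helper)."""
    if len(box) != len(FIELDS_3D):
        raise ValueError(f'expected {len(FIELDS_3D)} values, got {len(box)}')
    return dict(zip(FIELDS_3D, box))


box2d = xyxy_to_xywh((10, 20, 50, 80))
box3d = name_box3d([1.0, 2.0, 3.0, 4.0, 1.8, 1.5, 0.5, 0.1, 0.2])
```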
- show_results(data, result, out_dir, show=False, score_thr=None)[source]¶
Results visualization.
- Parameters
data (list[dict]) – Input images and the information of the sample.
result (list[dict]) – Prediction results.
out_dir (str) – Output directory of visualization result.
show (bool, optional) – Determines whether you are going to show result by open3d. Defaults to False.
score_thr (float, optional) – Score threshold of bounding boxes. Defaults to None. Not implemented yet for single_stage_mono3d (TODO); it is kept here for a unified interface.
- simple_test(img, img_metas, rescale=False)[source]¶
Test function without test time augmentation.
- Parameters
img (list[torch.Tensor]) – List of multiple images.
img_metas (list[dict]) – List of image information.
rescale (bool, optional) – Whether to rescale the results. Defaults to False.
- Returns
BBox results of each image and classes. The outer list corresponds to each image. The inner list corresponds to each class.
- Return type
list[list[np.ndarray]]
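To show how the nested return value is meant to be read, the sketch below walks the outer list (images) and inner list (classes) and counts boxes per class. Plain lists stand in for the np.ndarray entries, and the class names and row layout are made up for illustration.

```python
def count_detections(results, class_names):
    """Count boxes per class over all images (illustrative helper)."""
    counts = {name: 0 for name in class_names}
    for per_image in results:  # outer list: one entry per image
        for name, boxes in zip(class_names, per_image):  # inner: per class
            counts[name] += len(boxes)
    return counts


# Two images, two classes; each row is a hypothetical box with a score.
results = [
    [[[0, 0, 1, 1, 0.9]], []],
    [[], [[1, 1, 2, 2, 0.8], [2, 2, 3, 3, 0.7]]],
]
counts = count_detections(results, ('Car', 'Pedestrian'))
```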
- class mmdet3d.models.detectors.VoteNet(backbone, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
VoteNet for 3D detection.
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, gt_bboxes_ignore=None)[source]¶
Forward of training.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
img_metas (list) – Image metas.
gt_bboxes_3d (BaseInstance3DBoxes) – gt bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – gt class labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – point-wise instance label of each batch.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes can be ignored.
- Returns
Losses.
- Return type
dict
- class mmdet3d.models.detectors.VoxelNet(voxel_layer, voxel_encoder, middle_encoder, backbone, neck=None, bbox_head=None, train_cfg=None, test_cfg=None, init_cfg=None, pretrained=None)[source]¶
VoxelNet for 3D detection.
- forward_train(points, img_metas, gt_bboxes_3d, gt_labels_3d, gt_bboxes_ignore=None)[source]¶
Training forward function.
- Parameters
points (list[torch.Tensor]) – Point cloud of each sample.
img_metas (list[dict]) – Meta information of each sample.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
gt_labels_3d (list[torch.Tensor]) – Ground truth labels for boxes of each sample.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored. Defaults to None.
- Returns
Losses of each branch.
- Return type
dict
backbones¶
- class mmdet3d.models.backbones.DGCNNBackbone(in_channels, num_samples=(20, 20, 20), knn_modes=('D-KNN', 'F-KNN', 'F-KNN'), radius=(None, None, None), gf_channels=((64, 64), (64, 64), (64)), fa_channels=(1024), act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
Backbone network for DGCNN.
- Parameters
in_channels (int) – Input channels of point cloud.
num_samples (tuple[int], optional) – The number of samples for knn or ball query in each graph feature (GF) module. Defaults to (20, 20, 20).
knn_modes (tuple[str], optional) – Mode of KNN of each knn module. Defaults to (‘D-KNN’, ‘F-KNN’, ‘F-KNN’).
radius (tuple[float], optional) – Sampling radii of each GF module. Defaults to (None, None, None).
gf_channels (tuple[tuple[int]], optional) – Out channels of each mlp in GF module. Defaults to ((64, 64), (64, 64), (64, )).
fa_channels (tuple[int], optional) – Out channels of each mlp in FA module. Defaults to (1024, ).
act_cfg (dict, optional) – Config of activation layer. Defaults to dict(type='ReLU').
init_cfg (dict, optional) – Initialization config. Defaults to None.
- forward(points)[source]¶
Forward pass.
- Parameters
points (torch.Tensor) – point coordinates with features, with shape (B, N, in_channels).
- Returns
Outputs after graph feature (GF) and feature aggregation (FA) modules.
- gf_points (list[torch.Tensor]): Outputs after each GF module.
- fa_points (torch.Tensor): Outputs after FA module.
- Return type
dict[str, list[torch.Tensor]]
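The knn_modes parameter selects how neighbours are found in each GF module. The sketch below assumes that 'D-KNN' searches in coordinate space and 'F-KNN' in feature space; that reading is an assumption, not something this reference states. The brute-force helper is purely illustrative.

```python
def knn_indices(vectors, k):
    """Return, for each vector, the indices of its k nearest vectors.

    The query itself is included, and ties break on the lower index.
    """
    result = []
    for q in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(q, v)) for v in vectors]
        order = sorted(range(len(vectors)), key=lambda i: (dists[i], i))
        result.append(order[:k])
    return result


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
feats = [(0.0,), (9.0,), (0.5,)]
d_knn = knn_indices(coords, k=2)  # neighbours by geometric distance
f_knn = knn_indices(feats, k=2)   # neighbours by feature distance
```

The same points get different neighbourhoods depending on which space is searched, which is why the mode is configurable per module.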
- class mmdet3d.models.backbones.DLANet(depth, in_channels=3, out_indices=(0, 1, 2, 3, 4, 5), frozen_stages=- 1, norm_cfg=None, conv_cfg=None, layer_with_level_root=(False, True, True, True), with_identity_root=False, pretrained=None, init_cfg=None)[source]¶
DLA backbone.
- Parameters
depth (int) – Depth of DLA. Default: 34.
in_channels (int, optional) – Number of input image channels. Default: 3.
norm_cfg (dict, optional) – Dictionary to construct and config norm layer. Default: None.
conv_cfg (dict, optional) – Dictionary to construct and config conv layer. Default: None.
layer_with_level_root (list[bool], optional) – Whether to apply level_root in each DLA layer, this is only used for tree levels. Default: (False, True, True, True).
with_identity_root (bool, optional) – Whether to add identity in root layer. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
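The note above can be illustrated with a toy class. ToyModule is a deliberately simplified stand-in and is not torch.nn.Module; it only mimics how calling the instance runs hooks while calling forward directly skips them.

```python
class ToyModule:
    """Toy stand-in that mimics hook handling on call (not torch)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)  # hooks only run on this path
        return out


seen = []
m = ToyModule()
m.register_forward_hook(lambda mod, inp, out: seen.append(out))
direct = m.forward(3)  # hooks are skipped
called = m(3)          # hooks run
```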
- class mmdet3d.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]¶
HRNet backbone.
See the paper High-Resolution Representations for Labeling Pixels and Regions.
- Parameters
extra (dict) –
Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
- num_modules (int): The number of HRModule in this stage.
- num_branches (int): The number of branches in the HRModule.
- block (str): The type of convolution block.
- num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
- num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – Dictionary to construct and config conv layer.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
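The constraints on extra (four stages, five keys per stage, tuple lengths equal to num_branches) can be checked before building the model. The checker below is a hypothetical helper, not part of mmdet or mmdet3d, and it assumes the stage1 to stage4 key names used in the example above.

```python
REQUIRED_KEYS = ('num_modules', 'num_branches', 'block',
                 'num_blocks', 'num_channels')


def check_hrnet_extra(extra):
    """Validate an HRNet `extra` dict against the documented constraints."""
    for i in range(1, 5):
        name = f'stage{i}'
        if name not in extra:
            raise KeyError(f'missing {name}')
        stage = extra[name]
        missing = [k for k in REQUIRED_KEYS if k not in stage]
        if missing:
            raise KeyError(f'{name} is missing keys: {missing}')
        for key in ('num_blocks', 'num_channels'):
            if len(stage[key]) != stage['num_branches']:
                raise ValueError(
                    f'{name}: len({key}) must equal num_branches')
    return True


extra = dict(
    stage1=dict(num_modules=1, num_branches=1, block='BOTTLENECK',
                num_blocks=(4, ), num_channels=(64, )),
    stage2=dict(num_modules=1, num_branches=2, block='BASIC',
                num_blocks=(4, 4), num_channels=(32, 64)),
    stage3=dict(num_modules=4, num_branches=3, block='BASIC',
                num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
    stage4=dict(num_modules=3, num_branches=4, block='BASIC',
                num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)))
ok = check_hrnet_extra(extra)
```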
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmdet3d.models.backbones.MinkResNet(depth, in_channels, num_stages=4, pool=True)[source]¶
Minkowski ResNet backbone. See 4D Spatio-Temporal ConvNets for more details.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input channels, 3 for RGB.
num_stages (int, optional) – Resnet stages. Default: 4.
pool (bool, optional) – Add max pooling after first conv if True. Default: True.
- class mmdet3d.models.backbones.MultiBackbone(num_streams, backbones, aggregation_mlp_channels=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 1e-05, 'momentum': 0.01, 'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, suffixes=('net0', 'net1'), init_cfg=None, pretrained=None, **kwargs)[source]¶
MultiBackbone with different configs.
- Parameters
num_streams (int) – The number of backbones.
backbones (list or dict) – A list of backbone configs.
aggregation_mlp_channels (list[int]) – Specify the mlp layers for feature aggregation.
conv_cfg (dict) – Config dict of convolutional layers.
norm_cfg (dict) – Config dict of normalization layers.
act_cfg (dict) – Config dict of activation layers.
suffixes (list) – A list of suffixes to rename the return dict for each backbone.
- forward(points)[source]¶
Forward pass.
- Parameters
points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).
- Returns
Outputs from multiple backbones.
fp_xyz[suffix] (list[torch.Tensor]): The coordinates of each fp features.
fp_features[suffix] (list[torch.Tensor]): The features from each Feature Propagate Layers.
fp_indices[suffix] (list[torch.Tensor]): Indices of the input points.
hd_feature (torch.Tensor): The aggregation feature from multiple backbones.
- Return type
dict[str, list[torch.Tensor]]
- class mmdet3d.models.backbones.NoStemRegNet(arch, init_cfg=None, **kwargs)[source]¶
RegNet backbone without Stem for 3D detection.
More details can be found in paper .
- Parameters
arch (dict) – The parameter of RegNets.
- w0 (int): Initial width.
- wa (float): Slope of width.
- wm (float): Quantization parameter to quantize the width.
- depth (int): Depth of the backbone.
- group_w (int): Width of group.
- bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Normally 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from mmdet3d.models import NoStemRegNet
>>> import torch
>>> self = NoStemRegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 64, 16, 16)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
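The channel counts in the example output (96, 192, 432, 1008) follow from the arch parameters. The sketch below reproduces them with the RegNet width rule as it is commonly described: a linear width ramp, quantization to powers of wm, rounding to a multiple of a divisor, then adjustment to the group width. The function name and the divisor of 8 are assumptions; this is not the library's own code.

```python
import math


def regnet_stage_widths(w0, wa, wm, depth, group_w, bot_mul=1.0, divisor=8):
    """Derive per-stage widths and depths from RegNet parameters (sketch)."""
    widths = []
    for j in range(depth):
        u = w0 + wa * j                                  # linear ramp
        k = round(math.log(u / w0) / math.log(wm))       # quantization step
        w = w0 * wm ** k
        widths.append(int(round(w / divisor) * divisor))
    stage_widths = sorted(set(widths))
    stage_depths = [widths.count(w) for w in stage_widths]
    adjusted = []
    for w in stage_widths:
        w_bot = int(w * bot_mul)
        g = min(group_w, w_bot)
        w_bot = int(round(w_bot / g) * g)  # make width divisible by group
        adjusted.append(int(w_bot / bot_mul))
    return adjusted, stage_depths


widths, depths = regnet_stage_widths(
    w0=88, wa=26.31, wm=2.25, depth=25, group_w=48, bot_mul=1.0)
```

With these inputs the 25 blocks split into four stages, one per distinct width.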
- class mmdet3d.models.backbones.PointNet2SAMSG(in_channels, num_points=(2048, 1024, 512, 256), radii=((0.2, 0.4, 0.8), (0.4, 0.8, 1.6), (1.6, 3.2, 4.8)), num_samples=((32, 32, 64), (32, 32, 64), (32, 32, 32)), sa_channels=(((16, 16, 32), (16, 16, 32), (32, 32, 64)), ((64, 64, 128), (64, 64, 128), (64, 96, 128)), ((128, 128, 256), (128, 192, 256), (128, 256, 256))), aggregation_channels=(64, 128, 256), fps_mods=('D-FPS', 'FS', ('F-FPS', 'D-FPS')), fps_sample_range_lists=(- 1, - 1, (512, - 1)), dilated_group=(True, True, True), out_indices=(2), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': False, 'pool_mod': 'max', 'type': 'PointSAModuleMSG', 'use_xyz': True}, init_cfg=None)[source]¶
PointNet2 with Multi-scale grouping.
- Parameters
in_channels (int) – Input channels of point cloud.
num_points (tuple[int]) – The number of points which each SA module samples.
radii (tuple[float]) – Sampling radii of each SA module.
num_samples (tuple[int]) – The number of samples for ball query in each SA module.
sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.
aggregation_channels (tuple[int]) – Out channels of aggregation multi-scale grouping features.
fps_mods (tuple[int]) – Mod of FPS for each SA module.
fps_sample_range_lists (tuple[tuple[int]]) – The number of sampling points which each SA module samples.
dilated_group (tuple[bool]) – Whether to use dilated ball query for each SA module.
out_indices (Sequence[int]) – Output from which stages.
norm_cfg (dict) – Config of normalization layer.
sa_cfg (dict) –
Config of set abstraction module, which may contain the following keys and values:
pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.
use_xyz (bool): Whether to use xyz as a part of features.
normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.
- forward(points)[source]¶
Forward pass.
- Parameters
points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).
- Returns
Outputs of the last SA module.
- sa_xyz (torch.Tensor): The coordinates of sa features.
- sa_features (torch.Tensor): The features from the last set abstraction layer.
- sa_indices (torch.Tensor): Indices of the input points.
- Return type
dict[str, torch.Tensor]
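Each SA module first samples num_points centres, and fps_mods chooses the sampling variant. A minimal sketch of plain distance-based farthest point sampling follows, assuming that is what 'D-FPS' denotes. The pure-Python helper is illustrative only and unrelated to the CUDA op the library uses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sample(points, num_samples, start=0):
    """Greedily pick points that are far from everything picked so far."""
    if num_samples > len(points):
        raise ValueError('num_samples exceeds the number of points')
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(selected) < num_samples:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], sq_dist(p, points[nxt]))
    return selected


pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0)]
picked = farthest_point_sample(pts, 3)
```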
- class mmdet3d.models.backbones.PointNet2SASSG(in_channels, num_points=(2048, 1024, 512, 256), radius=(0.2, 0.4, 0.8, 1.2), num_samples=(64, 32, 16, 16), sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), (128, 128, 256)), fp_channels=((256, 256), (256, 256)), norm_cfg={'type': 'BN2d'}, sa_cfg={'normalize_xyz': True, 'pool_mod': 'max', 'type': 'PointSAModule', 'use_xyz': True}, init_cfg=None)[source]¶
PointNet2 with Single-scale grouping.
- Parameters
in_channels (int) – Input channels of point cloud.
num_points (tuple[int]) – The number of points which each SA module samples.
radius (tuple[float]) – Sampling radii of each SA module.
num_samples (tuple[int]) – The number of samples for ball query in each SA module.
sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.
fp_channels (tuple[tuple[int]]) – Out channels of each mlp in FP module.
norm_cfg (dict) – Config of normalization layer.
sa_cfg (dict) –
Config of set abstraction module, which may contain the following keys and values:
pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.
use_xyz (bool): Whether to use xyz as a part of features.
normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.
- forward(points)[source]¶
Forward pass.
- Parameters
points (torch.Tensor) – point coordinates with features, with shape (B, N, 3 + input_feature_dim).
- Returns
Outputs after SA and FP modules.
- fp_xyz (list[torch.Tensor]): The coordinates of each FP feature.
- fp_features (list[torch.Tensor]): The features from each feature propagation layer.
- fp_indices (list[torch.Tensor]): Indices of the input points.
- Return type
dict[str, list[torch.Tensor]]
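The radius and num_samples parameters describe a ball query: for each sampled centre, gather up to num_samples points within radius. The sketch below shows the idea. Padding short groups by repeating the first hit is an assumption borrowed from common PointNet++ implementations, and the helper is not the library op.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ball_query(points, centers, radius, num_samples):
    """Return, per centre, indices of up to num_samples points in range."""
    groups = []
    for c in centers:
        idx = [i for i, p in enumerate(points)
               if sq_dist(p, c) < radius ** 2][:num_samples]
        if idx:
            # Assumed padding rule: repeat the first neighbour found.
            idx += [idx[0]] * (num_samples - len(idx))
        groups.append(idx)
    return groups


pts = [(0, 0, 0), (0.1, 0, 0), (0.3, 0, 0), (2, 0, 0)]
groups = ball_query(pts, centers=[(0, 0, 0), (2, 0, 0)],
                    radius=0.2, num_samples=3)
```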
- class mmdet3d.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
ResNeXt backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
- class mmdet3d.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.
base_channels (int) – Number of base channels of res layer. Default: 64.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
Example
>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- make_stage_plugins(plugins, stage_idx)[source]¶
Make plugins for the stage_idx-th stage of ResNet.
Currently we support inserting context_block, empirical_attention_block, and nonlocal_block into backbones such as ResNet/ResNeXt. They can be inserted after conv1/conv2/conv3 of Bottleneck.
An example of the plugins format:
Examples
>>> plugins = [
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3
Suppose stage_idx=0, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2
Suppose stage_idx=1, the structure of blocks in the stage would be:
conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2
If stages is missing, the plugin would be applied to all stages.
- Parameters
plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of stage to build
- Returns
Plugins for current stage
- Return type
list[dict]
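The selection rule described above (keep a plugin when stages is missing or when its flag for the stage is set) can be sketched as a standalone function. The function name is hypothetical and the code is a reading of the documented behaviour, not a copy of the method.

```python
def select_stage_plugins(plugins, stage_idx, num_stages=4):
    """Return the plugin configs that apply to stage `stage_idx`."""
    stage_plugins = []
    for plugin in plugins:
        plugin = dict(plugin)               # leave the caller's dict intact
        stages = plugin.pop('stages', None)
        if stages is not None and len(stages) != num_stages:
            raise ValueError('stages must have one flag per stage')
        if stages is None or stages[stage_idx]:
            stage_plugins.append(plugin)
    return stage_plugins


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'),
         stages=(False, True, True, True), position='after_conv2'),
    dict(cfg=dict(type='yyy'),
         stages=(True, True, True, True), position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='1'),
         stages=(True, True, True, True), position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='2'),
         stages=(True, True, True, True), position='after_conv3'),
]
stage0 = select_stage_plugins(plugins, 0)
stage1 = select_stage_plugins(plugins, 1)
```

Stage 0 skips xxx because its first flag is False, which matches the block structures listed above.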
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- class mmdet3d.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant described in Bag of Tricks.
Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
- class mmdet3d.models.backbones.SECOND(in_channels=128, out_channels=[128, 128, 256], layer_nums=[3, 5, 5], layer_strides=[2, 2, 2], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, init_cfg=None, pretrained=None)[source]¶
Backbone network for SECOND/PointPillars/PartA2/MVXNet.
- Parameters
in_channels (int) – Input channels.
out_channels (list[int]) – Output channels for multi-scale feature maps.
layer_nums (list[int]) – Number of layers in each stage.
layer_strides (list[int]) – Strides of each stage.
norm_cfg (dict) – Config dict of normalization layers.
conv_cfg (dict) – Config dict of convolutional layers.
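To see how layer_strides and out_channels shape the multi-scale outputs, the sketch below estimates the output size of each stage. It assumes every stage starts with a 3x3 convolution with padding 1 and the given stride, so each spatial dimension becomes ceil(size / stride). That assumption and the helper name are illustrative and not taken from this reference.

```python
import math


def second_output_shapes(in_hw, out_channels, layer_strides):
    """Estimate (C, H, W) of each stage output under the stated assumption."""
    h, w = in_hw
    shapes = []
    for channels, stride in zip(out_channels, layer_strides):
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        shapes.append((channels, h, w))
    return shapes


# Defaults from the signature above, with a hypothetical 496 x 432 input.
shapes = second_output_shapes((496, 432), [128, 128, 256], [2, 2, 2])
```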
- class mmdet3d.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]¶
VGG Backbone network for single-shot-detection.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_last_pool (bool) – Whether to add a pooling layer at the last of the model
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
out_indices (Sequence[int]) – Output from which stages.
out_feature_indices (Sequence[int]) – Output from which feature map.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.
l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.
Example
>>> from mmdet.models import SSDVGG
>>> import torch
>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
necks¶
- class mmdet3d.models.necks.DLANeck(in_channels=[16, 32, 64, 128, 256, 512], start_level=2, end_level=5, norm_cfg=None, use_dcn=True, init_cfg=None)[source]¶
DLA Neck.
- Parameters
in_channels (list[int], optional) – List of input channels of multi-scale feature map.
start_level (int, optional) – The scale level where upsampling starts. Default: 2.
end_level (int, optional) – The scale level where upsampling ends. Default: 5.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.
use_dcn (bool, optional) – Whether to use dcn in IDAup module. Default: True.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmdet3d.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=- 1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'}, init_cfg={'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]¶
Feature Pyramid Network.
This is an implementation of paper Feature Pyramid Networks for Object Detection.
- Parameters
in_channels (list[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
num_outs (int) – Number of output scales.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.
add_extra_convs (bool | str) –
If bool, it decides whether to add conv layers on top of the original feature maps. Default to False. If True, it is equivalent to add_extra_convs=’on_input’. If str, it specifies the source feature map of the extra convs. Only the following options are allowed
’on_input’: Last feat map of neck inputs (i.e. backbone feature).
’on_lateral’: Last feature map after lateral convs.
’on_output’: The last output feature map after fpn convs.
relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.
no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’).
init_cfg (dict or list[dict], optional) – Initialization config dict.
Example
>>> import torch
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
- class mmdet3d.models.necks.IndoorImVoxelNeck(in_channels, out_channels, n_blocks)[source]¶
Neck for ImVoxelNet indoor scenario.
- Parameters
in_channels (int) – Number of channels in an input tensor.
out_channels (int) – Number of channels in all output tensors.
n_blocks (list[int]) – Number of blocks for each feature level.
- class mmdet3d.models.necks.OutdoorImVoxelNeck(in_channels, out_channels)[source]¶
Neck for ImVoxelNet outdoor scenario.
- Parameters
in_channels (int) – Number of channels in an input tensor.
out_channels (int) – Number of channels in all output tensors.
- class mmdet3d.models.necks.PointNetFPNeck(fp_channels, init_cfg=None)[source]¶
PointNet FP Module used in PointRCNN.
Refer to the official code.
    sa_n ----------------------------------------
                                                 |
    ... ---------------------------------        |
                                         |       |
    sa_1 -------------                   |       |
                        |                |       |
    sa_0 -> fp_0 -> fp_module ->fp_1 -> ... -> fp_module -> fp_n

sa_n including sa_xyz (torch.Tensor) and sa_features (torch.Tensor)
fp_n including fp_xyz (torch.Tensor) and fp_features (torch.Tensor)
- Parameters
fp_channels (tuple[tuple[int]]) – Tuple of mlp channels in FP modules.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(feat_dict)[source]¶
Forward pass.
- Parameters
feat_dict (dict) – Feature dict from backbone.
- Returns
Outputs of the Neck.
fp_xyz (torch.Tensor): The coordinates of fp features.
fp_features (torch.Tensor): The features from the last feature propagation layers.
- Return type
dict[str, torch.Tensor]
- class mmdet3d.models.necks.SECONDFPN(in_channels=[128, 128, 256], out_channels=[256, 256, 256], upsample_strides=[1, 2, 4], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, upsample_cfg={'bias': False, 'type': 'deconv'}, conv_cfg={'bias': False, 'type': 'Conv2d'}, use_conv_for_no_stride=False, init_cfg=None)[source]¶
FPN used in SECOND/PointPillars/PartA2/MVXNet.
- Parameters
in_channels (list[int]) – Input channels of multi-scale feature maps.
out_channels (list[int]) – Output channels of feature maps.
upsample_strides (list[int]) – Strides used to upsample the feature maps.
norm_cfg (dict) – Config dict of normalization layers.
upsample_cfg (dict) – Config dict of upsample layers.
conv_cfg (dict) – Config dict of conv layers.
use_conv_for_no_stride (bool) – Whether to use conv when stride is 1.
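As a rough illustration of how the three list parameters fit together, the sketch below computes the expected shape of the fused feature map. It assumes that each level is upsampled by its own stride and that the results are concatenated along the channel axis. The helper name `secondfpn_output_shape` and the example input sizes are hypothetical and not part of mmdet3d.

```python
def secondfpn_output_shape(in_shapes, out_channels, upsample_strides):
    """Return (C, H, W) of the fused map for per-level (C, H, W) inputs.

    Assumption: level i is upsampled by upsample_strides[i] and all levels
    are then concatenated along the channel axis.
    """
    if not (len(in_shapes) == len(out_channels) == len(upsample_strides)):
        raise ValueError("all three lists must have one entry per level")
    sizes = set()
    for (_, h, w), stride in zip(in_shapes, upsample_strides):
        sizes.add((h * stride, w * stride))
    if len(sizes) != 1:
        raise ValueError(f"levels disagree on spatial size: {sorted(sizes)}")
    h, w = next(iter(sizes))
    return sum(out_channels), h, w


# Hypothetical multi-scale inputs matching the default in_channels.
shape = secondfpn_output_shape(
    in_shapes=[(128, 200, 176), (128, 100, 88), (256, 50, 44)],
    out_channels=[256, 256, 256],
    upsample_strides=[1, 2, 4],
)
print(shape)
```

Under these assumptions the strides must bring every level to the same spatial size, otherwise the concatenation is not possible.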
dense_heads¶
- class mmdet3d.models.dense_heads.Anchor3DHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, - 39.68, - 1.78, 69.12, 39.68, - 1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[3.9, 1.6, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=- 1.5707963267948966, dir_limit_offset=0, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'}, init_cfg=None)[source]¶
Anchor head for SECOND/PointPillars/MVXNet/PartA2.
- Parameters
num_classes (int) – Number of classes.
in_channels (int) – Number of channels in the input feature map.
train_cfg (dict) – Train configs.
test_cfg (dict) – Test configs.
feat_channels (int) – Number of channels of the feature map.
use_direction_classifier (bool) – Whether to add a direction classifier.
anchor_generator (dict) – Config dict of anchor generator.
assigner_per_size (bool) – Whether to do assignment for each separate anchor size.
assign_per_class (bool) – Whether to do assignment for each class.
diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.
dir_offset (float | int) – The offset of BEV rotation angles. (TODO: may be moved into box coder)
dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)
bbox_coder (dict) – Config dict of box coders.
loss_cls (dict) – Config of classification loss.
loss_bbox (dict) – Config of localization loss.
loss_dir (dict) – Config of direction classifier loss.
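To make the roles of dir_offset and dir_limit_offset more concrete, here is a minimal sketch of how a yaw angle could be turned into a direction class. It assumes the angle is shifted by dir_offset, wrapped into one period of 2π, and bucketed into two bins. The helper names and the clamping step are illustrative assumptions, not a statement of the library's exact implementation.

```python
import math


def limit_period(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def direction_bin(rot, dir_offset=-math.pi / 2, dir_limit_offset=0.0,
                  num_bins=2):
    """Map a yaw angle in radians to a direction class index (sketch)."""
    shifted = limit_period(rot - dir_offset, dir_limit_offset, 2 * math.pi)
    idx = math.floor(shifted / (2 * math.pi / num_bins))
    # Clamp to guard against floating-point edge cases.
    return min(max(idx, 0), num_bins - 1)


for yaw in (0.3, 2.0, -1.0, -2.0):
    print(yaw, direction_bin(yaw))
```

With the default dir_offset shown in the class signature, yaw angles in roughly [-π/2, π/2) fall into bin 0 and the remaining angles into bin 1 in this sketch.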
- static add_sin_difference(boxes1, boxes2)[source]¶
Convert the rotation difference to difference in sine function.
- Parameters
boxes1 (torch.Tensor) – Original Boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.
boxes2 (torch.Tensor) – Target boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.
- Returns
boxes1 and boxes2 whose 7th dimensions are changed.
- Return type
tuple[torch.Tensor]
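The idea behind the sine encoding can be sketched with plain Python lists. The version below assumes the rotation entries are replaced by sin(a)·cos(b) and cos(a)·sin(b), so that their difference equals sin(a − b). It works on single boxes rather than tensors and is an illustration only, not the library code.

```python
import math


def add_sin_difference(box1, box2, rot_index=6):
    """Return copies of both boxes with the rotation entry re-encoded.

    After encoding, enc1[rot] - enc2[rot] == sin(rot1 - rot2).
    """
    a, b = box1[rot_index], box2[rot_index]
    enc1, enc2 = list(box1), list(box2)
    enc1[rot_index] = math.sin(a) * math.cos(b)
    enc2[rot_index] = math.cos(a) * math.sin(b)
    return enc1, enc2


pred = [0.0, 0.0, 0.0, 3.9, 1.6, 1.56, 0.3]
target = [0.0, 0.0, 0.0, 3.9, 1.6, 1.56, 0.3 + math.pi]  # flipped heading
enc_pred, enc_target = add_sin_difference(pred, target)
diff = enc_pred[6] - enc_target[6]
print(diff)
```

Because sin(a − b) is close to zero when the two headings differ by π, a regression loss on this encoding cannot tell a box from its flipped counterpart, which is the case a separate direction classifier is meant to resolve.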
- forward(feats)[source]¶
Forward pass.
- Parameters
feats (list[torch.Tensor]) – Multi-level features, e.g., features produced by FPN.
- Returns
Multi-level class score, bbox and direction predictions.
- Return type
tuple[list[torch.Tensor]]
- forward_single(x)[source]¶
Forward function on a single-scale feature map.
- Parameters
x (torch.Tensor) – Input features.
- Returns
Scores of each class, bbox regression and direction classification predictions.
- Return type
tuple[torch.Tensor]
- get_anchors(featmap_sizes, input_metas, device='cuda')[source]¶
Get anchors according to feature map sizes.
- Parameters
featmap_sizes (list[tuple]) – Multi-level feature map sizes.
input_metas (list[dict]) – Contain pcd and img’s meta info.
device (str) – Device of current module.
- Returns
Anchors of each image, valid flags of each image.
- Return type
list[list[torch.Tensor]]
- get_bboxes(cls_scores, bbox_preds, dir_cls_preds, input_metas, cfg=None, rescale=False)[source]¶
Get bboxes of anchor head.
- Parameters
cls_scores (list[torch.Tensor]) – Multi-level class scores.
bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.
input_metas (list[dict]) – Contain pcd and img’s meta info.
cfg (ConfigDict) – Training or testing config.
rescale (bool) – Whether to rescale bbox.
- Returns
Prediction results of batches.
- Return type
list[tuple]
- get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg=None, rescale=False)[source]¶
Get bboxes of single branch.
- Parameters
cls_scores (torch.Tensor) – Class score in single batch.
bbox_preds (torch.Tensor) – Bbox prediction in single batch.
dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.
mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.
input_meta (list[dict]) – Contain pcd and img’s meta info.
cfg (ConfigDict) – Training or testing config.
rescale (bool) – Whether to rescale bbox.
- Returns
Contain predictions of single batch.
bboxes (BaseInstance3DBoxes): Predicted 3d bboxes.
scores (torch.Tensor): Class score of each bbox.
labels (torch.Tensor): Label of each bbox.
- Return type
tuple
- loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]¶
Calculate losses.
- Parameters
cls_scores (list[torch.Tensor]) – Multi-level class scores.
bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.
gt_bboxes (list[BaseInstance3DBoxes]) – Gt bboxes of each sample.
gt_labels (list[torch.Tensor]) – Gt labels of each sample.
input_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
- Returns
Classification, bbox, and direction losses of each level.
loss_cls (list[torch.Tensor]): Classification losses.
loss_bbox (list[torch.Tensor]): Box regression losses.
loss_dir (list[torch.Tensor]): Direction classification losses.
- Return type
dict[str, list[torch.Tensor]]
- loss_single(cls_score, bbox_pred, dir_cls_preds, labels, label_weights, bbox_targets, bbox_weights, dir_targets, dir_weights, num_total_samples)[source]¶
Calculate loss of Single-level results.
- Parameters
cls_score (torch.Tensor) – Class score in single-level.
bbox_pred (torch.Tensor) – Bbox prediction in single-level.
dir_cls_preds (torch.Tensor) – Predictions of direction class in single-level.
labels (torch.Tensor) – Labels of class.
label_weights (torch.Tensor) – Weights of class loss.
bbox_targets (torch.Tensor) – Targets of bbox predictions.
bbox_weights (torch.Tensor) – Weights of bbox loss.
dir_targets (torch.Tensor) – Targets of direction predictions.
dir_weights (torch.Tensor) – Weights of direction loss.
num_total_samples (int) – The number of valid samples.
- Returns
Losses of class, bbox and direction, respectively.
- Return type
tuple[torch.Tensor]
- class mmdet3d.models.dense_heads.AnchorFreeMono3DHead(num_classes, in_channels, feat_channels=256, stacked_convs=4, strides=(4, 8, 16, 32, 64), dcn_on_last_conv=False, conv_bias='auto', background_label=None, use_direction_classifier=True, diff_rad_by_sin=True, dir_offset=0, dir_limit_offset=0, loss_cls={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_attr={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, bbox_code_size=9, pred_attrs=False, num_attrs=9, pred_velo=False, pred_bbox2d=False, group_reg_dims=(2, 1, 3, 1, 2), cls_branch=(128, 64), reg_branch=((128, 64), (128, 64), (64), (64), ()), dir_branch=(64), attr_branch=(64), conv_cfg=None, norm_cfg=None, train_cfg=None, test_cfg=None, init_cfg=None)[source]¶
Anchor-free head for monocular 3D object detection.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
feat_channels (int, optional) – Number of hidden channels. Used in child classes. Defaults to 256.
stacked_convs (int, optional) – Number of stacking convs of the head.
strides (tuple, optional) – Downsample factor of each feature map.
dcn_on_last_conv (bool, optional) – If true, use dcn in the last layer of towers. Default: False.
conv_bias (bool | str, optional) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Default: ‘auto’.
background_label (int, optional) – Label ID of background, set as 0 for RPN and num_classes for other heads. It will automatically set as num_classes if None is given.
use_direction_classifier (bool, optional) – Whether to add a direction classifier.
diff_rad_by_sin (bool, optional) – Whether to change the difference into sin difference for box regression loss. Defaults to True.
dir_offset (float, optional) – Parameter used in direction classification. Defaults to 0.
dir_limit_offset (float, optional) – Parameter used in direction classification. Defaults to 0.
loss_cls (dict, optional) – Config of classification loss.
loss_bbox (dict, optional) – Config of localization loss.
loss_dir (dict, optional) – Config of direction classifier loss.
loss_attr (dict, optional) – Config of attribute classifier loss, which is only active when pred_attrs=True.
bbox_code_size (int, optional) – Dimensions of predicted bounding boxes.
pred_attrs (bool, optional) – Whether to predict attributes. Defaults to False.
num_attrs (int, optional) – The number of attributes to be predicted. Default: 9.
pred_velo (bool, optional) – Whether to predict velocity. Defaults to False.
pred_bbox2d (bool, optional) – Whether to predict 2D boxes. Defaults to False.
group_reg_dims (tuple[int], optional) – The dimension of each regression target group. Default: (2, 1, 3, 1, 2).
cls_branch (tuple[int], optional) – Channels for classification branch. Default: (128, 64).
reg_branch (tuple[tuple], optional) – Channels for regression branch. Default: ((128, 64), (128, 64), (64, ), (64, ), ()), for the offset, depth, size, rot and velo groups respectively.
dir_branch (tuple[int], optional) – Channels for direction classification branch. Default: (64, ).
attr_branch (tuple[int], optional) – Channels for attribute classification branch. Default: (64, ).
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.
train_cfg (dict, optional) – Training config of anchor head.
test_cfg (dict, optional) – Testing config of anchor head.
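The relationship between bbox_code_size, group_reg_dims and the per-group reg_branch entries can be sketched as follows. The group names are taken from the comments in the reg_branch default (offset, depth, size, rot, velo). The helper `split_bbox_code` is hypothetical and only shows how a 9-dimensional box code divides into the five groups.

```python
def split_bbox_code(code, group_reg_dims=(2, 1, 3, 1, 2),
                    names=("offset", "depth", "size", "rot", "velo")):
    """Split one flat box code into named regression groups (sketch)."""
    if sum(group_reg_dims) != len(code):
        raise ValueError("group_reg_dims must sum to the box code size")
    groups, start = {}, 0
    for name, dim in zip(names, group_reg_dims):
        groups[name] = code[start:start + dim]
        start += dim
    return groups


groups = split_bbox_code(list(range(9)))
print(groups)
```

The default group dimensions sum to 9, which matches the default bbox_code_size.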
- forward(feats)[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
Usually contain classification scores, bbox predictions, and direction class predictions.
cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
dir_cls_preds (list[Tensor]): Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
attr_preds (list[Tensor]): Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
- Return type
tuple
- forward_single(x)[source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – FPN feature maps of the specified stride.
- Returns
Scores for each class, bbox predictions, direction class, and attributes, plus the features after the classification and regression conv layers. Some models, such as FCOS, need these features.
- Return type
tuple
- abstract get_bboxes(cls_scores, bbox_preds, dir_cls_preds, attr_preds, img_metas, cfg=None, rescale=None)[source]¶
Transform network output for a batch into bbox predictions.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_points * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_points * bbox_code_size, H, W)
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
attr_preds (list[Tensor]) – Attribute scores for each scale level, with shape (N, num_points * num_attrs, H, W).
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used
rescale (bool) – If True, return boxes in original image space
- get_points(featmap_sizes, dtype, device, flatten=False)[source]¶
Get points according to feature map sizes.
- Parameters
featmap_sizes (list[tuple]) – Multi-level feature map sizes.
dtype (torch.dtype) – Type of points.
device (torch.device) – Device of points.
- Returns
points of each image.
- Return type
tuple
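For intuition, the sketch below generates the point locations of one feature level, assuming an FCOS-style convention in which every feature-map cell is mapped to the center of its stride-sized patch in image coordinates. The function name `level_points` and the half-stride shift are assumptions made for illustration. The base class may return a different layout.

```python
def level_points(featmap_size, stride):
    """Return (x, y) image coordinates for each cell of one level (sketch).

    Points are listed row by row, i.e. x varies fastest.
    """
    h, w = featmap_size
    half = stride // 2
    return [(x * stride + half, y * stride + half)
            for y in range(h) for x in range(w)]


points = level_points((2, 3), stride=8)
print(points)
```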
- abstract get_targets(points, gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, attr_labels_list)[source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).
gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).
gt_bboxes_3d_list (list[Tensor]) – 3D Ground truth bboxes of each image, each has shape (num_gt, bbox_code_size).
gt_labels_3d_list (list[Tensor]) – 3D Ground truth labels of each box, each has shape (num_gt,).
centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, each has shape (num_gt, 2).
depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).
attr_labels_list (list[Tensor]) – Attribute labels of each box, each has shape (num_gt,).
- init_weights()[source]¶
Initialize weights of the head.
We currently still use a custom init_weights because the default init of DCN triggered by the init_cfg will init conv_offset.weight, which mistakenly affects the training stability.
- abstract loss(cls_scores, bbox_preds, dir_cls_preds, attr_preds, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]¶
Compute loss of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
attr_preds (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box
gt_bboxes_3d (list[Tensor]) – 3D Ground truth bboxes for each image with shape (num_gts, bbox_code_size).
gt_labels_3d (list[Tensor]) – 3D class indices of each box.
centers2d (list[Tensor]) – Projected 3D centers onto 2D images.
depths (list[Tensor]) – Depth of projected centers on 2D images.
attr_labels (list[Tensor], optional) – Attribute indices corresponding to each box
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.
- class mmdet3d.models.dense_heads.BaseConvBboxHead(in_channels=0, shared_conv_channels=(), cls_conv_channels=(), num_cls_out_channels=0, reg_conv_channels=(), num_reg_out_channels=0, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, bias='auto', init_cfg=None, *args, **kwargs)[source]¶
More general bbox head, with shared conv layers and two optional separated branches.
              /-> cls convs -> cls_score
shared convs
              \-> reg convs -> bbox_pred
- class mmdet3d.models.dense_heads.BaseMono3DDenseHead(init_cfg=None)[source]¶
Base class for Monocular 3D DenseHeads.
- forward_train(x, img_metas, gt_bboxes, gt_labels=None, gt_bboxes_3d=None, gt_labels_3d=None, centers2d=None, depths=None, attr_labels=None, gt_bboxes_ignore=None, proposal_cfg=None, **kwargs)[source]¶
- Parameters
x (list[Tensor]) – Features from FPN.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes (list[Tensor]) – Ground truth bboxes of the image, shape (num_gts, 4).
gt_labels (list[Tensor]) – Ground truth labels of each box, shape (num_gts,).
gt_bboxes_3d (list[Tensor]) – 3D ground truth bboxes of the image, shape (num_gts, self.bbox_code_size).
gt_labels_3d (list[Tensor]) – 3D ground truth labels of each box, shape (num_gts,).
centers2d (list[Tensor]) – Projected 3D center of each box, shape (num_gts, 2).
depths (list[Tensor]) – Depth of projected 3D center of each box, shape (num_gts,).
attr_labels (list[Tensor]) – Attribute labels of each box, shape (num_gts,).
gt_bboxes_ignore (list[Tensor]) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).
proposal_cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used
- Returns
losses (dict[str, Tensor]): A dictionary of loss components.
proposal_list (list[Tensor]): Proposals of each image.
- Return type
tuple
- class mmdet3d.models.dense_heads.CenterHead(in_channels=[128], tasks=None, train_cfg=None, test_cfg=None, bbox_coder=None, common_heads={}, loss_cls={'reduction': 'mean', 'type': 'GaussianFocalLoss'}, loss_bbox={'loss_weight': 0.25, 'reduction': 'none', 'type': 'L1Loss'}, separate_head={'final_kernel': 3, 'init_bias': - 2.19, 'type': 'SeparateHead'}, share_conv_channel=64, num_heatmap_convs=2, conv_cfg={'type': 'Conv2d'}, norm_cfg={'type': 'BN2d'}, bias='auto', norm_bbox=True, init_cfg=None)[source]¶
CenterHead for CenterPoint.
- Parameters
in_channels (list[int] | int, optional) – Channels of the input feature map. Default: [128].
tasks (list[dict], optional) – Task information including class number and class names. Default: None.
train_cfg (dict, optional) – Train-time configs. Default: None.
test_cfg (dict, optional) – Test-time configs. Default: None.
bbox_coder (dict, optional) – Bbox coder configs. Default: None.
common_heads (dict, optional) – Conv information for common heads. Default: dict().
loss_cls (dict, optional) – Config of classification loss function. Default: dict(type=’GaussianFocalLoss’, reduction=’mean’).
loss_bbox (dict, optional) – Config of regression loss function. Default: dict(type=’L1Loss’, reduction=’none’).
separate_head (dict, optional) – Config of separate head. Default: dict( type=’SeparateHead’, init_bias=-2.19, final_kernel=3)
share_conv_channel (int, optional) – Output channels for share_conv layer. Default: 64.
num_heatmap_convs (int, optional) – Number of conv layers for heatmap conv layer. Default: 2.
conv_cfg (dict, optional) – Config of conv layer. Default: dict(type=’Conv2d’)
norm_cfg (dict, optional) – Config of norm layer. Default: dict(type=’BN2d’).
bias (str, optional) – Type of bias. Default: ‘auto’.
- forward(feats)[source]¶
Forward pass.
- Parameters
feats (list[torch.Tensor]) – Multi-level features, e.g., features produced by FPN.
- Returns
Output results for tasks.
- Return type
tuple(list[dict])
- forward_single(x)[source]¶
Forward function for CenterPoint.
- Parameters
x (torch.Tensor) – Input feature map with the shape of [B, 512, 128, 128].
- Returns
Output results for tasks.
- Return type
list[dict]
- get_bboxes(preds_dicts, img_metas, img=None, rescale=False)[source]¶
Generate bboxes from bbox head predictions.
- Parameters
preds_dicts (tuple[list[dict]]) – Prediction results.
img_metas (list[dict]) – Point cloud and image’s meta info.
- Returns
Decoded bbox, scores and labels after nms.
- Return type
list[dict]
- get_targets(gt_bboxes_3d, gt_labels_3d)[source]¶
Generate targets.
How each output is transformed:
Each nested list is transposed so that all same-index elements in each sub-list (1, …, N) become the new sub-lists.
[ [a0, a1, a2, … ], [b0, b1, b2, … ], … ] ==> [ [a0, b0, … ], [a1, b1, … ], [a2, b2, … ] ]
The new transposed nested list is converted into a list of N tensors generated by concatenating tensors in the new sub-lists.
[ tensor0, tensor1, tensor2, … ]
- Parameters
gt_bboxes_3d (list[LiDARInstance3DBoxes]) – Ground truth gt boxes.
gt_labels_3d (list[torch.Tensor]) – Labels of boxes.
- Returns
Tuple of targets including the following results in order.
list[torch.Tensor]: Heatmap scores.
list[torch.Tensor]: Ground truth boxes.
list[torch.Tensor]: Indexes indicating the position of the valid boxes.
list[torch.Tensor]: Masks indicating which boxes are valid.
- Return type
tuple[list[torch.Tensor]]
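The transposition step described for get_targets can be sketched with the built-in zip. The example uses strings in place of tensors and leaves out the later step that merges each new sub-list into one tensor. The helper name `transpose_nested` is hypothetical.

```python
def transpose_nested(per_sample):
    """Turn [[a0, a1, ...], [b0, b1, ...]] into [[a0, b0], [a1, b1], ...]."""
    return [list(group) for group in zip(*per_sample)]


# One sub-list per sample, one entry per task.
per_sample = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
per_task = transpose_nested(per_sample)
print(per_task)
```

After the transposition each sub-list holds the entries of one task across all samples, which is the grouping needed before they are combined into a single tensor per task.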
- get_targets_single(gt_bboxes_3d, gt_labels_3d)[source]¶
Generate training targets for a single sample.
- Parameters
gt_bboxes_3d (LiDARInstance3DBoxes) – Ground truth gt boxes.
gt_labels_3d (torch.Tensor) – Labels of boxes.
- Returns
Tuple of targets including the following results in order.
list[torch.Tensor]: Heatmap scores.
list[torch.Tensor]: Ground truth boxes.
list[torch.Tensor]: Indexes indicating the position of the valid boxes.
list[torch.Tensor]: Masks indicating which boxes are valid.
- Return type
tuple[list[torch.Tensor]]
- get_task_detections(num_class_with_bg, batch_cls_preds, batch_reg_preds, batch_cls_labels, img_metas)[source]¶
Rotate nms for each task.
- Parameters
num_class_with_bg (int) – Number of classes for the current task.
batch_cls_preds (list[torch.Tensor]) – Prediction score with the shape of [N].
batch_reg_preds (list[torch.Tensor]) – Prediction bbox with the shape of [N, 9].
batch_cls_labels (list[torch.Tensor]) – Prediction label with the shape of [N].
img_metas (list[dict]) – Meta information of each sample.
- Returns
Contains the following keys:
bboxes (torch.Tensor): Prediction bboxes after nms with the shape of [N, 9].
scores (torch.Tensor): Prediction scores after nms with the shape of [N].
labels (torch.Tensor): Prediction labels after nms with the shape of [N].
- Return type
list[dict[str, torch.Tensor]]
- loss(gt_bboxes_3d, gt_labels_3d, preds_dicts, **kwargs)[source]¶
Loss function for CenterHead.
- Parameters
gt_bboxes_3d (list[LiDARInstance3DBoxes]) – Ground truth gt boxes.
gt_labels_3d (list[torch.Tensor]) – Labels of boxes.
preds_dicts (dict) – Output of forward function.
- Returns
Loss of heatmap and bbox of each task.
- Return type
dict[str, torch.Tensor]
- class mmdet3d.models.dense_heads.FCAF3DHead(n_classes, in_channels, out_channels, n_reg_outs, voxel_size, pts_prune_threshold, pts_assign_threshold, pts_center_threshold, center_loss={'type': 'CrossEntropyLoss', 'use_sigmoid': True}, bbox_loss={'type': 'AxisAlignedIoULoss'}, cls_loss={'type': 'FocalLoss'}, train_cfg=None, test_cfg=None, init_cfg=None)[source]¶
Bbox head of FCAF3D. Here we store both the sparse 3D FPN and a head. The neck and the head cannot be simply separated, because the pruning score on the i-th level of the FPN requires classification scores from the (i+1)-th level of the head.
- Parameters
n_classes (int) – Number of classes.
in_channels (tuple[int]) – Number of channels in input tensors.
out_channels (int) – Number of channels in the neck output tensors.
n_reg_outs (int) – Number of regression layer channels.
voxel_size (float) – Voxel size in meters.
pts_prune_threshold (int) – Pruning threshold on each feature level.
pts_assign_threshold (int) – Box to location assigner parameter. Assigner selects the maximum feature level with more locations inside the box than pts_assign_threshold.
pts_center_threshold (int) – Box to location assigner parameter. After feature level for the box is determined, assigner selects pts_center_threshold locations closest to the box center.
center_loss (dict, optional) – Config of centerness loss.
bbox_loss (dict, optional) – Config of bbox loss.
cls_loss (dict, optional) – Config of classification loss.
train_cfg (dict, optional) – Config for train stage. Defaults to None.
test_cfg (dict, optional) – Config for test stage. Defaults to None.
init_cfg (dict, optional) – Config for weight initialization. Defaults to None.
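The two assigner parameters can be read as a two-stage selection, sketched below in plain Python. The sketch follows the parameter descriptions above: first pick the highest feature level that still has more than pts_assign_threshold locations inside the box, then keep the pts_center_threshold locations closest to the box center. The function names, the fallback to level 0 when no level qualifies, and the example numbers are assumptions for illustration.

```python
import math


def choose_level(points_inside_per_level, pts_assign_threshold):
    """Pick the highest level with more locations than the threshold."""
    best = 0  # assumption: fall back to the first level if none qualifies
    for level, count in enumerate(points_inside_per_level):
        if count > pts_assign_threshold:
            best = level
    return best


def closest_locations(locations, center, pts_center_threshold):
    """Return indices of the locations nearest to the box center."""
    order = sorted(range(len(locations)),
                   key=lambda i: math.dist(locations[i], center))
    return order[:pts_center_threshold]


level = choose_level([120, 40, 9, 2], pts_assign_threshold=27)
kept = closest_locations(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
    center=(0.0, 0.0, 0.0),
    pts_center_threshold=2,
)
print(level, kept)
```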
- forward(x)[source]¶
Forward pass.
- Parameters
x (list[Tensor]) – Features from the backbone.
- Returns
Predictions of the head.
- Return type
list[list[Tensor]]
- forward_test(x, input_metas)[source]¶
Forward pass of the test stage.
- Parameters
x (list[SparseTensor]) – Features from the backbone.
input_metas (list[dict]) – Contains scene meta info for each sample.
- Returns
bboxes, scores and labels for each sample.
- Return type
list[list[Tensor]]
- forward_train(x, gt_bboxes, gt_labels, input_metas)[source]¶
Forward pass of the train stage.
- Parameters
x (list[SparseTensor]) – Features from the backbone.
gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels (list[torch.Tensor]) – Labels of each sample.
input_metas (list[dict]) – Contains scene meta info for each sample.
- Returns
Centerness, bbox and classification loss values.
- Return type
dict
- class mmdet3d.models.dense_heads.FCOSMono3DHead(regress_ranges=((- 1, 48), (48, 96), (96, 192), (192, 384), (384, 100000000.0)), center_sampling=True, center_sample_radius=1.5, norm_on_bbox=True, centerness_on_reg=True, centerness_alpha=2.5, loss_cls={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_attr={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': False}, loss_centerness={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, bbox_coder={'code_size': 9, 'type': 'FCOS3DBBoxCoder'}, norm_cfg={'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, centerness_branch=(64), init_cfg=None, **kwargs)[source]¶
Anchor-free head used in FCOS3D.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
regress_ranges (tuple[tuple[int, int]], optional) – Regress range of multiple level points.
center_sampling (bool, optional) – If true, use center sampling. Default: True.
center_sample_radius (float, optional) – Radius of center sampling. Default: 1.5.
norm_on_bbox (bool, optional) – If true, normalize the regression targets with FPN strides. Default: True.
centerness_on_reg (bool, optional) – If true, position centerness on the regress branch. Please refer to https://github.com/tianzhi0549/FCOS/issues/89#issuecomment-516877042. Default: True.
centerness_alpha (float, optional) – Parameter used to adjust the intensity attenuation from the center to the periphery. Default: 2.5.
loss_cls (dict, optional) – Config of classification loss.
loss_bbox (dict, optional) – Config of localization loss.
loss_dir (dict, optional) – Config of direction classification loss.
loss_attr (dict, optional) – Config of attribute classification loss.
loss_centerness (dict, optional) – Config of centerness loss.
norm_cfg (dict, optional) – dictionary to construct and config norm layer. Default: norm_cfg=dict(type=’GN’, num_groups=32, requires_grad=True).
centerness_branch (tuple[int], optional) – Channels for centerness branch. Default: (64, ).
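The effect of centerness_alpha can be illustrated with a small sketch. It assumes the centerness target decays exponentially with the distance between a point and the projected object center, with that distance normalized by the level's stride. The function name and the exact normalization are assumptions, not a statement of the library's implementation.

```python
import math


def centerness_target(dx, dy, stride, centerness_alpha=2.5):
    """Exponentially decaying centerness for an offset (dx, dy) in pixels."""
    relative_dist = math.hypot(dx, dy) / (math.sqrt(2) * stride)
    return math.exp(-centerness_alpha * relative_dist)


at_center = centerness_target(0.0, 0.0, stride=8)
one_cell_away = centerness_target(8.0, 8.0, stride=8)
print(at_center, one_cell_away)
```

In this sketch a larger centerness_alpha makes the target fall off faster away from the center, so fewer peripheral points receive a high centerness.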
- static add_sin_difference(boxes1, boxes2)[source]¶
Convert the rotation difference to difference in sine function.
- Parameters
boxes1 (torch.Tensor) – Original Boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.
boxes2 (torch.Tensor) – Target boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.
- Returns
boxes1
andboxes2
whose 7thdimensions are changed.
- Return type
tuple[torch.Tensor]
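The encoding relies on the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b). The sketch below is a plain-Python illustration under that assumption, with nested lists standing in for tensors; the `rot_index` argument is a hypothetical convenience and not part of the documented signature.

```python
import math


def add_sin_difference(boxes1, boxes2, rot_index=6):
    """Re-encode the rotation column so that the difference of the two
    encoded values equals sin(rot1 - rot2)."""
    out1, out2 = [], []
    for b1, b2 in zip(boxes1, boxes2):
        r1, r2 = b1[rot_index], b2[rot_index]
        e1, e2 = list(b1), list(b2)  # copy so the inputs stay untouched
        e1[rot_index] = math.sin(r1) * math.cos(r2)
        e2[rot_index] = math.cos(r1) * math.sin(r2)
        out1.append(e1)
        out2.append(e2)
    return out1, out2


# Two boxes that differ only in rotation (7th entry, index 6).
pred = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.3]]
target = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.1]]
enc_pred, enc_target = add_sin_difference(pred, target)
diff = enc_pred[0][6] - enc_target[0][6]
```

A regression loss applied to the encoded pair therefore penalizes sin(0.3 - 0.1) rather than the raw angle difference, which avoids a large penalty when two angles differ by a multiple of pi.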
- forward(feats)[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
- cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
- dir_cls_preds (list[Tensor]): Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2).
- attr_preds (list[Tensor]): Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
- centernesses (list[Tensor]): Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
- Return type
tuple
- forward_single(x, scale, stride)[source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – FPN feature maps of the specified stride.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.
- Returns
Scores for each class, bbox and direction class predictions, and centerness predictions of input feature maps.
- Return type
tuple
- get_bboxes(cls_scores, bbox_preds, dir_cls_preds, attr_preds, centernesses, img_metas, cfg=None, rescale=None)[source]¶
Transform network output for a batch into bbox predictions.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_points * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, with shape (N, num_points * 4, H, W).
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
attr_preds (list[Tensor]) – Attribute scores for each scale level, with shape (N, num_points * num_attrs, H, W).
centernesses (list[Tensor]) – Centerness for each scale level, with shape (N, num_points * 1, H, W).
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used
rescale (bool) – If True, return boxes in original image space
- Returns
Each item in result_list is a 2-tuple. The first item is an (n, 5) tensor, where the first 4 columns are bounding box positions (tl_x, tl_y, br_x, br_y) and the 5th column is a score between 0 and 1. The second item is an (n,) tensor where each item is the predicted class label of the corresponding box.
- Return type
list[tuple[Tensor, Tensor]]
- static get_direction_target(reg_targets, dir_offset=0, dir_limit_offset=0.0, num_bins=2, one_hot=True)[source]¶
Encode direction to 0 ~ num_bins-1.
- Parameters
reg_targets (torch.Tensor) – Bbox regression targets.
dir_offset (int, optional) – Direction offset. Default to 0.
dir_limit_offset (float, optional) – Offset to set the direction range. Default to 0.0.
num_bins (int, optional) – Number of bins to divide 2*PI. Default to 2.
one_hot (bool, optional) – Whether to encode as one hot. Default to True.
- Returns
Encoded direction targets.
- Return type
torch.Tensor
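As a rough illustration of the binning, the sketch below wraps each yaw angle into one period and maps it to a bin index. It is a plain-Python approximation under stated assumptions: it takes a list of yaw angles directly instead of reading the rotation column of `reg_targets`, and the helper `limit_period` is written out here for illustration.

```python
import math


def limit_period(val, offset=0.0, period=2 * math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def get_direction_target(yaws, dir_offset=0.0, dir_limit_offset=0.0,
                         num_bins=2, one_hot=True):
    """Map each yaw angle to a direction bin in 0 .. num_bins - 1."""
    targets = []
    for yaw in yaws:
        offset_rot = limit_period(yaw - dir_offset, dir_limit_offset)
        bin_idx = int(math.floor(offset_rot / (2 * math.pi / num_bins)))
        bin_idx = min(max(bin_idx, 0), num_bins - 1)  # guard the edges
        if one_hot:
            targets.append(
                [1.0 if i == bin_idx else 0.0 for i in range(num_bins)])
        else:
            targets.append(bin_idx)
    return targets


bins = get_direction_target([0.5, -0.5, 4.0], one_hot=False)
one_hot_bins = get_direction_target([0.5, -0.5], one_hot=True)
```

With the default two bins, angles in [0, pi) fall into bin 0 and angles in [pi, 2*pi) fall into bin 1, so a yaw of -0.5 wraps to about 5.78 and lands in bin 1.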
- get_targets(points, gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, attr_labels_list)[source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).
gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).
gt_bboxes_3d_list (list[Tensor]) – 3D Ground truth bboxes of each image, each has shape (num_gt, bbox_code_size).
gt_labels_3d_list (list[Tensor]) – 3D Ground truth labels of each box, each has shape (num_gt,).
centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, each has shape (num_gt, 2).
depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).
attr_labels_list (list[Tensor]) – Attribute labels of each box, each has shape (num_gt,).
- Returns
- concat_lvl_labels (list[Tensor]): Labels of each level.
- concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.
- Return type
tuple
- init_weights()[source]¶
Initialize weights of the head.
We currently still use the customized init_weights because the default init of DCN triggered by the init_cfg will init conv_offset.weight, which mistakenly affects the training stability.
- loss(cls_scores, bbox_preds, dir_cls_preds, attr_preds, centernesses, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]¶
Compute loss of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
attr_preds (list[Tensor]) – Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
centernesses (list[Tensor]) – Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – Class indices corresponding to each box.
gt_bboxes_3d (list[Tensor]) – 3D boxes ground truth with shape of (num_gts, code_size).
gt_labels_3d (list[Tensor]) – Same as gt_labels.
centers2d (list[Tensor]) – 2D centers on the image with shape of (num_gts, 2).
depths (list[Tensor]) – Depth ground truth with shape of (num_gts, ).
attr_labels (list[Tensor]) – Attributes indices of each box.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (list[Tensor]) – specify which bounding boxes can be ignored when computing the loss.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet3d.models.dense_heads.FreeAnchor3DHead(pre_anchor_topk=50, bbox_thr=0.6, gamma=2.0, alpha=0.5, init_cfg=None, **kwargs)[source]¶
FreeAnchor head for 3D detection.
Note
This implementation is directly modified from the mmdet implementation. We find it also works on 3D detection with minor modifications, i.e., different hyper-parameters and an additional direction classifier.
- Parameters
pre_anchor_topk (int) – Number of boxes to be taken in each bag.
bbox_thr (float) – The threshold of the saturated linear function. It is usually the same as the IoU threshold used in NMS.
gamma (float) – Gamma parameter in focal loss.
alpha (float) – Alpha parameter in focal loss.
kwargs (dict) – Other arguments are the same as those in Anchor3DHead.
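To make the role of `bbox_thr` concrete, here is a minimal sketch of a saturated linear function in the style used by FreeAnchor. It assumes the lower knee is `bbox_thr` and the upper knee is the largest IoU seen for the object; both that choice and the function name `saturated_linear` are assumptions for illustration, not taken from this page.

```python
def saturated_linear(ious, bbox_thr=0.6):
    """Map IoUs to [0, 1]: 0 at or below bbox_thr, 1 at the largest IoU,
    linear in between."""
    if not ious:
        return []
    t1 = bbox_thr
    # Keep the upper knee strictly above the lower one to avoid dividing
    # by zero when no IoU exceeds the threshold.
    t2 = max(max(ious), t1 + 1e-12)
    return [min(max((iou - t1) / (t2 - t1), 0.0), 1.0) for iou in ious]


probs = saturated_linear([0.2, 0.6, 0.75, 0.9])
```

In this example the anchors with IoU 0.2 and 0.6 get probability 0, the best anchor (IoU 0.9) gets 1, and the anchor at 0.75 sits halfway between the two knees.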
- loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]¶
Calculate loss of FreeAnchor head.
- Parameters
cls_scores (list[torch.Tensor]) – Classification scores of different samples.
bbox_preds (list[torch.Tensor]) – Box predictions of different samples.
dir_cls_preds (list[torch.Tensor]) – Direction predictions of different samples.
gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes.
gt_labels (list[torch.Tensor]) – Ground truth labels.
input_metas (list[dict]) – List of input meta information.
gt_bboxes_ignore (list[BaseInstance3DBoxes], optional) – Ground truth boxes that should be ignored. Defaults to None.
- Returns
Loss items.
positive_bag_loss (torch.Tensor): Loss of positive samples.
negative_bag_loss (torch.Tensor): Loss of negative samples.
- Return type
dict[str, torch.Tensor]
- negative_bag_loss(cls_prob, box_prob)[source]¶
Generate negative bag loss.
- Parameters
cls_prob (torch.Tensor) – Classification probability of negative samples.
box_prob (torch.Tensor) – Bounding box probability of negative samples.
- Returns
Loss of negative samples.
- Return type
torch.Tensor
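A hedged sketch of the negative bag loss, assuming it follows the FreeAnchor formulation: a focal-style term on the probability that an anchor is confidently classified while not localizing any object. Plain Python lists stand in for tensors, and the `1e-12` floor inside the logarithm is an illustrative numerical guard rather than the library's behavior.

```python
import math


def negative_bag_loss(cls_prob, box_prob, gamma=2.0, alpha=0.5):
    """Per-element loss for anchors treated as negatives."""
    losses = []
    for c, b in zip(cls_prob, box_prob):
        # Probability of a false positive: classified as an object while
        # the box matches no ground truth.
        p = min(max(c * (1.0 - b), 0.0), 1.0)
        # Binary cross entropy against target 0 is -log(1 - p).
        bce = -math.log(max(1.0 - p, 1e-12))
        losses.append((1.0 - alpha) * (p ** gamma) * bce)
    return losses


neg_losses = negative_bag_loss([0.8, 0.5], [1.0, 0.0])
```

The first anchor is fully explained by a ground-truth box (`box_prob` of 1), so it contributes no negative loss; the second has no matching box and is penalized in proportion to its classification confidence.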
- positive_bag_loss(matched_cls_prob, matched_box_prob)[source]¶
Generate positive bag loss.
- Parameters
matched_cls_prob (torch.Tensor) – Classification probability of matched positive samples.
matched_box_prob (torch.Tensor) – Bounding box probability of matched positive samples.
- Returns
Loss of positive samples.
- Return type
torch.Tensor
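The positive bag loss can be sketched in the same spirit, again assuming the FreeAnchor formulation: each object owns a bag of candidate anchors, and the bag probability is a soft maximum ("mean-max") over the per-anchor products of classification and box probability. The nested-list layout (one inner list per bag) and the `1e-12` clamps are assumptions made for this illustration.

```python
import math


def positive_bag_loss(matched_cls_prob, matched_box_prob, alpha=0.5):
    """Return one loss value per bag of matched anchors."""
    losses = []
    for cls_bag, box_bag in zip(matched_cls_prob, matched_box_prob):
        matched = [c * b for c, b in zip(cls_bag, box_bag)]
        # Mean-max weighting: anchors with higher probability get larger
        # weights, so the bag probability approaches the maximum.
        weights = [1.0 / max(1.0 - p, 1e-12) for p in matched]
        total = sum(weights)
        bag_prob = sum(w / total * p for w, p in zip(weights, matched))
        bag_prob = min(max(bag_prob, 1e-12), 1.0)
        # Binary cross entropy against target 1 is -log(bag_prob).
        losses.append(alpha * -math.log(bag_prob))
    return losses


pos_losses = positive_bag_loss([[0.9], [0.9, 0.1]], [[1.0], [1.0, 1.0]])
```

For the second bag the weighted probability is about 0.82, well above the plain average of 0.5, which shows how the weighting lets one good anchor dominate its bag.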
- class mmdet3d.models.dense_heads.GroupFree3DHead(num_classes, in_channels, bbox_coder, num_decoder_layers, transformerlayers, decoder_self_posembeds={'input_channel': 6, 'num_pos_feats': 288, 'type': 'ConvBNPositionalEncoding'}, decoder_cross_posembeds={'input_channel': 3, 'num_pos_feats': 288, 'type': 'ConvBNPositionalEncoding'}, train_cfg=None, test_cfg=None, num_proposal=128, pred_layer_cfg=None, size_cls_agnostic=True, gt_per_seed=3, sampling_objectness_loss=None, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, size_reg_loss=None, semantic_loss=None, init_cfg=None)[source]¶
Bbox head of Group-Free 3D.
- Parameters
num_classes (int) – The number of classes.
in_channels (int) – The dims of input features from backbone.
bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.
num_decoder_layers (int) – The number of transformer decoder layers.
transformerlayers (dict) – Config for transformer decoder.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
num_proposal (int) – The number of initial sampling candidates.
pred_layer_cfg (dict) – Config of classification and regression prediction layers.
size_cls_agnostic (bool) – Whether the predicted size is class-agnostic.
gt_per_seed (int) – The number of candidate instances each point belongs to.
sampling_objectness_loss (dict) – Config of initial sampling objectness loss.
objectness_loss (dict) – Config of objectness loss.
center_loss (dict) – Config of center loss.
dir_class_loss (dict) – Config of direction classification loss.
dir_res_loss (dict) – Config of direction residual regression loss.
size_class_loss (dict) – Config of size classification loss.
size_res_loss (dict) – Config of size residual regression loss.
size_reg_loss (dict) – Config of class-agnostic size regression loss.
semantic_loss (dict) – Config of point-wise semantic segmentation loss.
- forward(feat_dict, sample_mod)[source]¶
Forward pass.
Note
The forward of GroupFree3DHead is divided into 2 steps:
1. Initial object candidates sampling.
2. Iterative object box prediction by transformer decoder.
- Parameters
feat_dict (dict) – Feature dict from backbone.
sample_mod (str) – sample mode for initial candidates sampling.
- Returns
Predictions of GroupFree3D head.
- Return type
dict
- get_bboxes(points, bbox_preds, input_metas, rescale=False, use_nms=True)[source]¶
Generate bboxes from GroupFree3D head predictions.
- Parameters
points (torch.Tensor) – Input points.
bbox_preds (dict) – Predictions from GroupFree3D head.
input_metas (list[dict]) – Point cloud and image’s meta info.
rescale (bool) – Whether to rescale bboxes.
use_nms (bool) – Whether to apply NMS, skip nms postprocessing while using GroupFree3D head in rpn stage.
- Returns
Bounding boxes, scores and labels.
- Return type
list[tuple[torch.Tensor]]
- get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None, max_gt_num=64)[source]¶
Generate targets of GroupFree3D head.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance label of each batch.
bbox_preds (torch.Tensor) – Bounding box predictions of vote head.
max_gt_num (int) – Max number of GTs for single batch.
- Returns
Targets of GroupFree3D head.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, max_gt_nums=None, seed_points=None, seed_indices=None, candidate_indices=None, seed_points_obj_topk=4)[source]¶
Generate targets of GroupFree3D head for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
pts_semantic_mask (torch.Tensor) – Point-wise semantic label of each batch.
pts_instance_mask (torch.Tensor) – Point-wise instance label of each batch.
max_gt_nums (int) – Max number of GTs for single batch.
seed_points (torch.Tensor) – Coordinates of seed points.
seed_indices (torch.Tensor) – Indices of seed points.
candidate_indices (torch.Tensor) – Indices of object candidates.
seed_points_obj_topk (int) – k value of k-Closest Points Sampling.
- Returns
Targets of GroupFree3D head.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None, ret_target=False)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of vote head.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
img_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.
ret_target (bool) – Return targets or not.
- Returns
Losses of GroupFree3D.
- Return type
dict
- multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]¶
Multi-class nms in single batch.
- Parameters
obj_scores (torch.Tensor) – Objectness score of bounding boxes.
sem_scores (torch.Tensor) – Semantic class score of bounding boxes.
bbox (torch.Tensor) – Predicted bounding boxes.
points (torch.Tensor) – Input points.
input_meta (dict) – Point cloud and image’s meta info.
- Returns
Bounding boxes, scores and labels.
- Return type
tuple[torch.Tensor]
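For readers unfamiliar with per-class NMS, the sketch below shows the general idea on axis-aligned 3D boxes. It is an illustration only: the corner box format `(x1, y1, z1, x2, y2, z2)`, the helper `iou_3d`, and the thresholds `score_thr` and `nms_thr` are assumptions, and the actual method may use a different box representation and NMS routine.

```python
def iou_3d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for i in range(3):
        lo, hi = max(a[i], b[i]), min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0
        inter *= hi - lo

    def volume(c):
        return (c[3] - c[0]) * (c[4] - c[1]) * (c[5] - c[2])

    return inter / (volume(a) + volume(b) - inter)


def multiclass_nms(boxes, obj_scores, sem_scores, score_thr=0.05,
                   nms_thr=0.25):
    """Greedy NMS run independently for each predicted class."""
    # Label each box with its highest-scoring semantic class.
    labels = [max(range(len(s)), key=s.__getitem__) for s in sem_scores]
    keep = []
    for cls in sorted(set(labels)):
        order = sorted(
            (i for i, lab in enumerate(labels)
             if lab == cls and obj_scores[i] > score_thr),
            key=lambda i: obj_scores[i], reverse=True)
        selected = []
        for i in order:
            # Keep a box only if it does not overlap a better one.
            if all(iou_3d(boxes[i], boxes[j]) <= nms_thr for j in selected):
                selected.append(i)
        keep.extend(selected)
    return [(boxes[i], obj_scores[i], labels[i]) for i in keep]


boxes = [(0, 0, 0, 1, 1, 1), (0.1, 0, 0, 1.1, 1, 1), (5, 5, 5, 6, 6, 6)]
results = multiclass_nms(
    boxes, [0.9, 0.8, 0.7], [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
```

The second box overlaps the first heavily within the same class and is suppressed, while the third box belongs to another class and is kept.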
- class mmdet3d.models.dense_heads.ImVoxelHead(n_classes, n_levels, n_channels, n_reg_outs, pts_assign_threshold, pts_center_threshold, prior_generator, center_loss={'type': 'CrossEntropyLoss', 'use_sigmoid': True}, bbox_loss={'type': 'RotatedIoU3DLoss'}, cls_loss={'type': 'FocalLoss'}, train_cfg=None, test_cfg=None, init_cfg=None)[source]¶
ImVoxelNet (https://arxiv.org/abs/2106.01178) head for indoor datasets.
- Parameters
n_classes (int) – Number of classes.
n_levels (int) – Number of feature levels.
n_channels (int) – Number of channels in input tensors.
n_reg_outs (int) – Number of regression layer channels.
pts_assign_threshold (int) – Min number of locations per box to be assigned with.
pts_center_threshold (int) – Max number of locations per box to be assigned with.
center_loss (dict, optional) – Config of centerness loss. Default: dict(type=’CrossEntropyLoss’, use_sigmoid=True).
bbox_loss (dict, optional) – Config of bbox loss. Default: dict(type=’RotatedIoU3DLoss’).
cls_loss (dict, optional) – Config of classification loss. Default: dict(type=’FocalLoss’).
train_cfg (dict, optional) – Config for train stage. Defaults to None.
test_cfg (dict, optional) – Config for test stage. Defaults to None.
init_cfg (dict, optional) – Config for weight initialization. Defaults to None.
- forward(x)[source]¶
Forward function.
- Parameters
x (list[Tensor]) – Features from 3d neck.
- Returns
Centerness, bbox and classification predictions.
- Return type
tuple[Tensor]
- get_bboxes(center_preds, bbox_preds, cls_preds, valid_pred, img_metas)[source]¶
Generate boxes for all scenes.
- Parameters
center_preds (list[list[Tensor]]) – Centerness predictions for all scenes.
bbox_preds (list[list[Tensor]]) – Bbox predictions for all scenes.
cls_preds (list[list[Tensor]]) – Classification predictions for all scenes.
valid_pred (Tensor) – Valid mask prediction for all scenes.
img_metas (list[dict]) – Meta infos for all scenes.
- Returns
Predicted bboxes, scores, and labels for all scenes.
- Return type
list[tuple[Tensor]]
- loss(center_preds, bbox_preds, cls_preds, valid_pred, gt_bboxes, gt_labels, img_metas)[source]¶
Per scene loss function.
- Parameters
center_preds (list[list[Tensor]]) – Centerness predictions for all scenes.
bbox_preds (list[list[Tensor]]) – Bbox predictions for all scenes.
cls_preds (list[list[Tensor]]) – Classification predictions for all scenes.
valid_pred (Tensor) – Valid mask prediction for all scenes.
gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes for all scenes.
gt_labels (list[Tensor]) – Ground truth labels for all scenes.
img_metas (list[dict]) – Meta infos for all scenes.
- Returns
Centerness, bbox, and classification loss values.
- Return type
dict
- class mmdet3d.models.dense_heads.MonoFlexHead(num_classes, in_channels, use_edge_fusion, edge_fusion_inds, edge_heatmap_ratio, filter_outside_objs=True, loss_cls={'loss_weight': 1.0, 'type': 'GaussianFocalLoss'}, loss_bbox={'loss_weight': 0.1, 'type': 'IoULoss'}, loss_dir={'loss_weight': 0.1, 'type': 'MultiBinLoss'}, loss_keypoints={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_dims={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_offsets2d={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_direct_depth={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_keypoints_depth={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_combined_depth={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_attr=None, bbox_coder={'code_size': 7, 'type': 'MonoFlexCoder'}, norm_cfg={'type': 'BN'}, init_cfg=None, init_bias=- 2.19, **kwargs)[source]¶
MonoFlex head used in MonoFlex
         / --> 3 x 3 conv --> 1 x 1 conv --> [edge fusion] --> cls
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> 2d bbox
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> [edge fusion] --> 2d offsets
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> keypoints offsets
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> keypoints uncertainty
 feature
         | --> 3 x 3 conv --> 1 x 1 conv --> keypoints uncertainty
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> 3d dimensions
         |
         |                  |--- 1 x 1 conv --> ori cls
         | --> 3 x 3 conv --|
         |                  |--- 1 x 1 conv --> ori offsets
         |
         | --> 3 x 3 conv --> 1 x 1 conv --> depth
         |
         \ --> 3 x 3 conv --> 1 x 1 conv --> depth uncertainty
- Parameters
use_edge_fusion (bool) – Whether to use edge fusion module while feature extraction.
edge_fusion_inds (list[tuple]) – Indices of feature to use edge fusion.
edge_heatmap_ratio (float) – Ratio of generating target heatmap.
filter_outside_objs (bool, optional) – Whether to filter the outside objects. Default: True.
loss_cls (dict, optional) – Config of classification loss. Default: loss_cls=dict(type=’GaussianFocalLoss’, loss_weight=1.0).
loss_bbox (dict, optional) – Config of localization loss. Default: loss_bbox=dict(type=’IoULoss’, loss_weight=0.1).
loss_dir (dict, optional) – Config of direction classification loss. Default: dict(type=’MultiBinLoss’, loss_weight=0.1).
loss_keypoints (dict, optional) – Config of keypoints loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_dims (dict, optional) – Config of dimensions loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_offsets2d (dict, optional) – Config of offsets2d loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_direct_depth (dict, optional) – Config of directly regressed depth loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_keypoints_depth (dict, optional) – Config of keypoints decoded depth loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_combined_depth (dict, optional) – Config of combined depth loss. Default: dict(type=’L1Loss’, loss_weight=0.1).
loss_attr (dict, optional) – Config of attribute classification loss. In MonoFlex, defaults to None.
bbox_coder (dict, optional) – Bbox coder for encoding and decoding boxes. Default: dict(type=’MonoFlexCoder’, code_size=7).
norm_cfg (dict, optional) – Dictionary to construct and config norm layer. Default: norm_cfg=dict(type=’BN’).
init_cfg (dict) – Initialization config dict. Default: None.
- decode_heatmap(cls_score, reg_pred, input_metas, cam2imgs, topk=100, kernel=3)[source]¶
Transform network outputs into raw bbox predictions.
- Parameters
cls_score (Tensor) – Center prediction heatmap, shape (B, num_classes, H, W).
reg_pred (Tensor) – Box regression map, shape (B, channel, H, W).
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
cam2imgs (Tensor) – Camera intrinsic matrix. shape (N, 4, 4)
topk (int, optional) – Get top k center keypoints from heatmap. Default 100.
kernel (int, optional) – Max pooling kernel for extract local maximum pixels. Default 3.
- Returns
Decoded output of MonoFlexHead, containing the following Tensors:
- batch_bboxes (Tensor): Coords of each 3D box, shape (B, k, 7).
- batch_scores (Tensor): Scores of each 3D box, shape (B, k).
- batch_topk_labels (Tensor): Categories of each 3D box, shape (B, k).
- Return type
tuple[torch.Tensor]
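The `topk` and `kernel` parameters suggest the common center-heatmap decoding scheme: keep pixels that equal the maximum of their `kernel` x `kernel` neighbourhood, then take the `topk` highest scores. The sketch below illustrates that scheme for a single image in plain Python; the function name `topk_local_maxima` and the nested-list layout `[num_classes][H][W]` are assumptions, and the box decoding step that follows in the real method is omitted.

```python
def topk_local_maxima(heatmap, topk=100, kernel=3):
    """Return up to topk (score, class, y, x) peaks of a class heatmap."""
    pad = (kernel - 1) // 2
    peaks = []
    for cls, plane in enumerate(heatmap):
        h, w = len(plane), len(plane[0])
        for y in range(h):
            for x in range(w):
                # Equivalent to comparing against a max-pooled map.
                window = [
                    plane[j][i]
                    for j in range(max(0, y - pad), min(h, y + pad + 1))
                    for i in range(max(0, x - pad), min(w, x + pad + 1))
                ]
                if plane[y][x] == max(window):
                    peaks.append((plane[y][x], cls, y, x))
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:topk]


heat = [[
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.2, 0.1, 0.6],
]]
peaks = topk_local_maxima(heat, topk=2, kernel=3)
```

Only the two local maxima survive here: the 0.9 peak at (y=1, x=1) and the 0.6 peak at (y=2, x=3). Their neighbours are discarded even when their scores are non-zero.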
- forward(feats, input_metas)[source]¶
Forward features from the upstream network.
- Parameters
feats (list[Tensor]) – Features from the upstream network, each is a 4D-tensor.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns
- cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
- Return type
tuple
- forward_single(x, input_metas)[source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – Feature maps from a specific FPN feature level.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns
Scores for each class, bbox predictions.
- Return type
tuple
- forward_train(x, input_metas, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, gt_bboxes_ignore, proposal_cfg, **kwargs)[source]¶
- Parameters
x (list[Tensor]) – Features from FPN.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes (list[Tensor]) – Ground truth bboxes of the image, shape (num_gts, 4).
gt_labels (list[Tensor]) – Ground truth labels of each box, shape (num_gts,).
gt_bboxes_3d (list[Tensor]) – 3D ground truth bboxes of the image, shape (num_gts, self.bbox_code_size).
gt_labels_3d (list[Tensor]) – 3D ground truth labels of each box, shape (num_gts,).
centers2d (list[Tensor]) – Projected 3D center of each box, shape (num_gts, 2).
depths (list[Tensor]) – Depth of projected 3D center of each box, shape (num_gts,).
attr_labels (list[Tensor]) – Attribute labels of each box, shape (num_gts,).
gt_bboxes_ignore (list[Tensor]) – Ground truth bboxes to be ignored, shape (num_ignored_gts, 4).
proposal_cfg (mmcv.Config) – Test / postprocessing configuration, if None, test_cfg would be used
- Returns
- losses (dict[str, Tensor]): A dictionary of loss components.
- proposal_list (list[Tensor]): Proposals of each image.
- Return type
tuple
- get_bboxes(cls_scores, bbox_preds, input_metas)[source]¶
Generate bboxes from bbox head predictions.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level.
bbox_preds (list[Tensor]) – Box regression for each scale.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
rescale (bool) – If True, return boxes in original image space.
- Returns
Each item in result_list is a 4-tuple.
- Return type
list[tuple[CameraInstance3DBoxes, Tensor, Tensor, None]]
- get_predictions(pred_reg, labels3d, centers2d, reg_mask, batch_indices, input_metas, downsample_ratio)[source]¶
Prepare predictions for computing loss.
- Parameters
pred_reg (Tensor) – Box regression map. shape (B, channel, H , W).
labels3d (Tensor) – Labels of each 3D box. shape (B * max_objs, )
centers2d (Tensor) – Coords of each projected 3D box center on image. shape (N, 2)
reg_mask (Tensor) – Indexes of the existence of the 3D box. shape (B * max_objs, )
batch_indices (Tensor) – Batch indices of the 3D box. shape (N, 3)
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
downsample_ratio (int) – The stride of feature map.
- Returns
The predictions for computing loss.
- Return type
dict
- get_targets(gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, feat_shape, img_shape, input_metas)[source]¶
Get training targets for batch images.
- Parameters
gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, shape (num_gt, 4).
gt_labels_list (list[Tensor]) – Ground truth labels of each box, shape (num_gt,).
gt_bboxes_3d_list (list[CameraInstance3DBoxes]) – 3D ground truth bboxes of each image, shape (num_gt, bbox_code_size).
gt_labels_3d_list (list[Tensor]) – 3D ground truth labels of each box, shape (num_gt,).
centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, shape (num_gt, 2).
depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).
feat_shape (tuple[int]) – Feature map shape with value, shape (B, _, H, W).
img_shape (tuple[int]) – Image shape in [h, w] format.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns
The Tensor value is the targets of center heatmap, the dict has components below:
- base_centers2d_target (Tensor): Coords of each projected 3D box center on image. shape (B * max_objs, 2), [dtype: int]
- labels3d (Tensor): Labels of each 3D box. shape (N, )
- reg_mask (Tensor): Mask of the existence of the 3D box. shape (B * max_objs, )
- batch_indices (Tensor): Batch id of the 3D box. shape (N, )
- depth_target (Tensor): Depth target of each 3D box. shape (N, )
- keypoints2d_target (Tensor): Keypoints of each projected 3D box on image. shape (N, 10, 2)
- keypoints_mask (Tensor): Keypoints mask of each projected 3D box on image. shape (N, 10)
- keypoints_depth_mask (Tensor): Depths decoded from keypoints of each 3D box. shape (N, 3)
- orientations_target (Tensor): Orientation (encoded local yaw) target of each 3D box. shape (N, )
- offsets2d_target (Tensor): Offsets target of each projected 3D box. shape (N, 2)
- dimensions_target (Tensor): Dimensions target of each 3D box. shape (N, 3)
- downsample_ratio (int): The stride of feature map.
- Return type
tuple[Tensor, dict]
- loss(cls_scores, bbox_preds, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, input_metas, gt_bboxes_ignore=None)[source]¶
Compute loss of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level. shape (num_gt, 4).
bbox_preds (list[Tensor]) – Box dims is a 4D-tensor, the channel number is bbox_code_size. shape (B, 7, H, W).
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image. shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – Class indices corresponding to each box. shape (num_gts, ).
gt_bboxes_3d (list[CameraInstance3DBoxes]) – 3D boxes ground truth. It is the flipped gt_bboxes.
gt_labels_3d (list[Tensor]) – Same as gt_labels.
centers2d (list[Tensor]) – 2D centers on the image. shape (num_gts, 2).
depths (list[Tensor]) – Depth ground truth. shape (num_gts, ).
attr_labels (list[Tensor]) – Attribute indices of each box. In KITTI it is None.
input_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | list[Tensor]) – Specify which bounding boxes can be ignored when computing the loss. Default: None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet3d.models.dense_heads.PGDHead(use_depth_classifier=True, use_onlyreg_proj=False, weight_dim=- 1, weight_branch=((256)), depth_branch=(64), depth_range=(0, 70), depth_unit=10, division='uniform', depth_bins=8, loss_depth={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_bbox2d={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, loss_consistency={'loss_weight': 1.0, 'type': 'GIoULoss'}, pred_bbox2d=True, pred_keypoints=False, bbox_coder={'base_depths': ((28.01, 16.32)), 'base_dims': ((0.8, 1.73, 0.6), (1.76, 1.73, 0.6), (3.9, 1.56, 1.6)), 'code_size': 7, 'type': 'PGDBBoxCoder'}, **kwargs)[source]¶
Anchor-free head used in PGD.
- Parameters
use_depth_classifier (bool, optional) – Whether to use depth classifier. Defaults to True.
use_onlyreg_proj (bool, optional) – Whether to use only direct regressed depth in the re-projection (to make the network easier to learn). Defaults to False.
weight_dim (int, optional) – Dimension of the location-aware weight map. Defaults to -1.
weight_branch (tuple[tuple[int]], optional) – Feature map channels of the convolutional branch for weight map. Defaults to ((256, ), ).
depth_branch (tuple[int], optional) – Feature map channels of the branch for probabilistic depth estimation. Defaults to (64, ).
depth_range (tuple[float], optional) – Range of depth estimation. Defaults to (0, 70).
depth_unit (int, optional) – Unit of depth range division. Defaults to 10.
division (str, optional) – Depth division method. Options include ‘uniform’, ‘linear’, ‘log’, ‘loguniform’. Defaults to ‘uniform’.
depth_bins (int, optional) – Discrete bins of depth division. Defaults to 8.
loss_depth (dict, optional) – Depth loss. Defaults to dict( type=’SmoothL1Loss’, beta=1.0 / 9.0, loss_weight=1.0).
loss_bbox2d (dict, optional) – Loss for 2D box estimation. Defaults to dict(type=’SmoothL1Loss’, beta=1.0 / 9.0, loss_weight=1.0).
loss_consistency (dict, optional) – Consistency loss. Defaults to dict(type=’GIoULoss’, loss_weight=1.0).
pred_velo (bool, optional) – Whether to predict velocity. Defaults to False.
pred_bbox2d (bool, optional) – Whether to predict 2D bounding boxes. Defaults to True.
pred_keypoints (bool, optional) – Whether to predict keypoints. Defaults to False.
bbox_coder (dict, optional) – Bounding box coder. Defaults to dict(type=’PGDBBoxCoder’, base_depths=((28.01, 16.32), ), base_dims=((0.8, 1.73, 0.6), (1.76, 1.73, 0.6), (3.9, 1.56, 1.6)), code_size=7).
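The interaction between `depth_range`, `depth_unit` and `depth_bins` is easiest to see with the ‘uniform’ division. The sketch below assumes that uniform division places one class every `depth_unit` metres across `depth_range`, and that a probabilistic depth is decoded as the softmax-weighted average of those bin depths. Both points are assumptions about the behaviour, not statements from this page, and the function names are hypothetical.

```python
import math


def num_uniform_depth_classes(depth_range=(0, 70), depth_unit=10):
    """Number of depth classes for uniform division (both ends included)."""
    return int((depth_range[1] - depth_range[0]) / depth_unit) + 1


def decode_uniform_depth(depth_logits, depth_unit=10):
    """Expected depth under a softmax over uniformly spaced depth bins."""
    peak = max(depth_logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in depth_logits]
    total = sum(exps)
    return sum(e / total * depth_unit * k for k, e in enumerate(exps))


n_cls = num_uniform_depth_classes()
flat_depth = decode_uniform_depth([0.0] * n_cls)
logits = [-100.0] * n_cls
logits[2] = 100.0
peaked_depth = decode_uniform_depth(logits)
```

With the defaults this gives 8 classes, matching `depth_bins=8`. A flat distribution decodes to the midpoint of the range (35), while a distribution concentrated on the third bin decodes to roughly 20.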
- forward(feats)[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
- cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
- dir_cls_preds (list[Tensor]): Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2).
- weight (list[Tensor]): Location-aware weight maps on each scale level, each is a 4D-tensor, the channel number is num_points * 1.
- depth_cls_preds (list[Tensor]): Box scores for depth class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * self.num_depth_cls.
- attr_preds (list[Tensor]): Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
- centernesses (list[Tensor]): Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
- Return type
tuple
- forward_single(x, scale, stride)[source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – FPN feature maps of the specified stride.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.
- Returns
- scores for each class, bbox and direction class
predictions, depth class predictions, location-aware weights, attribute and centerness predictions of input feature maps.
- Return type
tuple
- get_bboxes(cls_scores, bbox_preds, dir_cls_preds, depth_cls_preds, weights, attr_preds, centernesses, img_metas, cfg=None, rescale=None)[source]¶
Transform network output for a batch into bbox predictions.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_points * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_points * 4, H, W)
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
depth_cls_preds (list[Tensor]) – Box scores for depth class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * self.num_depth_cls.
weights (list[Tensor]) – Location-aware weights for each scale level, each is a 4D-tensor, the channel number is num_points * self.weight_dim.
attr_preds (list[Tensor]) – Attribute scores for each scale level, with shape (N, num_points * num_attrs, H, W).
centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_points * 1, H, W)
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
cfg (mmcv.Config, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool, optional) – If True, return boxes in original image space. Defaults to None.
- Returns
- Each item in result_list is a tuple, which
consists of predicted 3D boxes, scores, labels, attributes and 2D boxes (if necessary).
- Return type
list[tuple[Tensor]]
- get_pos_predictions(bbox_preds, dir_cls_preds, depth_cls_preds, weights, attr_preds, centernesses, pos_inds, img_metas)[source]¶
Flatten predictions and get positive ones.
- Parameters
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
depth_cls_preds (list[Tensor]) – Box scores for depth class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * self.num_depth_cls.
attr_preds (list[Tensor]) – Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
centernesses (list[Tensor]) – Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
pos_inds (Tensor) – Index of foreground points from flattened tensors.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns
- Box predictions, direction classes, probabilistic
depth maps, location-aware weight maps, attributes and centerness predictions.
- Return type
tuple[Tensor]
- get_proj_bbox2d(bbox_preds, pos_dir_cls_preds, labels_3d, bbox_targets_3d, pos_points, pos_inds, img_metas, pos_depth_cls_preds=None, pos_weights=None, pos_cls_scores=None, with_kpts=False)[source]¶
Decode box predictions and get projected 2D attributes.
- Parameters
bbox_preds (list[Tensor]) – Box predictions for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
pos_dir_cls_preds (Tensor) – Box scores for direction class predictions of positive boxes on all the scale levels in shape (num_pos_points, 2).
labels_3d (list[Tensor]) – 3D box category labels for each scale level, each is a 4D-tensor.
bbox_targets_3d (list[Tensor]) – 3D box targets for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
pos_points (Tensor) – Foreground points.
pos_inds (Tensor) – Index of foreground points from flattened tensors.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
pos_depth_cls_preds (Tensor, optional) – Probabilistic depth map of positive boxes on all the scale levels in shape (num_pos_points, self.num_depth_cls). Defaults to None.
pos_weights (Tensor, optional) – Location-aware weights of positive boxes in shape (num_pos_points, self.weight_dim). Defaults to None.
pos_cls_scores (Tensor, optional) – Classification scores of positive boxes in shape (num_pos_points, self.num_classes). Defaults to None.
with_kpts (bool, optional) – Whether to output keypoints targets. Defaults to False.
- Returns
- Exterior 2D boxes from projected 3D boxes,
predicted 2D boxes and keypoint targets (if necessary).
- Return type
tuple[Tensor]
- get_targets(points, gt_bboxes_list, gt_labels_list, gt_bboxes_3d_list, gt_labels_3d_list, centers2d_list, depths_list, attr_labels_list)[source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
gt_bboxes_list (list[Tensor]) – Ground truth bboxes of each image, each has shape (num_gt, 4).
gt_labels_list (list[Tensor]) – Ground truth labels of each box, each has shape (num_gt,).
gt_bboxes_3d_list (list[Tensor]) – 3D Ground truth bboxes of each image, each has shape (num_gt, bbox_code_size).
gt_labels_3d_list (list[Tensor]) – 3D Ground truth labels of each box, each has shape (num_gt,).
centers2d_list (list[Tensor]) – Projected 3D centers onto 2D image, each has shape (num_gt, 2).
depths_list (list[Tensor]) – Depth of projected 3D centers onto 2D image, each has shape (num_gt, 1).
attr_labels_list (list[Tensor]) – Attribute labels of each box, each has shape (num_gt,).
- Returns
- concat_lvl_labels (list[Tensor]): Labels of each level.
- concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level.
- Return type
tuple
- init_weights()[source]¶
Initialize weights of the head.
We currently still use a custom init_weights because the default init of DCN triggered by init_cfg also initializes conv_offset.weight, which harms training stability.
- loss(cls_scores, bbox_preds, dir_cls_preds, depth_cls_preds, weights, attr_preds, centernesses, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]¶
Compute loss of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
dir_cls_preds (list[Tensor]) – Box scores for direction class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * 2. (bin = 2)
depth_cls_preds (list[Tensor]) – Box scores for depth class predictions on each scale level, each is a 4D-tensor, the channel number is num_points * self.num_depth_cls.
weights (list[Tensor]) – Location-aware weights for each scale level, each is a 4D-tensor, the channel number is num_points * self.weight_dim.
attr_preds (list[Tensor]) – Attribute scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_attrs.
centernesses (list[Tensor]) – Centerness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – Class indices corresponding to each box.
gt_bboxes_3d (list[Tensor]) – 3D boxes ground truth with shape of (num_gts, code_size).
gt_labels_3d (list[Tensor]) – Same as gt_labels.
centers2d (list[Tensor]) – 2D centers on the image with shape of (num_gts, 2).
depths (list[Tensor]) – Depth ground truth with shape of (num_gts, ).
attr_labels (list[Tensor]) – Attributes indices of each box.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (list[Tensor]) – specify which bounding boxes can be ignored when computing the loss. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet3d.models.dense_heads.PartA2RPNHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, - 39.68, - 1.78, 69.12, 39.68, - 1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[3.9, 1.6, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=- 1.5707963267948966, dir_limit_offset=0, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'}, init_cfg=None)[source]¶
RPN head for PartA2.
Note
The main difference between the PartA2 RPN head and the Anchor3DHead lies in their output during inference. PartA2 RPN head further returns the original classification score for the second stage since the bbox head in RoI head does not do classification task.
Different from RPN heads in 2D detectors, this RPN head does multi-class classification task and uses FocalLoss like the SECOND and PointPillars do. But this head uses class agnostic nms rather than multi-class nms.
- Parameters
num_classes (int) – Number of classes.
in_channels (int) – Number of channels in the input feature map.
train_cfg (dict) – Train configs.
test_cfg (dict) – Test configs.
feat_channels (int) – Number of channels of the feature map.
use_direction_classifier (bool) – Whether to add a direction classifier.
anchor_generator (dict) – Config dict of anchor generator.
assigner_per_size (bool) – Whether to do assignment for each separate anchor size.
assign_per_class (bool) – Whether to do assignment for each class.
diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.
dir_offset (float | int) – The offset of BEV rotation angles (TODO: may be moved into box coder)
dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)
bbox_coder (dict) – Config dict of box coders.
loss_cls (dict) – Config of classification loss.
loss_bbox (dict) – Config of localization loss.
loss_dir (dict) – Config of direction classifier loss.
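The diff_rad_by_sin option can be illustrated with the identity sin(a - b) = sin(a)cos(b) - cos(a)sin(b). The sketch below is a scalar illustration under that assumption, not the head's actual tensor code. The helper name is chosen for this example only.

```python
import math


def add_sin_difference(pred_yaw, target_yaw):
    """Encode two yaw angles so that their difference equals sin(pred - target)."""
    pred_enc = math.sin(pred_yaw) * math.cos(target_yaw)
    target_enc = math.cos(pred_yaw) * math.sin(target_yaw)
    return pred_enc, target_enc


pred_enc, target_enc = add_sin_difference(0.3, 0.1)
residual = pred_enc - target_enc  # equals sin(0.3 - 0.1)

# Yaws that differ by pi describe the same box footprint and give a residual
# of about 0, so the regression loss alone cannot tell the two headings apart.
flip_pred, flip_target = add_sin_difference(0.1 + math.pi, 0.1)
flip_residual = flip_pred - flip_target
```

This ambiguity is one common motivation for pairing the sine-based regression loss with a separate direction classifier, as enabled by use_direction_classifier.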
- class_agnostic_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_max_scores, mlvl_label_pred, mlvl_cls_score, mlvl_dir_scores, score_thr, max_num, cfg, input_meta)[source]¶
Class agnostic nms for single batch.
- Parameters
mlvl_bboxes (torch.Tensor) – Bboxes from Multi-level.
mlvl_bboxes_for_nms (torch.Tensor) – Bboxes for nms (bev or minmax boxes) from Multi-level.
mlvl_max_scores (torch.Tensor) – Max scores of Multi-level bbox.
mlvl_label_pred (torch.Tensor) – Class predictions of Multi-level bbox.
mlvl_cls_score (torch.Tensor) – Class scores of Multi-level bbox.
mlvl_dir_scores (torch.Tensor) – Direction scores of Multi-level bbox.
score_thr (int) – Score threshold.
max_num (int) – Max number of bboxes after nms.
cfg (ConfigDict) – Training or testing config.
input_meta (dict) – Contain pcd and img’s meta info.
- Returns
Predictions of single batch. Contain the keys:
boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.
scores_3d (torch.Tensor): Score of each bbox.
labels_3d (torch.Tensor): Label of each bbox.
cls_preds (torch.Tensor): Class score of each bbox.
- Return type
dict
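Class-agnostic NMS can be sketched in a few lines. The example below is a simplified illustration, not the library's implementation: it assumes axis-aligned min-max boxes (x1, y1, x2, y2) and a greedy IoU test. The iou_thr parameter and both function names are hypothetical; rotated BEV boxes would need a rotated-IoU routine instead.

```python
def iou_minmax(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms_sketch(boxes, max_scores, score_thr, max_num, iou_thr=0.5):
    """Greedy NMS that ignores class labels; returns the kept indices."""
    # Drop low-scoring boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(max_scores) if s > score_thr),
        key=lambda i: max_scores[i],
        reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_num:
            break
        # Keep a box only if it does not overlap any already kept box too much,
        # regardless of which class either box belongs to.
        if all(iou_minmax(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 2, 2), (0.1, 0, 2.1, 2), (5, 5, 6, 6), (8, 8, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.05]
keep = class_agnostic_nms_sketch(boxes, scores, score_thr=0.1, max_num=10)
```

Here the second box is suppressed by the first even if the two carry different class labels, which is the difference from multi-class NMS.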
- get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg, rescale=False)[source]¶
Get bboxes of single branch.
- Parameters
cls_scores (torch.Tensor) – Class score in single batch.
bbox_preds (torch.Tensor) – Bbox prediction in single batch.
dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.
mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.
input_meta (list[dict]) – Contain pcd and img’s meta info.
cfg (ConfigDict) – Training or testing config.
rescale (list[torch.Tensor]) – Whether to rescale bbox.
- Returns
Predictions of single batch containing the following keys:
boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.
scores_3d (torch.Tensor): Score of each bbox.
labels_3d (torch.Tensor): Label of each bbox.
cls_preds (torch.Tensor): Class score of each bbox.
- Return type
dict
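How a two-bin direction prediction might be merged with a regressed yaw can be sketched as follows. This is an assumption-laden illustration of the roles of dir_offset and dir_limit_offset, not a statement of what get_bboxes_single does internally. The helper names and the period-wrapping formula are assumptions made for this example.

```python
import math


def limit_period(val, offset=0.5, period=math.pi):
    """Wrap val into [-offset * period, (1 - offset) * period)."""
    return val - math.floor(val / period + offset) * period


def recover_yaw(yaw, dir_label, dir_offset=-math.pi / 2, dir_limit_offset=0):
    """Combine a regressed yaw with a 2-bin direction label (0 or 1)."""
    # Reduce the regressed angle to a half-turn range relative to dir_offset.
    dir_rot = limit_period(yaw - dir_offset, dir_limit_offset, math.pi)
    # The direction label selects which half-turn the final heading lies in.
    return dir_rot + dir_offset + math.pi * dir_label


same_half = recover_yaw(0.3, dir_label=0)
other_half = recover_yaw(0.3, dir_label=1)
wrapped = recover_yaw(2.0, dir_label=0)
```

With the default offsets taken from the PartA2RPNHead signature, label 0 yields headings in [-pi/2, pi/2) and label 1 adds a half turn.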
- loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]¶
Calculate losses.
- Parameters
cls_scores (list[torch.Tensor]) – Multi-level class scores.
bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.
gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes of each sample.
gt_labels (list[torch.Tensor]) – Labels of each sample.
input_metas (list[dict]) – Point cloud and image’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.
- Returns
Classification, bbox, and direction losses of each level.
- loss_rpn_cls (list[torch.Tensor]): Classification losses.
- loss_rpn_bbox (list[torch.Tensor]): Box regression losses.
- loss_rpn_dir (list[torch.Tensor]): Direction classification losses.
- Return type
dict[str, list[torch.Tensor]]
- class mmdet3d.models.dense_heads.PointRPNHead(num_classes, train_cfg, test_cfg, pred_layer_cfg=None, enlarge_width=0.1, cls_loss=None, bbox_loss=None, bbox_coder=None, init_cfg=None)[source]¶
RPN module for PointRCNN.
- Parameters
num_classes (int) – Number of classes.
train_cfg (dict) – Train configs.
test_cfg (dict) – Test configs.
pred_layer_cfg (dict, optional) – Config of classification and regression prediction layers. Defaults to None.
enlarge_width (float, optional) – Enlarge bbox for each side to ignore close points. Defaults to 0.1.
cls_loss (dict, optional) – Config of direction classification loss. Defaults to None.
bbox_loss (dict, optional) – Config of localization loss. Defaults to None.
bbox_coder (dict, optional) – Config dict of box coders. Defaults to None.
init_cfg (dict, optional) – Config of initialization. Defaults to None.
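The effect of enlarge_width can be pictured as a three-way point labelling. The sketch below is a simplified illustration, not the head's actual target assignment: it assumes an axis-aligned box given by its min and max corners, whereas the real boxes are rotated. The label values 1, -1 and 0 and the function name are hypothetical.

```python
def point_label(point, box_min, box_max, enlarge_width=0.1):
    """Return 1 (foreground), -1 (ignored) or 0 (background) for one point.

    Assumption: an axis-aligned box given by its min and max corners.
    """
    inside = all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))
    if inside:
        return 1
    # Points in the thin shell around the box are neither positive nor negative.
    inside_enlarged = all(
        lo - enlarge_width <= p <= hi + enlarge_width
        for p, lo, hi in zip(point, box_min, box_max))
    return -1 if inside_enlarged else 0


box_min, box_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
labels = [
    point_label((0.5, 0.5, 0.5), box_min, box_max),   # inside the box
    point_label((1.05, 0.5, 0.5), box_min, box_max),  # inside the enlarged shell
    point_label((1.5, 0.5, 0.5), box_min, box_max),   # clearly outside
]
```

Ignoring the shell avoids penalizing the network for points that sit right on an annotated box boundary.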
- class_agnostic_nms(obj_scores, sem_scores, bbox, points, input_meta)[source]¶
Class agnostic nms.
- Parameters
obj_scores (torch.Tensor) – Objectness score of bounding boxes.
sem_scores (torch.Tensor) – Semantic class score of bounding boxes.
bbox (torch.Tensor) – Predicted bounding boxes.
- Returns
Bounding boxes, scores and labels.
- Return type
tuple[torch.Tensor]
- forward(feat_dict)[source]¶
Forward pass.
- Parameters
feat_dict (dict) – Feature dict from backbone.
- Returns
- Predicted boxes and classification
scores.
- Return type
tuple[list[torch.Tensor]]
- get_bboxes(points, bbox_preds, cls_preds, input_metas, rescale=False)[source]¶
Generate bboxes from RPN head predictions.
- Parameters
points (torch.Tensor) – Input points.
bbox_preds (dict) – Regression predictions from PointRCNN head.
cls_preds (dict) – Class scores predictions from PointRCNN head.
input_metas (list[dict]) – Point cloud and image’s meta info.
rescale (bool, optional) – Whether to rescale bboxes. Defaults to False.
- Returns
Bounding boxes, scores and labels.
- Return type
list[tuple[torch.Tensor]]
- get_targets(points, gt_bboxes_3d, gt_labels_3d)[source]¶
Generate targets of PointRCNN RPN head.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
- Returns
Targets of PointRCNN RPN head.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d)[source]¶
Generate targets of PointRCNN RPN head for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
- Returns
Targets of PointRCNN RPN head.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, cls_preds, points, gt_bboxes_3d, gt_labels_3d, img_metas=None)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of PointRCNN RPN_Head.
cls_preds (dict) – Classification from forward of PointRCNN RPN_Head.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
img_metas (list[dict], Optional) – Contain pcd and img’s meta info. Defaults to None.
- Returns
Losses of PointRCNN RPN module.
- Return type
dict
- class mmdet3d.models.dense_heads.SMOKEMono3DHead(num_classes, in_channels, dim_channel, ori_channel, bbox_coder, loss_cls={'loss_weight': 1.0, 'type': 'GaussionFocalLoss'}, loss_bbox={'loss_weight': 0.1, 'type': 'L1Loss'}, loss_dir=None, loss_attr=None, norm_cfg={'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, init_cfg=None, **kwargs)[source]¶
Anchor-free head used in SMOKE
         /-----> 3*3 conv -----> 1*1 conv -----> cls
feature
         \-----> 3*3 conv -----> 1*1 conv -----> reg
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
dim_channel (list[int]) – indices of dimension offset preds in regression heatmap channels.
ori_channel (list[int]) – indices of orientation offset pred in regression heatmap channels.
bbox_coder (CameraInstance3DBoxes) – Bbox coder for encoding and decoding boxes.
loss_cls (dict, optional) – Config of classification loss. Default: loss_cls=dict(type=’GaussionFocalLoss’, loss_weight=1.0).
loss_bbox (dict, optional) – Config of localization loss. Default: loss_bbox=dict(type=’L1Loss’, loss_weight=0.1).
loss_dir (dict, optional) – Config of direction classification loss. In SMOKE, Default: None.
loss_attr (dict, optional) – Config of attribute classification loss. In SMOKE, Default: None.
loss_centerness (dict) – Config of centerness loss.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: norm_cfg=dict(type=’GN’, num_groups=32, requires_grad=True).
init_cfg (dict) – Initialization config dict. Default: None.
- decode_heatmap(cls_score, reg_pred, img_metas, cam2imgs, trans_mats, topk=100, kernel=3)[source]¶
Transform outputs into raw bbox predictions.
- Parameters
cls_score (Tensor) – Center prediction heatmap, shape (B, num_classes, H, W).
reg_pred (Tensor) – Box regression map. shape (B, channel, H , W).
img_metas (List[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
cam2imgs (Tensor) – Camera intrinsic matrices. shape (B, 4, 4)
trans_mats (Tensor) – Transformation matrix from original image to feature map. shape: (batch, 3, 3)
topk (int) – Get top k center keypoints from heatmap. Default 100.
kernel (int) – Max pooling kernel for extract local maximum pixels. Default 3.
- Returns
- Decoded output of SMOKEHead, containing
the following Tensors:
- batch_bboxes (Tensor): Coords of each 3D box.
shape (B, k, 7)
- batch_scores (Tensor): Scores of each 3D box.
shape (B, k)
- batch_topk_labels (Tensor): Categories of each 3D box.
shape (B, k)
- Return type
tuple[torch.Tensor]
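The roles of kernel and topk in heatmap decoding can be illustrated on a single-channel heatmap. The sketch below is a plain-Python illustration, not the library's tensor implementation: it assumes a pixel counts as a center candidate when it equals the maximum of its kernel x kernel neighbourhood, and the candidates are then ranked by score. The function name is hypothetical.

```python
def topk_peaks(heatmap, topk=100, kernel=3):
    """Return (score, row, col) of the top-k local maxima of a 2D heatmap."""
    rows, cols = len(heatmap), len(heatmap[0])
    pad = (kernel - 1) // 2
    peaks = []
    for r in range(rows):
        for c in range(cols):
            # Maximum over the kernel window centred on (r, c), clipped at borders.
            window_max = max(
                heatmap[rr][cc]
                for rr in range(max(0, r - pad), min(rows, r + pad + 1))
                for cc in range(max(0, c - pad), min(cols, c + pad + 1)))
            # A pixel survives only if it equals the maximum of its window.
            if heatmap[r][c] == window_max:
                peaks.append((heatmap[r][c], r, c))
    # Rank the surviving pixels by score and keep the best k.
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:topk]


heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = topk_peaks(heatmap, topk=2)
```

A larger kernel suppresses more neighbours around each peak, so nearby objects are more likely to be merged into a single detection.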
- forward(feats)[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
- cls_scores (list[Tensor]): Box scores for each scale level,
each is a 4D-tensor, the channel number is num_points * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for each scale
level, each is a 4D-tensor, the channel number is num_points * bbox_code_size.
- Return type
tuple
- forward_single(x)[source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – Input feature map.
- Returns
Scores for each class, bbox of input feature maps.
- Return type
tuple
- get_bboxes(cls_scores, bbox_preds, img_metas, rescale=None)[source]¶
Generate bboxes from bbox head predictions.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level.
bbox_preds (list[Tensor]) – Box regression for each scale.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
rescale (bool) – If True, return boxes in original image space.
- Returns
Each item in result_list is 4-tuple.
- Return type
list[tuple[CameraInstance3DBoxes, Tensor, Tensor, None]]
- get_predictions(labels3d, centers2d, gt_locations, gt_dimensions, gt_orientations, indices, img_metas, pred_reg)[source]¶
Prepare predictions for computing loss.
- Parameters
labels3d (Tensor) – Labels of each 3D box. shape (B, max_objs, )
centers2d (Tensor) – Coords of each projected 3D box center on image. shape (B * max_objs, 2)
gt_locations (Tensor) – Coords of each 3D box’s location. shape (B * max_objs, 3)
gt_dimensions (Tensor) – Dimensions of each 3D box. shape (N, 3)
gt_orientations (Tensor) – Orientation(yaw) of each 3D box. shape (N, 1)
indices (Tensor) – Indices of the existence of the 3D box. shape (B * max_objs, )
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
pred_reg (Tensor) – Box regression map. shape (B, channel, H , W).
- Returns
The dict has the components below:
- bbox3d_yaws (CameraInstance3DBoxes): bbox calculated using pred orientations.
- bbox3d_dims (CameraInstance3DBoxes): bbox calculated using pred dimensions.
- bbox3d_locs (CameraInstance3DBoxes): bbox calculated using pred locations.
- Return type
dict
- get_targets(gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, feat_shape, img_shape, img_metas)[source]¶
Get training targets for batch images.
- Parameters
gt_bboxes (list[Tensor]) – Ground truth bboxes of each image, shape (num_gt, 4).
gt_labels (list[Tensor]) – Ground truth labels of each box, shape (num_gt,).
gt_bboxes_3d (list[CameraInstance3DBoxes]) – 3D Ground truth bboxes of each image, shape (num_gt, bbox_code_size).
gt_labels_3d (list[Tensor]) – 3D Ground truth labels of each box, shape (num_gt,).
centers2d (list[Tensor]) – Projected 3D centers onto 2D image, shape (num_gt, 2).
feat_shape (tuple[int]) – Feature map shape with value, shape (B, _, H, W).
img_shape (tuple[int]) – Image shape in [h, w] format.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
- Returns
- The Tensor value is the targets of
center heatmap, the dict has components below:
- gt_centers2d (Tensor): Coords of each projected 3D box
center on image. shape (B * max_objs, 2)
- gt_labels3d (Tensor): Labels of each 3D box.
shape (B, max_objs, )
- indices (Tensor): Indices of the existence of the 3D box.
shape (B * max_objs, )
- affine_indices (Tensor): Indices of the affine of the 3D box.
shape (N, )
- gt_locs (Tensor): Coords of each 3D box’s location.
shape (N, 3)
- gt_dims (Tensor): Dimensions of each 3D box.
shape (N, 3)
- gt_yaws (Tensor): Orientation(yaw) of each 3D box.
shape (N, 1)
- gt_cors (Tensor): Coords of the corners of each 3D box.
shape (N, 8, 3)
- Return type
tuple[Tensor, dict]
- loss(cls_scores, bbox_preds, gt_bboxes, gt_labels, gt_bboxes_3d, gt_labels_3d, centers2d, depths, attr_labels, img_metas, gt_bboxes_ignore=None)[source]¶
Compute loss of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level. shape (num_gt, 4).
bbox_preds (list[Tensor]) – Box dims is a 4D-tensor, the channel number is bbox_code_size. shape (B, 7, H, W).
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image. shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – Class indices corresponding to each box. shape (num_gts, ).
gt_bboxes_3d (list[CameraInstance3DBoxes]) – 3D boxes ground truth. It is the flipped gt_bboxes.
gt_labels_3d (list[Tensor]) – Same as gt_labels.
centers2d (list[Tensor]) – 2D centers on the image. shape (num_gts, 2).
depths (list[Tensor]) – Depth ground truth. shape (num_gts, ).
attr_labels (list[Tensor]) – Attributes indices of each box. In kitti it’s None.
img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
gt_bboxes_ignore (None | list[Tensor]) – Specify which bounding boxes can be ignored when computing the loss. Default: None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet3d.models.dense_heads.SSD3DHead(num_classes, bbox_coder, in_channels=256, train_cfg=None, test_cfg=None, vote_module_cfg=None, vote_aggregation_cfg=None, pred_layer_cfg=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_res_loss=None, corner_loss=None, vote_loss=None, init_cfg=None)[source]¶
Bbox head of 3DSSD.
- Parameters
num_classes (int) – The number of class.
bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.
in_channels (int) – The number of input feature channel.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
vote_module_cfg (dict) – Config of VoteModule for point-wise votes.
vote_aggregation_cfg (dict) – Config of vote aggregation layer.
pred_layer_cfg (dict) – Config of classification and regression prediction layers.
conv_cfg (dict) – Config of convolution in prediction layer.
norm_cfg (dict) – Config of BN in prediction layer.
act_cfg (dict) – Config of activation in prediction layer.
objectness_loss (dict) – Config of objectness loss.
center_loss (dict) – Config of center loss.
dir_class_loss (dict) – Config of direction classification loss.
dir_res_loss (dict) – Config of direction residual regression loss.
size_res_loss (dict) – Config of size residual regression loss.
corner_loss (dict) – Config of bbox corners regression loss.
vote_loss (dict) – Config of candidate points regression loss.
- get_bboxes(points, bbox_preds, input_metas, rescale=False)[source]¶
Generate bboxes from 3DSSD head predictions.
- Parameters
points (torch.Tensor) – Input points.
bbox_preds (dict) – Predictions from ssd3d head.
input_metas (list[dict]) – Point cloud and image’s meta info.
rescale (bool) – Whether to rescale bboxes.
- Returns
Bounding boxes, scores and labels.
- Return type
list[tuple[torch.Tensor]]
- get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]¶
Generate targets of ssd3d head.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance label of each batch.
bbox_preds (torch.Tensor) – Bounding box predictions of ssd3d head.
- Returns
Targets of ssd3d head.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None, seed_points=None)[source]¶
Generate targets of ssd3d head for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
pts_semantic_mask (torch.Tensor) – Point-wise semantic label of each batch.
pts_instance_mask (torch.Tensor) – Point-wise instance label of each batch.
aggregated_points (torch.Tensor) – Aggregated points from candidate points layer.
seed_points (torch.Tensor) – Seed points of candidate points.
- Returns
Targets of ssd3d head.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of SSD3DHead.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
img_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.
- Returns
Losses of 3DSSD.
- Return type
dict
- multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]¶
Multi-class nms in single batch.
- Parameters
obj_scores (torch.Tensor) – Objectness score of bounding boxes.
sem_scores (torch.Tensor) – Semantic class score of bounding boxes.
bbox (torch.Tensor) – Predicted bounding boxes.
points (torch.Tensor) – Input points.
input_meta (dict) – Point cloud and image’s meta info.
- Returns
Bounding boxes, scores and labels.
- Return type
tuple[torch.Tensor]
- class mmdet3d.models.dense_heads.ShapeAwareHead(tasks, assign_per_class=True, init_cfg=None, **kwargs)[source]¶
Shape-aware grouping head for SSN.
- Parameters
tasks (dict) – Shape-aware groups of multi-class objects.
assign_per_class (bool, optional) – Whether to do assignment for each class. Default: True.
kwargs (dict) – Other arguments are the same as those in Anchor3DHead.
- forward_single(x)[source]¶
Forward function on a single-scale feature map.
- Parameters
x (torch.Tensor) – Input features.
- Returns
- Contain score of each class, bbox
regression and direction classification predictions.
- Return type
tuple[torch.Tensor]
- get_bboxes(cls_scores, bbox_preds, dir_cls_preds, input_metas, cfg=None, rescale=False)[source]¶
Get bboxes of anchor head.
- Parameters
cls_scores (list[torch.Tensor]) – Multi-level class scores.
bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.
input_metas (list[dict]) – Contain pcd and img’s meta info.
cfg (ConfigDict, optional) – Training or testing config. Default: None.
rescale (list[torch.Tensor], optional) – Whether to rescale bbox. Default: False.
- Returns
Prediction results of batches.
- Return type
list[tuple]
- get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg=None, rescale=False)[source]¶
Get bboxes of single branch.
- Parameters
cls_scores (torch.Tensor) – Class score in single batch.
bbox_preds (torch.Tensor) – Bbox prediction in single batch.
dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.
mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.
input_meta (list[dict]) – Contain pcd and img’s meta info.
cfg (ConfigDict) – Training or testing config.
rescale (list[torch.Tensor], optional) – Whether to rescale bbox. Default: False.
- Returns
Contain predictions of single batch.
bboxes (BaseInstance3DBoxes): Predicted 3d bboxes.
scores (torch.Tensor): Class score of each bbox.
labels (torch.Tensor): Label of each bbox.
- Return type
tuple
- loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]¶
Calculate losses.
- Parameters
cls_scores (list[torch.Tensor]) – Multi-level class scores.
bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.
dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.
gt_bboxes (list[BaseInstance3DBoxes]) – Gt bboxes of each sample.
gt_labels (list[torch.Tensor]) – Gt labels of each sample.
input_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
- Returns
Classification, bbox, and direction losses of each level.
loss_cls (list[torch.Tensor]): Classification losses.
loss_bbox (list[torch.Tensor]): Box regression losses.
loss_dir (list[torch.Tensor]): Direction classification losses.
- Return type
dict[str, list[torch.Tensor]]
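The returned dictionary maps each loss name to a list with one entry per feature level. As a minimal sketch of how such a structure could be reduced to one scalar, the snippet below uses plain floats in place of torch.Tensor values. The helper name `sum_multilevel_losses` and the numbers are hypothetical and are not part of mmdet3d.

```python
from typing import Dict, List


def sum_multilevel_losses(losses: Dict[str, List[float]]) -> float:
    """Sum every per-level entry of every loss term into one scalar."""
    return sum(sum(per_level) for per_level in losses.values())


# Two feature levels, three loss terms (made-up values for illustration).
example_losses = {
    "loss_cls": [0.5, 0.25],
    "loss_bbox": [1.0, 0.5],
    "loss_dir": [0.125, 0.125],
}
total_loss = sum_multilevel_losses(example_losses)
```

With real tensors the same reduction would typically be done by the training loop rather than by the head itself.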
- loss_single(cls_score, bbox_pred, dir_cls_preds, labels, label_weights, bbox_targets, bbox_weights, dir_targets, dir_weights, num_total_samples)[source]¶
Calculate loss of single-level results.
- Parameters
cls_score (torch.Tensor) – Class score in single-level.
bbox_pred (torch.Tensor) – Bbox prediction in single-level.
dir_cls_preds (torch.Tensor) – Predictions of direction class in single-level.
labels (torch.Tensor) – Labels of class.
label_weights (torch.Tensor) – Weights of class loss.
bbox_targets (torch.Tensor) – Targets of bbox predictions.
bbox_weights (torch.Tensor) – Weights of bbox loss.
dir_targets (torch.Tensor) – Targets of direction predictions.
dir_weights (torch.Tensor) – Weights of direction loss.
num_total_samples (int) – The number of valid samples.
- Returns
Losses of class, bbox and direction, respectively.
- Return type
tuple[torch.Tensor]
- class mmdet3d.models.dense_heads.VoteHead(num_classes, bbox_coder, train_cfg=None, test_cfg=None, vote_module_cfg=None, vote_aggregation_cfg=None, pred_layer_cfg=None, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, semantic_loss=None, iou_loss=None, init_cfg=None)[source]¶
Bbox head of Votenet.
- Parameters
num_classes (int) – The number of classes.
bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
vote_module_cfg (dict) – Config of VoteModule for point-wise votes.
vote_aggregation_cfg (dict) – Config of vote aggregation layer.
pred_layer_cfg (dict) – Config of classification and regression prediction layers.
conv_cfg (dict) – Config of convolution in prediction layer.
norm_cfg (dict) – Config of BN in prediction layer.
objectness_loss (dict) – Config of objectness loss.
center_loss (dict) – Config of center loss.
dir_class_loss (dict) – Config of direction classification loss.
dir_res_loss (dict) – Config of direction residual regression loss.
size_class_loss (dict) – Config of size classification loss.
size_res_loss (dict) – Config of size residual regression loss.
semantic_loss (dict) – Config of point-wise semantic segmentation loss.
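Because VoteHead takes many dict-valued arguments, a misspelled key in a config is easy to miss. The sketch below checks a config dict against the argument names listed in the signature above. The helper `unknown_args` is hypothetical, the example config is deliberately partial (it omits `bbox_coder` and most losses), and all values are placeholders rather than recommended settings.

```python
# Argument names copied from the VoteHead signature.
VOTE_HEAD_ARGS = {
    "num_classes", "bbox_coder", "train_cfg", "test_cfg",
    "vote_module_cfg", "vote_aggregation_cfg", "pred_layer_cfg",
    "conv_cfg", "norm_cfg", "objectness_loss", "center_loss",
    "dir_class_loss", "dir_res_loss", "size_class_loss",
    "size_res_loss", "semantic_loss", "iou_loss", "init_cfg",
}


def unknown_args(cfg: dict) -> set:
    """Return keys of cfg that are not VoteHead arguments ('type' is allowed)."""
    return set(cfg) - VOTE_HEAD_ARGS - {"type"}


# Partial placeholder config; values are illustrative only.
vote_head_cfg = dict(
    type="VoteHead",
    num_classes=10,
    conv_cfg=dict(type="Conv1d"),
    norm_cfg=dict(type="BN1d"),
    dir_class_loss=dict(type="CrossEntropyLoss", loss_weight=1.0),
    size_res_loss=dict(type="SmoothL1Loss", loss_weight=1.0),
)

# Same config with a misspelled key, to show what the check reports.
typo_cfg = dict(vote_head_cfg, semantic_los=dict(type="CrossEntropyLoss"))
```

Calling `unknown_args(typo_cfg)` reports the misspelled `semantic_los` key, while the well-formed config yields an empty set.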
- forward(feat_dict, sample_mod)[source]¶
Forward pass.
Note
The forward of VoteHead is divided into 4 steps:
Generate vote_points from seed_points.
Aggregate vote_points.
Predict bbox and score.
Decode predictions.
- Parameters
feat_dict (dict) – Feature dict from backbone.
sample_mod (str) – Sample mode for vote aggregation layer. Valid modes are “vote”, “seed”, “random” and “spec”.
- Returns
Predictions of vote head.
- Return type
dict
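The four steps listed in the note can be pictured as a simple chain. The sketch below is only a structural illustration: `forward_sketch` and its four callable arguments are hypothetical stand-ins for the real vote module, aggregation layer, prediction layers and bbox coder, and it is not the VoteHead implementation.

```python
from typing import Any, Callable

# Modes named in the sample_mod description.
VALID_SAMPLE_MODS = ("vote", "seed", "random", "spec")


def forward_sketch(
    seed_points: Any,
    sample_mod: str,
    vote: Callable[[Any], Any],
    aggregate: Callable[[Any, str], Any],
    predict: Callable[[Any], Any],
    decode: Callable[[Any], dict],
) -> dict:
    """Chain the four documented steps; every callable is a stand-in."""
    if sample_mod not in VALID_SAMPLE_MODS:
        raise ValueError(f"unsupported sample_mod: {sample_mod!r}")
    vote_points = vote(seed_points)                   # 1. generate vote_points
    aggregated = aggregate(vote_points, sample_mod)   # 2. aggregate vote_points
    raw_preds = predict(aggregated)                   # 3. predict bbox and score
    return decode(raw_preds)                          # 4. decode predictions
```

Rejecting an unknown `sample_mod` before any computation is a design choice of this sketch; it keeps a configuration mistake from surfacing later inside the aggregation step.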
- get_bboxes(points, bbox_preds, input_metas, rescale=False, use_nms=True)[source]¶
Generate bboxes from vote head predictions.
- Parameters
points (torch.Tensor) – Input points.
bbox_preds (dict) – Predictions from vote head.
input_metas (list[dict]) – Point cloud and image’s meta info.
rescale (bool) – Whether to rescale bboxes.
use_nms (bool) – Whether to apply NMS. Skip NMS postprocessing when using the vote head in the RPN stage.
- Returns
Bounding boxes, scores and labels.
- Return type
list[tuple[torch.Tensor]]
- get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]¶
Generate targets of vote head.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance label of each batch.
bbox_preds (torch.Tensor) – Bounding box predictions of vote head.
- Returns
Targets of vote head.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None)[source]¶
Generate targets of vote head for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
pts_semantic_mask (torch.Tensor) – Point-wise semantic label of each batch.
pts_instance_mask (torch.Tensor) – Point-wise instance label of each batch.
aggregated_points (torch.Tensor) – Aggregated points from vote aggregation layer.
- Returns
Targets of vote head.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None, ret_target=False)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of vote head.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
img_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
ret_target (bool) – Return targets or not.
- Returns
Losses of Votenet.
- Return type
dict
- multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]¶
Multi-class nms in single batch.
- Parameters
obj_scores (torch.Tensor) – Objectness score of bounding boxes.
sem_scores (torch.Tensor) – Semantic class score of bounding boxes.
bbox (torch.Tensor) – Predicted bounding boxes.
points (torch.Tensor) – Input points.
input_meta (dict) – Point cloud and image’s meta info.
- Returns
Bounding boxes, scores and labels.
- Return type
tuple[torch.Tensor]
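To illustrate the general idea of multi-class NMS, the sketch below runs greedy suppression independently for each class. It is a deliberately simplified stand-in: it uses axis-aligned 2D boxes and a single score per box, whereas the method above works on 3D boxes with objectness and semantic scores. The function names and the `iou_thr` default are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def multiclass_nms_sketch(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    iou_thr: float = 0.5,
) -> List[int]:
    """Greedy NMS per class; returns kept indices grouped by class."""
    keep: List[int] = []
    for cls in sorted(set(labels)):
        # Candidates of this class, highest score first.
        order = sorted(
            (i for i, lab in enumerate(labels) if lab == cls),
            key=lambda i: scores[i],
            reverse=True,
        )
        kept_cls: List[int] = []
        for i in order:
            # Keep a box only if it does not overlap a kept box too much.
            if all(iou(boxes[i], boxes[j]) <= iou_thr for j in kept_cls):
                kept_cls.append(i)
        keep.extend(kept_cls)
    return keep


demo_boxes = [(0, 0, 2, 2), (0.1, 0, 2.1, 2), (0, 0, 2, 2), (5, 5, 6, 6)]
demo_scores = [0.9, 0.8, 0.7, 0.6]
demo_labels = [0, 0, 1, 0]
demo_keep = multiclass_nms_sketch(demo_boxes, demo_scores, demo_labels)
```

In the demo, box 1 is suppressed by the higher-scoring box 0 of the same class, while box 2 survives despite identical coordinates because it belongs to a different class.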
roi_heads¶
- class mmdet3d.models.roi_heads.Base3DRoIHead(bbox_head=None, mask_roi_extractor=None, mask_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Base class for 3d RoIHeads.
- aug_test(x, proposal_list, img_metas, rescale=False, **kwargs)[source]¶
Test with augmentations.
If rescale is False, then returned bboxes and masks will fit the scale of imgs[0].
- abstract forward_train(x, img_metas, proposal_list, gt_bboxes, gt_labels, gt_bboxes_ignore=None, **kwargs)[source]¶
Forward function during training.
- Parameters
x (dict) – Contains features from the first stage.
img_metas (list[dict]) – Meta info of each image.
proposal_list (list[dict]) – Proposal information from rpn.
gt_bboxes (list[BaseInstance3DBoxes]) – GT bboxes of each sample. The bboxes are encapsulated by 3D box structures.
gt_labels (list[torch.LongTensor]) – GT labels of each sample.
gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored.
- Returns
Losses from each head.
- Return type
dict[str, torch.Tensor]
- simple_test(x, proposal_list, img_metas, proposals=None, rescale=False, **kwargs)[source]¶
Test without augmentation.
- property with_bbox¶
Whether the RoIHead has a box head.
- Type
bool
- property with_mask¶
Whether the RoIHead has a mask head.
- Type
bool
- class mmdet3d.models.roi_heads.H3DBboxHead(num_classes, suface_matching_cfg, line_matching_cfg, bbox_coder, train_cfg=None, test_cfg=None, gt_per_seed=1, num_proposal=256, feat_channels=(128, 128), primitive_feat_refine_streams=2, primitive_refine_channels=[128, 128, 128], upper_thresh=100.0, surface_thresh=0.5, line_thresh=0.5, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, semantic_loss=None, cues_objectness_loss=None, cues_semantic_loss=None, proposal_objectness_loss=None, primitive_center_loss=None, init_cfg=None)[source]¶
Bbox head of H3DNet.
- Parameters
num_classes (int) – The number of classes.
surface_matching_cfg (dict) – Config for surface primitive matching.
line_matching_cfg (dict) – Config for line primitive matching.
bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
gt_per_seed (int) – Number of ground truth votes generated from each seed point.
num_proposal (int) – Number of proposal votes generated.
feat_channels (tuple[int]) – Convolution channels of prediction layer.
primitive_feat_refine_streams (int) – The number of mlps to refine primitive feature.
primitive_refine_channels (tuple[int]) – Convolution channels of prediction layer.
upper_thresh (float) – Threshold for line matching.
surface_thresh (float) – Threshold for surface matching.
line_thresh (float) – Threshold for line matching.
conv_cfg (dict) – Config of convolution in prediction layer.
norm_cfg (dict) – Config of BN in prediction layer.
objectness_loss (dict) – Config of objectness loss.
center_loss (dict) – Config of center loss.
dir_class_loss (dict) – Config of direction classification loss.
dir_res_loss (dict) – Config of direction residual regression loss.
size_class_loss (dict) – Config of size classification loss.
size_res_loss (dict) – Config of size residual regression loss.
semantic_loss (dict) – Config of point-wise semantic segmentation loss.
cues_objectness_loss (dict) – Config of cues objectness loss.
cues_semantic_loss (dict) – Config of cues semantic loss.
proposal_objectness_loss (dict) – Config of proposal objectness loss.
primitive_center_loss (dict) – Config of primitive center regression loss.
- forward(feats_dict, sample_mod)[source]¶
Forward pass.
- Parameters
feats_dict (dict) – Feature dict from backbone.
sample_mod (str) – Sample mode for vote aggregation layer. Valid modes are “vote”, “seed” and “random”.
- Returns
Predictions of vote head.
- Return type
dict
- get_bboxes(points, bbox_preds, input_metas, rescale=False, suffix='')[source]¶
Generate bboxes from vote head predictions.
- Parameters
points (torch.Tensor) – Input points.
bbox_preds (dict) – Predictions from vote head.
input_metas (list[dict]) – Point cloud and image’s meta info.
rescale (bool) – Whether to rescale bboxes.
- Returns
Bounding boxes, scores and labels.
- Return type
list[tuple[torch.Tensor]]
- get_proposal_stage_loss(bbox_preds, size_class_targets, size_res_targets, dir_class_targets, dir_res_targets, center_targets, mask_targets, objectness_targets, objectness_weights, box_loss_weights, valid_gt_weights, suffix='')[source]¶
Compute loss for the aggregation module.
- Parameters
bbox_preds (dict) – Predictions from forward of vote head.
size_class_targets (torch.Tensor) – Ground truth size class of each prediction bounding box.
size_res_targets (torch.Tensor) – Ground truth size residual of each prediction bounding box.
dir_class_targets (torch.Tensor) – Ground truth direction class of each prediction bounding box.
dir_res_targets (torch.Tensor) – Ground truth direction residual of each prediction bounding box.
center_targets (torch.Tensor) – Ground truth center of each prediction bounding box.
mask_targets (torch.Tensor) – Validation of each prediction bounding box.
objectness_targets (torch.Tensor) – Ground truth objectness label of each prediction bounding box.
objectness_weights (torch.Tensor) – Weights of objectness loss for each prediction bounding box.
box_loss_weights (torch.Tensor) – Weights of regression loss for each prediction bounding box.
valid_gt_weights (torch.Tensor) – Validation of each ground truth bounding box.
- Returns
Losses of aggregation module.
- Return type
dict
- get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]¶
Generate targets of proposal module.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance label of each batch.
bbox_preds (torch.Tensor) – Bounding box predictions of vote head.
- Returns
Targets of proposal module.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None, pred_surface_center=None, pred_line_center=None, pred_obj_surface_center=None, pred_obj_line_center=None, pred_surface_sem=None, pred_line_sem=None)[source]¶
Generate targets for primitive cues for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
pts_semantic_mask (torch.Tensor) – Point-wise semantic label of each batch.
pts_instance_mask (torch.Tensor) – Point-wise instance label of each batch.
aggregated_points (torch.Tensor) – Aggregated points from vote aggregation layer.
pred_surface_center (torch.Tensor) – Prediction of surface center.
pred_line_center (torch.Tensor) – Prediction of line center.
pred_obj_surface_center (torch.Tensor) – Objectness prediction of surface center.
pred_obj_line_center (torch.Tensor) – Objectness prediction of line center.
pred_surface_sem (torch.Tensor) – Semantic prediction of surface center.
pred_line_sem (torch.Tensor) – Semantic prediction of line center.
- Returns
Targets for primitive cues.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, rpn_targets=None, gt_bboxes_ignore=None)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of h3d bbox head.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
img_metas (list[dict]) – Contain pcd and img’s meta info.
rpn_targets (Tuple) – Targets generated by rpn head.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
- Returns
Losses of H3DNet.
- Return type
dict
- multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]¶
Multi-class nms in single batch.
- Parameters
obj_scores (torch.Tensor) – Objectness score of bounding boxes.
sem_scores (torch.Tensor) – Semantic class score of bounding boxes.
bbox (torch.Tensor) – Predicted bounding boxes.
points (torch.Tensor) – Input points.
input_meta (dict) – Point cloud and image’s meta info.
- Returns
Bounding boxes, scores and labels.
- Return type
tuple[torch.Tensor]
- class mmdet3d.models.roi_heads.H3DRoIHead(primitive_list, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
H3D roi head for H3DNet.
- Parameters
primitive_list (List) – Configs of primitive heads.
bbox_head (ConfigDict) – Config of bbox_head.
train_cfg (ConfigDict) – Training config.
test_cfg (ConfigDict) – Testing config.
- forward_train(feats_dict, img_metas, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask, pts_instance_mask, gt_bboxes_ignore=None)[source]¶
Training forward function of H3DRoIHead.
- Parameters
feats_dict (dict) – Contains features from the first stage.
img_metas (list[dict]) – Contain pcd and img’s meta info.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
- Returns
Losses from each head.
- Return type
dict
- simple_test(feats_dict, img_metas, points, rescale=False)[source]¶
Simple testing forward function of H3DRoIHead.
Note
This function assumes that the batch size is 1.
- Parameters
feats_dict (dict) – Contains features from the first stage.
img_metas (list[dict]) – Contain pcd and img’s meta info.
points (torch.Tensor) – Input points.
rescale (bool) – Whether to rescale results.
- Returns
Bbox results of one frame.
- Return type
dict
- class mmdet3d.models.roi_heads.PartA2BboxHead(num_classes, seg_in_channels, part_in_channels, seg_conv_channels=None, part_conv_channels=None, merge_conv_channels=None, down_conv_channels=None, shared_fc_channels=None, cls_channels=None, reg_channels=None, dropout_ratio=0.1, roi_feat_size=14, with_corner_loss=True, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_cls={'loss_weight': 1.0, 'reduction': 'none', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg=None)[source]¶
PartA2 RoI head.
- Parameters
num_classes (int) – The number of classes to predict.
seg_in_channels (int) – Input channels of segmentation convolution layer.
part_in_channels (int) – Input channels of part convolution layer.
seg_conv_channels (list(int)) – Out channels of each segmentation convolution layer.
part_conv_channels (list(int)) – Out channels of each part convolution layer.
merge_conv_channels (list(int)) – Out channels of each feature merged convolution layer.
down_conv_channels (list(int)) – Out channels of each downsampled convolution layer.
shared_fc_channels (list(int)) – Out channels of each shared fc layer.
cls_channels (list(int)) – Out channels of each classification layer.
reg_channels (list(int)) – Out channels of each regression layer.
dropout_ratio (float) – Dropout ratio of classification and regression layers.
roi_feat_size (int) – The size of pooled roi features.
with_corner_loss (bool) – Whether to use corner loss or not.
bbox_coder (BaseBBoxCoder) – Bbox coder for box head.
conv_cfg (dict) – Config dict of convolutional layers.
norm_cfg (dict) – Config dict of normalization layers.
loss_bbox (dict) – Config dict of box regression loss.
loss_cls (dict) – Config dict of classification loss.
- forward(seg_feats, part_feats)[source]¶
Forward pass.
- Parameters
seg_feats (torch.Tensor) – Point-wise semantic features.
part_feats (torch.Tensor) – Point-wise part prediction features.
- Returns
Score of class and bbox predictions.
- Return type
tuple[torch.Tensor]
- get_bboxes(rois, cls_score, bbox_pred, class_labels, class_pred, img_metas, cfg=None)[source]¶
Generate bboxes from bbox head predictions.
- Parameters
rois (torch.Tensor) – Roi bounding boxes.
cls_score (torch.Tensor) – Scores of bounding boxes.
bbox_pred (torch.Tensor) – Bounding boxes predictions.
class_labels (torch.Tensor) – Label of classes.
class_pred (torch.Tensor) – Score for nms.
img_metas (list[dict]) – Point cloud and image’s meta info.
cfg (ConfigDict) – Testing config.
- Returns
Decoded bbox, scores and labels after nms.
- Return type
list[tuple]
- get_corner_loss_lidar(pred_bbox3d, gt_bbox3d, delta=1.0)[source]¶
Calculate corner loss of given boxes.
- Parameters
pred_bbox3d (torch.FloatTensor) – Predicted boxes in shape (N, 7).
gt_bbox3d (torch.FloatTensor) – Ground truth boxes in shape (N, 7).
delta (float, optional) – Huber loss threshold. Defaults to 1.0.
- Returns
Calculated corner loss in shape (N).
- Return type
torch.FloatTensor
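The `delta` argument is the threshold at which the Huber penalty switches from quadratic to linear. The sketch below shows the standard Huber function on scalar distances. Averaging the penalty over the eight corner distances of one box is an assumption made here for illustration, and both function names are hypothetical; the actual tensor-based computation may differ in detail.

```python
from typing import Sequence


def huber(error: float, delta: float = 1.0) -> float:
    """Standard Huber penalty: quadratic up to delta, linear beyond it."""
    abs_error = abs(error)
    quadratic = min(abs_error, delta)
    linear = abs_error - quadratic
    return 0.5 * quadratic ** 2 + delta * linear


def corner_loss_sketch(corner_dists: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber penalty over the per-corner distances of one box (assumed)."""
    return sum(huber(d, delta) for d in corner_dists) / len(corner_dists)


small = huber(0.5)                       # inside the quadratic region
large = huber(3.0)                       # inside the linear region
per_box = corner_loss_sketch([0.5] * 8)  # eight equal corner distances
```

The linear tail keeps a few badly misplaced corners from dominating the loss, which is the usual motivation for choosing Huber over a plain squared error.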
- get_targets(sampling_results, rcnn_train_cfg, concat=True)[source]¶
Generate targets.
- Parameters
sampling_results (list[SamplingResult]) – Sampled results from rois.
rcnn_train_cfg (ConfigDict) – Training config of rcnn.
concat (bool) – Whether to concatenate targets between batches.
- Returns
Targets of boxes and class prediction.
- Return type
tuple[torch.Tensor]
- loss(cls_score, bbox_pred, rois, labels, bbox_targets, pos_gt_bboxes, reg_mask, label_weights, bbox_weights)[source]¶
Computing losses.
- Parameters
cls_score (torch.Tensor) – Scores of each roi.
bbox_pred (torch.Tensor) – Predictions of bboxes.
rois (torch.Tensor) – Roi bboxes.
labels (torch.Tensor) – Labels of class.
bbox_targets (torch.Tensor) – Target of positive bboxes.
pos_gt_bboxes (torch.Tensor) – Ground truths of positive bboxes.
reg_mask (torch.Tensor) – Mask for positive bboxes.
label_weights (torch.Tensor) – Weights of class loss.
bbox_weights (torch.Tensor) – Weights of bbox loss.
- Returns
Computed losses.
loss_cls (torch.Tensor): Loss of classes.
loss_bbox (torch.Tensor): Loss of bboxes.
loss_corner (torch.Tensor): Loss of corners.
- Return type
dict
- multi_class_nms(box_probs, box_preds, score_thr, nms_thr, input_meta, use_rotate_nms=True)[source]¶
Multi-class NMS for box head.
Note
This function has large overlap with the box3d_multiclass_nms implemented in mmdet3d.core.post_processing. We are considering merging these two functions in the future.
- Parameters
box_probs (torch.Tensor) – Predicted boxes probabilities in shape (N,).
box_preds (torch.Tensor) – Predicted boxes in shape (N, 7+C).
score_thr (float) – Threshold of scores.
nms_thr (float) – Threshold for NMS.
input_meta (dict) – Meta information of the current sample.
use_rotate_nms (bool, optional) – Whether to use rotated nms. Defaults to True.
- Returns
Selected indices.
- Return type
torch.Tensor
- class mmdet3d.models.roi_heads.PartAggregationROIHead(semantic_head, num_classes=3, seg_roi_extractor=None, part_roi_extractor=None, bbox_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]¶
Part aggregation roi head for PartA2.
- Parameters
semantic_head (ConfigDict) – Config of semantic head.
num_classes (int) – The number of classes.
seg_roi_extractor (ConfigDict) – Config of seg_roi_extractor.
part_roi_extractor (ConfigDict) – Config of part_roi_extractor.
bbox_head (ConfigDict) – Config of bbox_head.
train_cfg (ConfigDict) – Training config.
test_cfg (ConfigDict) – Testing config.
- forward_train(feats_dict, voxels_dict, img_metas, proposal_list, gt_bboxes_3d, gt_labels_3d)[source]¶
Training forward function of PartAggregationROIHead.
- Parameters
feats_dict (dict) – Contains features from the first stage.
voxels_dict (dict) – Contains information of voxels.
img_metas (list[dict]) – Meta info of each image.
proposal_list (list[dict]) – Proposal information from rpn. The dictionary should contain the following keys:
boxes_3d (BaseInstance3DBoxes): Proposal bboxes.
labels_3d (torch.Tensor): Labels of proposals.
cls_preds (torch.Tensor): Original scores of proposals.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – GT bboxes of each sample. The bboxes are encapsulated by 3D box structures.
gt_labels_3d (list[LongTensor]) – GT labels of each sample.
- Returns
Losses from each head.
loss_semantic (torch.Tensor): Loss of semantic head.
loss_bbox (torch.Tensor): Loss of bboxes.
- Return type
dict
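Each entry of `proposal_list` is expected to carry the keys `boxes_3d`, `labels_3d` and `cls_preds`. A minimal sketch of a pre-flight check for that structure follows. The helper `missing_proposal_keys` is hypothetical, and `None` is used as a placeholder for the real box and tensor objects.

```python
from typing import Dict, List, Sequence

# Keys named in the proposal_list description.
REQUIRED_PROPOSAL_KEYS = ("boxes_3d", "labels_3d", "cls_preds")


def missing_proposal_keys(proposal_list: Sequence[dict]) -> Dict[int, List[str]]:
    """Map sample index to its missing keys; complete entries are omitted."""
    report: Dict[int, List[str]] = {}
    for idx, proposal in enumerate(proposal_list):
        missing = [key for key in REQUIRED_PROPOSAL_KEYS if key not in proposal]
        if missing:
            report[idx] = missing
    return report


demo_proposals = [
    {"boxes_3d": None, "labels_3d": None, "cls_preds": None},  # complete
    {"boxes_3d": None},                                        # incomplete
]
demo_report = missing_proposal_keys(demo_proposals)
```

An empty report means every sample provides all three keys.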
- init_mask_head()[source]¶
Initialize mask head, skip since PartAggregationROIHead does not have one.
- simple_test(feats_dict, voxels_dict, img_metas, proposal_list, **kwargs)[source]¶
Simple testing forward function of PartAggregationROIHead.
Note
This function assumes that the batch size is 1.
- Parameters
feats_dict (dict) – Contains features from the first stage.
voxels_dict (dict) – Contains information of voxels.
img_metas (list[dict]) – Meta info of each image.
proposal_list (list[dict]) – Proposal information from rpn.
- Returns
Bbox results of one frame.
- Return type
dict
- property with_semantic¶
Whether the head has a semantic branch.
- Type
bool
- class mmdet3d.models.roi_heads.PointRCNNBboxHead(num_classes, in_channels, mlp_channels, pred_layer_cfg=None, num_points=(128, 32, -1), radius=(0.2, 0.4, 100), num_samples=(64, 64, 64), sa_channels=((128, 128, 128), (128, 128, 256), (256, 256, 512)), bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, sa_cfg={'pool_mod': 'max', 'type': 'PointSAModule', 'use_xyz': True}, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, bias='auto', loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'SmoothL1Loss'}, loss_cls={'loss_weight': 1.0, 'reduction': 'sum', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, with_corner_loss=True, init_cfg=None)[source]¶
PointRCNN RoI Bbox head.
- Parameters
num_classes (int) – The number of classes to predict.
in_channels (int) –
mlp_channels (list[int]) – The number of MLP channels.
pred_layer_cfg (dict, optional) – Config of classification and regression prediction layers. Defaults to None.
num_points (tuple, optional) – The number of points which each SA module samples. Defaults to (128, 32, -1).
radius (tuple, optional) – Sampling radius of each SA module. Defaults to (0.2, 0.4, 100).
num_samples (tuple, optional) – The number of samples for ball query in each SA module. Defaults to (64, 64, 64).
sa_channels (tuple, optional) – Out channels of each mlp in SA module. Defaults to ((128, 128, 128), (128, 128, 256), (256, 256, 512)).
bbox_coder (dict, optional) – Config dict of box coders. Defaults to dict(type='DeltaXYZWLHRBBoxCoder').
sa_cfg (dict, optional) – Config of set abstraction module, which may contain the following keys and values:
pool_mod (str): Pool method (‘max’ or ‘avg’) for SA modules.
use_xyz (bool): Whether to use xyz as a part of features.
normalize_xyz (bool): Whether to normalize xyz with radii in each SA module.
Defaults to dict(type='PointSAModule', pool_mod='max', use_xyz=True).
conv_cfg (dict, optional) – Config dict of convolutional layers. Defaults to dict(type='Conv1d').
norm_cfg (dict, optional) – Config dict of normalization layers. Defaults to dict(type='BN1d').
act_cfg (dict, optional) – Config dict of activation layers. Defaults to dict(type='ReLU').
bias (str, optional) – Type of bias. Defaults to 'auto'.
loss_bbox (dict, optional) – Config of regression loss function. Defaults to dict(type='SmoothL1Loss', beta=1.0 / 9.0, reduction='sum', loss_weight=1.0).
loss_cls (dict, optional) – Config of classification loss function. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, reduction='sum', loss_weight=1.0).
with_corner_loss (bool, optional) – Whether using corner loss. Defaults to True.
init_cfg (dict, optional) – Config of initialization. Defaults to None.
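The documented defaults can be written out as a config dict. In the sketch below, the values of `num_classes`, `in_channels` and `mlp_channels` are hypothetical placeholders, since the class gives no defaults for them; everything else repeats the defaults listed above. The helper `sa_stage_count` is also hypothetical and assumes that the four per-module tuples should have the same length.

```python
# Sketch of a PointRCNNBboxHead config built from the documented defaults.
point_rcnn_bbox_head_cfg = dict(
    type="PointRCNNBboxHead",
    num_classes=1,             # hypothetical value
    in_channels=512,           # hypothetical value
    mlp_channels=[128, 128],   # hypothetical value
    num_points=(128, 32, -1),
    radius=(0.2, 0.4, 100),
    num_samples=(64, 64, 64),
    sa_channels=((128, 128, 128), (128, 128, 256), (256, 256, 512)),
    bbox_coder=dict(type="DeltaXYZWLHRBBoxCoder"),
    sa_cfg=dict(type="PointSAModule", pool_mod="max", use_xyz=True),
    conv_cfg=dict(type="Conv1d"),
    norm_cfg=dict(type="BN1d"),
    act_cfg=dict(type="ReLU"),
    bias="auto",
    loss_bbox=dict(
        type="SmoothL1Loss", beta=1.0 / 9.0, reduction="sum", loss_weight=1.0),
    loss_cls=dict(
        type="CrossEntropyLoss", use_sigmoid=True, reduction="sum",
        loss_weight=1.0),
    with_corner_loss=True,
)


def sa_stage_count(cfg: dict) -> int:
    """Return the number of SA modules, assuming per-module tuples must agree."""
    keys = ("num_points", "radius", "num_samples", "sa_channels")
    lengths = {len(cfg[key]) for key in keys}
    if len(lengths) != 1:
        raise ValueError("per-module SA settings differ in length")
    return lengths.pop()
```

Note that `beta=1.0 / 9.0` in the parameter description and `0.1111111111111111` in the signature are the same value.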
- forward(feats)[source]¶
Forward pass.
- Parameters
feats (torch.Tensor) – Features from RCNN modules.
- Returns
Score of class and bbox predictions.
- Return type
tuple[torch.Tensor]
- get_bboxes(rois, cls_score, bbox_pred, class_labels, img_metas, cfg=None)[source]¶
Generate bboxes from bbox head predictions.
- Parameters
rois (torch.Tensor) – RoI bounding boxes.
cls_score (torch.Tensor) – Scores of bounding boxes.
bbox_pred (torch.Tensor) – Bounding boxes predictions.
class_labels (torch.Tensor) – Label of classes.
img_metas (list[dict]) – Point cloud and image’s meta info.
cfg (ConfigDict, optional) – Testing config. Defaults to None.
- Returns
Decoded bbox, scores and labels after nms.
- Return type
list[tuple]
- get_corner_loss_lidar(pred_bbox3d, gt_bbox3d, delta=1.0)[source]¶
Calculate corner loss of given boxes.
- Parameters
pred_bbox3d (torch.FloatTensor) – Predicted boxes in shape (N, 7).
gt_bbox3d (torch.FloatTensor) – Ground truth boxes in shape (N, 7).
delta (float, optional) – Huber loss threshold. Defaults to 1.0.
- Returns
Calculated corner loss in shape (N).
- Return type
torch.FloatTensor
- get_targets(sampling_results, rcnn_train_cfg, concat=True)[source]¶
Generate targets.
- Parameters
sampling_results (list[SamplingResult]) – Sampled results from rois.
rcnn_train_cfg (ConfigDict) – Training config of rcnn.
concat (bool, optional) – Whether to concatenate targets between batches. Defaults to True.
- Returns
Targets of boxes and class prediction.
- Return type
tuple[torch.Tensor]
- loss(cls_score, bbox_pred, rois, labels, bbox_targets, pos_gt_bboxes, reg_mask, label_weights, bbox_weights)[source]¶
Computing losses.
- Parameters
cls_score (torch.Tensor) – Scores of each RoI.
bbox_pred (torch.Tensor) – Predictions of bboxes.
rois (torch.Tensor) – RoI bboxes.
labels (torch.Tensor) – Labels of class.
bbox_targets (torch.Tensor) – Target of positive bboxes.
pos_gt_bboxes (torch.Tensor) – Ground truths of positive bboxes.
reg_mask (torch.Tensor) – Mask for positive bboxes.
label_weights (torch.Tensor) – Weights of class loss.
bbox_weights (torch.Tensor) – Weights of bbox loss.
- Returns
Computed losses.
loss_cls (torch.Tensor): Loss of classes.
loss_bbox (torch.Tensor): Loss of bboxes.
loss_corner (torch.Tensor): Loss of corners.
- Return type
dict
- multi_class_nms(box_probs, box_preds, score_thr, nms_thr, input_meta, use_rotate_nms=True)[source]¶
Multi-class NMS for box head.
Note
This function has large overlap with the box3d_multiclass_nms implemented in mmdet3d.core.post_processing. We are considering merging these two functions in the future.
- Parameters
box_probs (torch.Tensor) – Predicted boxes probabilities in shape (N,).
box_preds (torch.Tensor) – Predicted boxes in shape (N, 7+C).
score_thr (float) – Threshold of scores.
nms_thr (float) – Threshold for NMS.
input_meta (dict) – Meta information of the current sample.
use_rotate_nms (bool, optional) – Whether to use rotated nms. Defaults to True.
- Returns
Selected indices.
- Return type
torch.Tensor
- class mmdet3d.models.roi_heads.PointRCNNRoIHead(bbox_head, point_roi_extractor, train_cfg, test_cfg, depth_normalizer=70.0, pretrained=None, init_cfg=None)[source]¶
RoI head for PointRCNN.
- Parameters
bbox_head (dict) – Config of bbox_head.
point_roi_extractor (dict) – Config of RoI extractor.
train_cfg (dict) – Train configs.
test_cfg (dict) – Test configs.
depth_normalizer (float, optional) – Normalize depth feature. Defaults to 70.0.
init_cfg (dict, optional) – Config of initialization. Defaults to None.
- forward_train(feats_dict, input_metas, proposal_list, gt_bboxes_3d, gt_labels_3d)[source]¶
Training forward function of PointRCNNRoIHead.
- Parameters
feats_dict (dict) – Contains features from the first stage.
input_metas (list[dict]) – Meta info of each input.
proposal_list (list[dict]) – Proposal information from rpn. The dictionary should contain the following keys:
boxes_3d (BaseInstance3DBoxes): Proposal bboxes.
labels_3d (torch.Tensor): Labels of proposals.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – GT bboxes of each sample. The bboxes are encapsulated by 3D box structures.
gt_labels_3d (list[LongTensor]) – GT labels of each sample.
- Returns
Losses from RoI RCNN head.
loss_bbox (torch.Tensor): Loss of bboxes.
- Return type
dict
- init_bbox_head(bbox_head)[source]¶
Initialize box head.
- Parameters
bbox_head (dict) – Config dict of RoI Head.
- simple_test(feats_dict, img_metas, proposal_list, **kwargs)[source]¶
Simple testing forward function of PointRCNNRoIHead.
Note
This function assumes that the batch size is 1.
- Parameters
feats_dict (dict) – Contains features from the first stage.
img_metas (list[dict]) – Meta info of each image.
proposal_list (list[dict]) – Proposal information from rpn.
- Returns
Bbox results of one frame.
- Return type
dict
- class mmdet3d.models.roi_heads.PointwiseSemanticHead(in_channels, num_classes=3, extra_width=0.2, seg_score_thr=0.3, init_cfg=None, loss_seg={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'FocalLoss', 'use_sigmoid': True}, loss_part={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True})[source]¶
Semantic segmentation head for point-wise segmentation.
Predict point-wise segmentation and part regression results for PartA2. See paper for more details.
- Parameters
in_channels (int) – The number of input channels.
num_classes (int) – The number of classes.
extra_width (float) – Boxes enlarge width.
loss_seg (dict) – Config of segmentation loss.
loss_part (dict) – Config of part prediction loss.
- forward(x)[source]¶
Forward pass.
- Parameters
x (torch.Tensor) – Features from the first stage.
- Returns
Part features, segmentation and part predictions.
seg_preds (torch.Tensor): Segment predictions.
part_preds (torch.Tensor): Part predictions.
part_feats (torch.Tensor): Feature predictions.
- Return type
dict
- get_targets(voxels_dict, gt_bboxes_3d, gt_labels_3d)[source]¶
Generate segmentation and part prediction targets.
- Parameters
voxel_centers (torch.Tensor) – The center of voxels in shape (voxel_num, 3).
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes in shape (box_num, 7).
gt_labels_3d (torch.Tensor) – Class labels of ground truths in shape (box_num).
- Returns
Prediction targets
- seg_targets (torch.Tensor): Segmentation targets
with shape [voxel_num].
- part_targets (torch.Tensor): Part prediction targets
with shape [voxel_num, 3].
- Return type
dict
- get_targets_single(voxel_centers, gt_bboxes_3d, gt_labels_3d)[source]¶
Generate segmentation and part prediction targets for a single sample.
- Parameters
voxel_centers (torch.Tensor) – The center of voxels in shape (voxel_num, 3).
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes in shape (box_num, 7).
gt_labels_3d (torch.Tensor) – Class labels of ground truths in shape (box_num).
- Returns
Segmentation targets with shape [voxel_num] and part prediction targets with shape [voxel_num, 3].
- Return type
tuple[torch.Tensor]
- loss(semantic_results, semantic_targets)[source]¶
Calculate point-wise segmentation and part prediction losses.
- Parameters
semantic_results (dict) –
Results from semantic head.
seg_preds: Segmentation predictions.
part_preds: Part predictions.
semantic_targets (dict) –
Targets of semantic results.
seg_targets: Segmentation targets.
part_targets: Part targets.
- Returns
Loss of segmentation and part prediction.
loss_seg (torch.Tensor): Segmentation prediction loss.
loss_part (torch.Tensor): Part prediction loss.
- Return type
dict
- class mmdet3d.models.roi_heads.PrimitiveHead(num_dims, num_classes, primitive_mode, train_cfg=None, test_cfg=None, vote_module_cfg=None, vote_aggregation_cfg=None, feat_channels=(128, 128), upper_thresh=100.0, surface_thresh=0.5, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, objectness_loss=None, center_loss=None, semantic_reg_loss=None, semantic_cls_loss=None, init_cfg=None)[source]¶
Primitive head of H3DNet.
- Parameters
num_dims (int) – The dimension of primitive semantic information.
num_classes (int) – The number of classes.
primitive_mode (str) – The mode of primitive module. Available modes are ‘z’, ‘xy’ and ‘line’.
bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
vote_module_cfg (dict) – Config of VoteModule for point-wise votes.
vote_aggregation_cfg (dict) – Config of vote aggregation layer.
feat_channels (tuple[int]) – Convolution channels of prediction layer.
upper_thresh (float) – Threshold for line matching.
surface_thresh (float) – Threshold for surface matching.
conv_cfg (dict) – Config of convolution in prediction layer.
norm_cfg (dict) – Config of BN in prediction layer.
objectness_loss (dict) – Config of objectness loss.
center_loss (dict) – Config of center loss.
semantic_loss (dict) – Config of point-wise semantic segmentation loss.
- check_dist(plane_equ, points)[source]¶
Check whether the mean point-to-plane distance is lower than the threshold.
- Parameters
plane_equ (torch.Tensor) – Plane to be checked.
points (torch.Tensor) – Points to be checked.
- Returns
Flag of result.
- Return type
Tuple
- check_horizon(points)[source]¶
Check whether it is a horizontal plane.
- Parameters
points (torch.Tensor) – Points of input.
- Returns
Flag of result.
- Return type
Bool
- compute_primitive_loss(primitive_center, primitive_semantic, semantic_scores, num_proposal, gt_primitive_center, gt_primitive_semantic, gt_sem_cls_label, gt_primitive_mask)[source]¶
Compute loss of primitive module.
- Parameters
primitive_center (torch.Tensor) – Predictions of primitive center.
primitive_semantic (torch.Tensor) – Predictions of primitive semantic.
semantic_scores (torch.Tensor) – Predictions of primitive semantic scores.
num_proposal (int) – The number of primitive proposal.
gt_primitive_center (torch.Tensor) – Ground truth of primitive center.
gt_primitive_semantic (torch.Tensor) – Ground truth of primitive semantic.
gt_sem_cls_label (torch.Tensor) – Ground truth of primitive semantic class.
gt_primitive_mask (torch.Tensor) – Ground truth of primitive mask.
- Returns
Loss of primitive module.
- Return type
Tuple
- forward(feats_dict, sample_mod)[source]¶
Forward pass.
- Parameters
feats_dict (dict) – Feature dict from backbone.
sample_mod (str) – Sample mode for vote aggregation layer. Valid modes are “vote”, “seed” and “random”.
- Returns
Predictions of primitive head.
- Return type
dict
- get_primitive_center(pred_flag, center)[source]¶
Generate primitive center from predictions.
- Parameters
pred_flag (torch.Tensor) – Scores of primitive center.
center (torch.Tensor) – Predictions of primitive center.
- Returns
Primitive center and the prediction indices.
- Return type
Tuple
- get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]¶
Generate targets of primitive head.
- Parameters
points (list[torch.Tensor]) – Points of each batch.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.
gt_labels_3d (list[torch.Tensor]) – Labels of each batch.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic label of each batch.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance label of each batch.
bbox_preds (dict) – Predictions from forward of primitive head.
- Returns
Targets of primitive head.
- Return type
tuple[torch.Tensor]
- get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None)[source]¶
Generate targets of primitive head for single batch.
- Parameters
points (torch.Tensor) – Points of each batch.
gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.
gt_labels_3d (torch.Tensor) – Labels of each batch.
pts_semantic_mask (torch.Tensor) – Point-wise semantic label of each batch.
pts_instance_mask (torch.Tensor) – Point-wise instance label of each batch.
- Returns
Targets of primitive head.
- Return type
tuple[torch.Tensor]
- loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None)[source]¶
Compute loss.
- Parameters
bbox_preds (dict) – Predictions from forward of primitive head.
points (list[torch.Tensor]) – Input points.
gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.
gt_labels_3d (list[torch.Tensor]) – Labels of each sample.
pts_semantic_mask (list[torch.Tensor]) – Point-wise semantic mask.
pts_instance_mask (list[torch.Tensor]) – Point-wise instance mask.
img_metas (list[dict]) – Contain pcd and img’s meta info.
gt_bboxes_ignore (list[torch.Tensor]) – Specify which bounding boxes to ignore.
- Returns
Losses of Primitive Head.
- Return type
dict
- match_point2line(points, corners, with_yaw, mode='bottom')[source]¶
Match points to corresponding line.
- Parameters
points (torch.Tensor) – Points of input.
corners (torch.Tensor) – Eight corners of a bounding box.
with_yaw (bool) – Whether the bounding box is with rotation.
mode (str, optional) – Specify which line should be matched. Available modes are ‘bottom’, ‘top’, ‘left’ and ‘right’. Defaults to ‘bottom’.
- Returns
Flag of matching correspondence.
- Return type
Tuple
- match_point2plane(plane, points)[source]¶
Match points to plane.
- Parameters
plane (torch.Tensor) – Equation of the plane.
points (torch.Tensor) – Points of input.
- Returns
Distance of each point to the plane and flag of matching correspondence.
- Return type
Tuple
- point2line_dist(points, pts_a, pts_b)[source]¶
Calculate the distance from point to line.
- Parameters
points (torch.Tensor) – Points of input.
pts_a (torch.Tensor) – Point on the specific line.
pts_b (torch.Tensor) – Point on the specific line.
- Returns
Distance between each point to line.
- Return type
torch.Tensor
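The point-to-line distance can be illustrated with a minimal plain-Python sketch. It assumes 3D points given as tuples and uses the projection form of the formula. The helper name `point_to_line_distance` is hypothetical, and this is not the library's batched tensor implementation.

```python
import math


def point_to_line_distance(point, pt_a, pt_b):
    """Distance from `point` to the infinite line through `pt_a` and `pt_b`."""
    a_to_b = [b - a for a, b in zip(pt_a, pt_b)]
    a_to_p = [p - a for a, p in zip(pt_a, point)]
    line_len = math.sqrt(sum(c * c for c in a_to_b))
    if line_len == 0.0:
        raise ValueError("pt_a and pt_b must be distinct points")
    # Signed length of the projection of (point - pt_a) onto the line direction.
    proj = sum(p * d for p, d in zip(a_to_p, a_to_b)) / line_len
    sq_norm = sum(c * c for c in a_to_p)
    # Pythagoras: squared distance = |a_to_p|^2 - proj^2 (clamped against rounding).
    return math.sqrt(max(sq_norm - proj * proj, 0.0))
```

For example, a point one unit above the x-axis is at distance 1 from the line through (0, 0, 0) and (1, 0, 0).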
- primitive_decode_scores(predictions, aggregated_points)[source]¶
Decode predicted parts to primitive head.
- Parameters
predictions (torch.Tensor) – Primitive predictions of each batch.
aggregated_points (torch.Tensor) – The aggregated points of vote stage.
- Returns
Predictions of primitive head, including center, semantic size and semantic scores.
- Return type
Dict
- class mmdet3d.models.roi_heads.Single3DRoIAwareExtractor(roi_layer=None, init_cfg=None)[source]¶
Point-wise roi-aware Extractor.
Extract Point-wise roi features.
- Parameters
roi_layer (dict) – The config of roi layer.
- forward(feats, coordinate, batch_inds, rois)[source]¶
Extract point-wise roi features.
- Parameters
feats (torch.FloatTensor) – Point-wise features with shape (batch, npoints, channels) for pooling.
coordinate (torch.FloatTensor) – Coordinate of each point.
batch_inds (torch.LongTensor) – Indicate the batch of each point.
rois (torch.FloatTensor) – Roi boxes with batch indices.
- Returns
Pooled features
- Return type
torch.FloatTensor
- class mmdet3d.models.roi_heads.Single3DRoIPointExtractor(roi_layer=None)[source]¶
Point-wise roi-aware Extractor.
Extract Point-wise roi features.
- Parameters
roi_layer (dict) – The config of roi layer.
- forward(feats, coordinate, batch_inds, rois)[source]¶
Extract point-wise roi features.
- Parameters
feats (torch.FloatTensor) – Point-wise features with shape (batch, npoints, channels) for pooling.
coordinate (torch.FloatTensor) – Coordinate of each point.
batch_inds (torch.LongTensor) – Indicate the batch of each point.
rois (torch.FloatTensor) – Roi boxes with batch indices.
- Returns
Pooled features
- Return type
torch.FloatTensor
- class mmdet3d.models.roi_heads.SingleRoIExtractor(roi_layer, out_channels, featmap_strides, finest_scale=56, init_cfg=None)[source]¶
Extract RoI features from a single level feature map.
If there are multiple input feature levels, each RoI is mapped to a level according to its scale. The mapping rule is proposed in FPN.
- Parameters
roi_layer (dict) – Specify RoI layer type and arguments.
out_channels (int) – Output channels of RoI layers.
featmap_strides (List[int]) – Strides of input feature maps.
finest_scale (int) – Scale threshold of mapping to level 0. Default: 56.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- map_roi_levels(rois, num_levels)[source]¶
Map rois to corresponding feature levels by scales.
scale < finest_scale * 2: level 0
finest_scale * 2 <= scale < finest_scale * 4: level 1
finest_scale * 4 <= scale < finest_scale * 8: level 2
scale >= finest_scale * 8: level 3
- Parameters
rois (Tensor) – Input RoIs, shape (k, 5).
num_levels (int) – Total level number.
- Returns
Level index (0-based) of each RoI, shape (k, )
- Return type
Tensor
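The level thresholds listed above can be sketched in plain Python. This is an illustration only: it assumes each RoI row is laid out as (batch_idx, x1, y1, x2, y2) and that the RoI scale is the square root of its area, neither of which is stated in this section. The function name `map_roi_levels_sketch` is hypothetical.

```python
import math


def map_roi_levels_sketch(rois, num_levels, finest_scale=56):
    """Return the 0-based feature level for each RoI, following the thresholds above."""
    levels = []
    for _batch_idx, x1, y1, x2, y2 in rois:
        # Assumed definition of scale: sqrt(width * height).
        scale = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
        level = 0
        # Each doubling of the scale beyond finest_scale * 2 moves one level up,
        # capped at the last available level.
        while level < num_levels - 1 and scale >= finest_scale * 2 ** (level + 1):
            level += 1
        levels.append(level)
    return levels
```

With `finest_scale=56` and four levels, RoIs of side 56, 112, 224 and 1000 pixels map to levels 0, 1, 2 and 3 respectively.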
fusion_layers¶
- class mmdet3d.models.fusion_layers.PointFusion(img_channels, pts_channels, mid_channels, out_channels, img_levels=3, coord_type='LIDAR', conv_cfg=None, norm_cfg=None, act_cfg=None, init_cfg=None, activate_out=True, fuse_out=False, dropout_ratio=0, aligned=True, align_corners=True, padding_mode='zeros', lateral_conv=True)[source]¶
Fuse image features from multi-scale features.
- Parameters
img_channels (list[int] | int) – Channels of image features. It could be a list if the input is multi-scale image features.
pts_channels (int) – Channels of point features
mid_channels (int) – Channels of middle layers
out_channels (int) – Channels of output fused features
img_levels (int, optional) – Number of image levels. Defaults to 3.
coord_type (str) – ‘DEPTH’ or ‘CAMERA’ or ‘LIDAR’. Defaults to ‘LIDAR’.
conv_cfg (dict, optional) – Dict config of conv layers of middle layers. Defaults to None.
norm_cfg (dict, optional) – Dict config of norm layers of middle layers. Defaults to None.
act_cfg (dict, optional) – Dict config of activation layers. Defaults to None.
activate_out (bool, optional) – Whether to apply relu activation to output features. Defaults to True.
fuse_out (bool, optional) – Whether apply conv layer to the fused features. Defaults to False.
dropout_ratio (int, float, optional) – Dropout ratio of image features to prevent overfitting. Defaults to 0.
aligned (bool, optional) – Whether apply aligned feature fusion. Defaults to True.
align_corners (bool, optional) – Whether to align corner when sampling features according to points. Defaults to True.
padding_mode (str, optional) – Mode used to pad the features of points that do not have corresponding image features. Defaults to ‘zeros’.
lateral_conv (bool, optional) – Whether to apply lateral convs to image features. Defaults to True.
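As a rough illustration of how these parameters fit together, the following config-style dict spells out the documented arguments. The channel counts are hypothetical placeholders, not recommended values; the remaining entries repeat the defaults listed above.

```python
# Hypothetical config sketch; channel counts are placeholders chosen for illustration.
point_fusion_cfg = dict(
    type='PointFusion',
    img_channels=256,      # assumed width of the image feature maps
    pts_channels=64,       # assumed width of the point features
    mid_channels=128,
    out_channels=128,
    img_levels=3,          # documented default
    coord_type='LIDAR',    # documented default
    activate_out=True,
    fuse_out=False,
    dropout_ratio=0,
    aligned=True,
    align_corners=True,
    padding_mode='zeros',
    lateral_conv=True,
)
```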
- forward(img_feats, pts, pts_feats, img_metas)[source]¶
Forward function.
- Parameters
img_feats (list[torch.Tensor]) – Image features.
pts (list[torch.Tensor]) – A batch of points with shape N x 3.
pts_feats (torch.Tensor) – A tensor consisting of the point features of the whole batch.
img_metas (list[dict]) – Meta information of images.
- Returns
Fused features of each point.
- Return type
torch.Tensor
- obtain_mlvl_feats(img_feats, pts, img_metas)[source]¶
Obtain multi-level features for each point.
- Parameters
img_feats (list(torch.Tensor)) – Multi-scale image features produced by image backbone in shape (N, C, H, W).
pts (list[torch.Tensor]) – Points of each sample.
img_metas (list[dict]) – Meta information for each sample.
- Returns
Corresponding image features of each point.
- Return type
torch.Tensor
- sample_single(img_feats, pts, img_meta)[source]¶
Sample features from single level image feature map.
- Parameters
img_feats (torch.Tensor) – Image feature map in shape (1, C, H, W).
pts (torch.Tensor) – Points of a single sample.
img_meta (dict) – Meta information of the single sample.
- Returns
Single level image features of each point.
- Return type
torch.Tensor
- class mmdet3d.models.fusion_layers.VoteFusion(num_classes=10, max_imvote_per_pixel=3)[source]¶
Fuse 2d features from 3d seeds.
- Parameters
num_classes (int) – number of classes.
max_imvote_per_pixel (int) – max number of imvotes.
- forward(imgs, bboxes_2d_rescaled, seeds_3d_depth, img_metas)[source]¶
Forward function.
- Parameters
imgs (list[torch.Tensor]) – Image features.
bboxes_2d_rescaled (list[torch.Tensor]) – 2D bboxes.
seeds_3d_depth (torch.Tensor) – 3D seeds.
img_metas (list[dict]) – Meta information of images.
- Returns
Concatenated cues of each point, followed by the validity mask of each feature (torch.Tensor).
- Return type
torch.Tensor
- mmdet3d.models.fusion_layers.apply_3d_transformation(pcd, coord_type, img_meta, reverse=False)[source]¶
Apply transformation to input point cloud.
- Parameters
pcd (torch.Tensor) – The point cloud to be transformed.
coord_type (str) – ‘DEPTH’ or ‘CAMERA’ or ‘LIDAR’.
img_meta (dict) – Meta info regarding data transformation.
reverse (bool) – Whether to apply the reversed transformation.
Note
The elements in img_meta[‘transformation_3d_flow’]: “T” stands for translation; “S” stands for scale; “R” stands for rotation; “HF” stands for horizontal flip; “VF” stands for vertical flip.
- Returns
The transformed point cloud.
- Return type
torch.Tensor
- mmdet3d.models.fusion_layers.bbox_2d_transform(img_meta, bbox_2d, ori2new)[source]¶
Transform 2d bbox according to img_meta.
- Parameters
img_meta (dict) – Meta info regarding data transformation.
bbox_2d (torch.Tensor) – The input 2d bboxes to transform, with shape (…, >4).
ori2new (bool) – Whether to transform from the original image coordinate system to the new one.
- Returns
The transformed 2d bboxes.
- Return type
torch.Tensor
- mmdet3d.models.fusion_layers.coord_2d_transform(img_meta, coord_2d, ori2new)[source]¶
Transform 2d pixel coordinates according to img_meta.
- Parameters
img_meta (dict) – Meta info regarding data transformation.
coord_2d (torch.Tensor) – The input 2d coords to transform, with shape (…, 2).
ori2new (bool) – Whether to transform from the original image coordinate system to the new one.
- Returns
The transformed 2d coordinates.
- Return type
torch.Tensor
losses¶
- class mmdet3d.models.losses.AxisAlignedIoULoss(reduction='mean', loss_weight=1.0)[source]¶
Calculate the IoU loss (1-IoU) of axis aligned bounding boxes.
- Parameters
reduction (str) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’.
loss_weight (float, optional) – Weight of loss. Defaults to 1.0.
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function of loss calculation.
- Parameters
pred (torch.Tensor) – Bbox predictions with shape […, 6] (x1, y1, z1, x2, y2, z2).
target (torch.Tensor) – Bbox targets (gt) with shape […, 6] (x1, y1, z1, x2, y2, z2).
weight (torch.Tensor | float, optional) – Weight of loss. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’. Defaults to None.
- Returns
IoU loss between predictions and targets.
- Return type
torch.Tensor
- class mmdet3d.models.losses.ChamferDistance(mode='l2', reduction='mean', loss_src_weight=1.0, loss_dst_weight=1.0)[source]¶
Calculate Chamfer Distance of two sets.
- Parameters
mode (str) – Criterion mode to calculate distance. The valid modes are smooth_l1, l1 or l2.
reduction (str) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’.
loss_src_weight (float) – Weight of loss_source.
loss_dst_weight (float) – Weight of loss_target.
- forward(source, target, src_weight=1.0, dst_weight=1.0, reduction_override=None, return_indices=False, **kwargs)[source]¶
Forward function of loss calculation.
- Parameters
source (torch.Tensor) – Source set with shape [B, N, C] to calculate Chamfer Distance.
target (torch.Tensor) – Destination set with shape [B, M, C] to calculate Chamfer Distance.
src_weight (torch.Tensor | float, optional) – Weight of source loss. Defaults to 1.0.
dst_weight (torch.Tensor | float, optional) – Weight of destination loss. Defaults to 1.0.
reduction_override (str, optional) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’. Defaults to None.
return_indices (bool, optional) – Whether to return indices. Defaults to False.
- Returns
If return_indices=True, return losses of source and target with their corresponding indices in the order of (loss_source, loss_target, indices1, indices2). If return_indices=False, return (loss_source, loss_target).
- Return type
tuple[torch.Tensor]
- class mmdet3d.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0, activated=False)[source]¶
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning label of the prediction.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.
- Returns
The calculated loss
- Return type
torch.Tensor
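For orientation, the standard sigmoid focal loss for a single binary prediction can be written as a short plain-Python sketch. It assumes `pred` is a raw logit and `target` is 0 or 1. The helper name `sigmoid_focal_loss_single` is hypothetical, and weighting, `avg_factor` and reduction handling are left out.

```python
import math


def sigmoid_focal_loss_single(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction; `target` must be 0 or 1."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability the model assigns to the true class.
    p_t = prob if target == 1 else 1.0 - prob
    # alpha balances positives against negatives.
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confident correct prediction (large positive logit for a positive target) therefore contributes far less than an uncertain one at logit 0.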
- class mmdet3d.models.losses.MultiBinLoss(reduction='none', loss_weight=1.0)[source]¶
Multi-Bin Loss for orientation.
- Parameters
reduction (str, optional) – The method to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Defaults to ‘none’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- forward(pred, target, num_dir_bins, reduction_override=None)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning target of the prediction.
num_dir_bins (int) – Number of bins to encode direction angle.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- class mmdet3d.models.losses.PAConvRegularizationLoss(reduction='mean', loss_weight=1.0)[source]¶
Calculate correlation loss of kernel weights in PAConv’s weight bank.
This is used as a regularization term in PAConv model training.
- Parameters
reduction (str) – Method to reduce losses. The reduction is performed among all PAConv modules instead of prediction tensors. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’.
loss_weight (float, optional) – Weight of loss. Defaults to 1.0.
- forward(modules, reduction_override=None, **kwargs)[source]¶
Forward function of loss calculation.
- Parameters
modules (List[nn.Module] | generator) – A list or a Python generator of torch.nn.Modules.
reduction_override (str, optional) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’. Defaults to None.
- Returns
Correlation loss of kernel weights.
- Return type
torch.Tensor
- class mmdet3d.models.losses.RotatedIoU3DLoss(reduction='mean', loss_weight=1.0)[source]¶
Calculate the IoU loss (1-IoU) of rotated bounding boxes.
- Parameters
reduction (str) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’.
loss_weight (float, optional) – Weight of loss. Defaults to 1.0.
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function of loss calculation.
- Parameters
pred (torch.Tensor) – Bbox predictions with shape […, 7] (x, y, z, w, l, h, alpha).
target (torch.Tensor) – Bbox targets (gt) with shape […, 7] (x, y, z, w, l, h, alpha).
weight (torch.Tensor | float, optional) – Weight of loss. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’. Defaults to None.
- Returns
IoU loss between predictions and targets.
- Return type
torch.Tensor
- class mmdet3d.models.losses.SmoothL1Loss(beta=1.0, reduction='mean', loss_weight=1.0)[source]¶
Smooth L1 loss.
- Parameters
beta (float, optional) – The threshold in the piecewise function. Defaults to 1.0.
reduction (str, optional) – The method to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to “mean”.
loss_weight (float, optional) – The weight of loss.
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning target of the prediction.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
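The role of `beta` as the threshold of the piecewise function can be seen in a scalar sketch of the usual smooth L1 definition. This is a plain-Python illustration with a hypothetical helper name, and it omits weighting and reduction.

```python
def smooth_l1_single(pred, target, beta=1.0):
    """Smooth L1 for one scalar pair: quadratic below `beta`, linear above it."""
    diff = abs(pred - target)
    if diff < beta:
        # Quadratic region, scaled so both branches meet at diff == beta.
        return 0.5 * diff * diff / beta
    # Linear region.
    return diff - 0.5 * beta
```

Both branches give 0.5 * beta at `diff == beta`, so the loss is continuous at the threshold.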
- class mmdet3d.models.losses.UncertainL1Loss(alpha=1.0, reduction='mean', loss_weight=1.0)[source]¶
L1 loss with uncertainty.
- Parameters
alpha (float, optional) – The coefficient of log(sigma). Defaults to 1.0.
reduction (str, optional) – The method to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- forward(pred, target, sigma, weight=None, avg_factor=None, reduction_override=None)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning target of the prediction.
sigma (torch.Tensor) – The sigma for uncertainty.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
- class mmdet3d.models.losses.UncertainSmoothL1Loss(alpha=1.0, beta=1.0, reduction='mean', loss_weight=1.0)[source]¶
Smooth L1 loss with uncertainty.
Please refer to PGD and Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics for more details.
- Parameters
alpha (float, optional) – The coefficient of log(sigma). Defaults to 1.0.
beta (float, optional) – The threshold in the piecewise function. Defaults to 1.0.
reduction (str, optional) – The method to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Defaults to ‘mean’.
loss_weight (float, optional) – The weight of loss. Defaults to 1.0.
- forward(pred, target, sigma, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction.
target (torch.Tensor) – The learning target of the prediction.
sigma (torch.Tensor) – The sigma for uncertainty.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Defaults to None.
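One common way to combine a regression loss with a predicted uncertainty is to scale the loss by exp(-sigma) and add alpha * sigma as a penalty, so that sigma acts as a predicted log-scale. The scalar sketch below follows that formulation; whether it matches this class term by term is an assumption, and the helper names are hypothetical.

```python
import math


def smooth_l1_single(diff, beta=1.0):
    """Smooth L1 of a scalar difference."""
    diff = abs(diff)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def uncertain_smooth_l1_single(pred, target, sigma, alpha=1.0, beta=1.0):
    """Assumed form: exp(-sigma) * smooth_l1(pred - target) + alpha * sigma."""
    # A large sigma shrinks the regression term but is itself penalised by alpha * sigma.
    return math.exp(-sigma) * smooth_l1_single(pred - target, beta) + alpha * sigma
```

With `sigma = 0` the expression reduces to the plain smooth L1 loss.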
- mmdet3d.models.losses.axis_aligned_iou_loss(pred, target)[source]¶
Calculate the IoU loss (1-IoU) of two sets of axis aligned bounding boxes. Note that predictions and targets correspond one-to-one.
- Parameters
pred (torch.Tensor) – Bbox predictions with shape […, 6] (x1, y1, z1, x2, y2, z2).
target (torch.Tensor) – Bbox targets (gt) with shape […, 6] (x1, y1, z1, x2, y2, z2).
- Returns
IoU loss between predictions and targets.
- Return type
torch.Tensor
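The 1-IoU computation for a single pair of boxes in the (x1, y1, z1, x2, y2, z2) layout can be sketched in plain Python. The helper name is hypothetical and the sketch handles one pair at a time rather than batched tensors.

```python
def axis_aligned_iou_loss_single(pred, target):
    """Return 1 - IoU for two axis-aligned 3D boxes (x1, y1, z1, x2, y2, z2)."""
    inter = 1.0
    for axis in range(3):
        lo = max(pred[axis], target[axis])
        hi = min(pred[axis + 3], target[axis + 3])
        # Overlap along this axis is zero when the boxes do not intersect.
        inter *= max(hi - lo, 0.0)

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(pred) + volume(target) - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou
```

Two 2 x 2 x 2 boxes offset by one unit along every axis overlap in a unit cube, giving an IoU of 1/15 and a loss of 14/15.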
- mmdet3d.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, ignore_index=-100, avg_non_ignore=False)[source]¶
Calculate the binary CrossEntropy loss.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, 1) or (N, ). When the shape of pred is (N, 1), label will be expanded to one-hot format, and when the shape of pred is (N, ), label will not be expanded to one-hot format.
label (torch.Tensor) – The learning label of the prediction, with shape (N, ).
weight (torch.Tensor, optional) – Sample-wise loss weight.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (list[float], optional) – The weight for each class.
ignore_index (int | None) – The label index to be ignored. If None, it will be set to default value. Default: -100.
avg_non_ignore (bool) – Whether the loss is averaged only over non-ignored targets. Default: False.
- Returns
The calculated loss.
- Return type
torch.Tensor
- mmdet3d.models.losses.chamfer_distance(src, dst, src_weight=1.0, dst_weight=1.0, criterion_mode='l2', reduction='mean')[source]¶
Calculate Chamfer Distance of two sets.
- Parameters
src (torch.Tensor) – Source set with shape [B, N, C] to calculate Chamfer Distance.
dst (torch.Tensor) – Destination set with shape [B, M, C] to calculate Chamfer Distance.
src_weight (torch.Tensor or float) – Weight of source loss.
dst_weight (torch.Tensor or float) – Weight of destination loss.
criterion_mode (str) – Criterion mode to calculate distance. The valid modes are smooth_l1, l1 or l2.
reduction (str) – Method to reduce losses. The valid reduction methods are ‘none’, ‘sum’ or ‘mean’.
- Returns
Source and Destination loss with the corresponding indices.
- loss_src (torch.Tensor): The min distance from source to destination.
- loss_dst (torch.Tensor): The min distance from destination to source.
- indices1 (torch.Tensor): Index of the nearest destination point for each point in source.
- indices2 (torch.Tensor): Index of the nearest source point for each point in destination.
- Return type
tuple
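The four returned values can be illustrated with a brute-force sketch for a single sample. It assumes squared Euclidean distance for the ‘l2’ mode, unit weights and ‘mean’ reduction; these choices and the helper name are assumptions made for illustration, and the code works on lists of tuples rather than batched tensors.

```python
def chamfer_distance_single(src, dst):
    """Return (loss_src, loss_dst, indices1, indices2) for two point lists."""

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def nearest(points, others):
        # For each point, find the closest point in `others` and its distance.
        dists, indices = [], []
        for p in points:
            row = [sq_dist(p, q) for q in others]
            best = min(range(len(others)), key=row.__getitem__)
            indices.append(best)
            dists.append(row[best])
        return dists, indices

    src_min, indices1 = nearest(src, dst)  # source -> destination
    dst_min, indices2 = nearest(dst, src)  # destination -> source
    loss_src = sum(src_min) / len(src_min)
    loss_dst = sum(dst_min) / len(dst_min)
    return loss_src, loss_dst, indices1, indices2
```

Note that the two directions are not symmetric: an outlier in the destination set raises `loss_dst` without affecting `loss_src`.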
middle_encoders¶
- class mmdet3d.models.middle_encoders.PointPillarsScatter(in_channels, output_shape)[source]¶
Point Pillar’s Scatter.
Converts learned features from dense tensor to sparse pseudo image.
- Parameters
in_channels (int) – Channels of input features.
output_shape (list[int]) – Required output shape of features.
- forward_batch(voxel_features, coors, batch_size)[source]¶
Scatter features of single sample.
- Parameters
voxel_features (torch.Tensor) – Voxel features in shape (N, C).
coors (torch.Tensor) – Coordinates of each voxel in shape (N, 4). The first column indicates the sample ID.
batch_size (int) – Number of samples in the current batch.
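The scatter step can be pictured as writing each voxel's feature vector into a zero-initialised canvas at its grid location. The sketch below uses nested Python lists and assumes that the coordinate columns are (batch_idx, z, y, x) and that the canvas has shape (batch_size, C, ny, nx). The helper name and these layout details are assumptions made for illustration.

```python
def scatter_to_canvas(voxel_features, coors, batch_size, ny, nx):
    """Place (N, C) voxel features on a (batch_size, C, ny, nx) zero canvas."""
    num_channels = len(voxel_features[0]) if voxel_features else 0
    # canvas[b][c][y][x]; locations without a voxel stay at zero.
    canvas = [
        [[[0.0] * nx for _ in range(ny)] for _ in range(num_channels)]
        for _ in range(batch_size)
    ]
    for feat, (batch_idx, _z, y, x) in zip(voxel_features, coors):
        for channel, value in enumerate(feat):
            canvas[batch_idx][channel][y][x] = value
    return canvas
```

The first coordinate column selects which sample's canvas receives the feature, which is why the batch size must be known in advance.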
- class mmdet3d.models.middle_encoders.SparseEncoder(in_channels, sparse_shape, order=('conv', 'norm', 'act'), norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, base_channels=16, output_channels=128, encoder_channels=((16), (32, 32, 32), (64, 64, 64), (64, 64, 64)), encoder_paddings=((1), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)), block_type='conv_module')[source]¶
Sparse encoder for SECOND and Part-A2.
- Parameters
in_channels (int) – The number of input channels.
sparse_shape (list[int]) – The sparse shape of input tensor.
order (list[str], optional) – Order of conv module. Defaults to (‘conv’, ‘norm’, ‘act’).
norm_cfg (dict, optional) – Config of normalization layer. Defaults to dict(type=’BN1d’, eps=1e-3, momentum=0.01).
base_channels (int, optional) – Out channels for conv_input layer. Defaults to 16.
output_channels (int, optional) – Out channels for conv_out layer. Defaults to 128.
encoder_channels (tuple[tuple[int]], optional) – Convolutional channels of each encode block. Defaults to ((16, ), (32, 32, 32), (64, 64, 64), (64, 64, 64)).
encoder_paddings (tuple[tuple[int]], optional) – Paddings of each encode block. Defaults to ((1, ), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)).
block_type (str, optional) – Type of the block to use. Defaults to ‘conv_module’.
- forward(voxel_features, coors, batch_size)[source]¶
Forward of SparseEncoder.
- Parameters
voxel_features (torch.Tensor) – Voxel features in shape (N, C).
coors (torch.Tensor) – Coordinates in shape (N, 4), the columns in the order of (batch_idx, z_idx, y_idx, x_idx).
batch_size (int) – Batch size.
- Returns
Backbone features.
- Return type
dict
- make_encoder_layers(make_block, norm_cfg, in_channels, block_type='conv_module', conv_cfg={'type': 'SubMConv3d'})[source]¶
Make encoder layers using sparse convs.
- Parameters
make_block (method) – A bound method used to build blocks.
norm_cfg (dict[str]) – Config of normalization layer.
in_channels (int) – The number of encoder input channels.
block_type (str, optional) – Type of the block to use. Defaults to ‘conv_module’.
conv_cfg (dict, optional) – Config of conv layer. Defaults to dict(type=’SubMConv3d’).
- Returns
The number of encoder output channels.
- Return type
int
- class mmdet3d.models.middle_encoders.SparseEncoderSASSD(in_channels, sparse_shape, order=('conv', 'norm', 'act'), norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, base_channels=16, output_channels=128, encoder_channels=((16), (32, 32, 32), (64, 64, 64), (64, 64, 64)), encoder_paddings=((1), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)), block_type='conv_module')[source]¶
Sparse encoder for SASSD.
- Parameters
in_channels (int) – The number of input channels.
sparse_shape (list[int]) – The sparse shape of input tensor.
order (list[str], optional) – Order of conv module. Defaults to (‘conv’, ‘norm’, ‘act’).
norm_cfg (dict, optional) – Config of normalization layer. Defaults to dict(type=’BN1d’, eps=1e-3, momentum=0.01).
base_channels (int, optional) – Out channels for conv_input layer. Defaults to 16.
output_channels (int, optional) – Out channels for conv_out layer. Defaults to 128.
encoder_channels (tuple[tuple[int]], optional) – Convolutional channels of each encode block. Defaults to ((16, ), (32, 32, 32), (64, 64, 64), (64, 64, 64)).
encoder_paddings (tuple[tuple[int]], optional) – Paddings of each encode block. Defaults to ((1, ), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)).
block_type (str, optional) – Type of the block to use. Defaults to ‘conv_module’.
- aux_loss(points, point_cls, point_reg, gt_bboxes)[source]¶
Calculate auxiliary loss.
- Parameters
points (torch.Tensor) – Mean feature value of the points.
point_cls (torch.Tensor) – Classification result of the points.
point_reg (torch.Tensor) – Regression offsets of the points.
gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes for each sample.
- Returns
Auxiliary classification and regression losses.
- Return type
dict
- calculate_pts_offsets(points, boxes)[source]¶
Find all boxes in which each point is, as well as the offsets from the box centers.
- Parameters
points (torch.Tensor) – [M, 3], [x, y, z] in LiDAR/DEPTH coordinate
boxes (torch.Tensor) – [T, 7], num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz], (x, y, z) is the bottom center.
- Returns
Point indices of boxes with the shape of (T, M), with background defaulting to 0, and the offsets of the points from the box centers, for points that belong to a box, with the shape of (M, 3), also with background defaulting to 0.
- Return type
tuple[torch.Tensor]
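The point-in-box test described above can be sketched in plain Python, assuming the yaw `rz` rotates the box around the z axis and that offsets are measured as the point minus the box's (x, y, z) reference point. Both are assumptions of this sketch, as is the name `pts_offsets_sketch`; the library's own sign and center conventions may differ.

```python
import math


def pts_offsets_sketch(points, boxes):
    """Plain-Python stand-in for the point-in-box test.

    points: list of M (x, y, z) tuples.
    boxes:  list of T (x, y, z, x_size, y_size, z_size, rz) tuples,
            with (x, y, z) the bottom center.
    Returns a T x M table of 0/1 flags and an M x 3 table of offsets,
    both left at zero for background points.
    """
    indices = [[0] * len(points) for _ in boxes]
    offsets = [[0.0, 0.0, 0.0] for _ in points]
    for t, (cx, cy, cz, dx, dy, dz, rz) in enumerate(boxes):
        cos_r, sin_r = math.cos(rz), math.sin(rz)
        for m, (px, py, pz) in enumerate(points):
            sx, sy, sz = px - cx, py - cy, pz - cz
            # Rotate the shifted point by -rz into the box's own frame.
            local_x = sx * cos_r + sy * sin_r
            local_y = -sx * sin_r + sy * cos_r
            inside = (abs(local_x) <= dx / 2 and abs(local_y) <= dy / 2
                      and 0.0 <= sz <= dz)
            if inside:
                indices[t][m] = 1
                # Assumption: offset = point minus the box's (x, y, z).
                offsets[m] = [sx, sy, sz]
    return indices, offsets


boxes = [(0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0),
         (10.0, 0.0, 0.0, 4.0, 2.0, 2.0, math.pi / 2)]
points = [(1.0, 0.5, 1.0), (10.5, 1.5, 1.0), (50.0, 50.0, 50.0)]
indices, offsets = pts_offsets_sketch(points, boxes)
```

The second point lies inside the second box only because that box is rotated by 90 degrees; the third point is background and keeps a zero offset.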
- forward(voxel_features, coors, batch_size, test_mode=False)[source]¶
Forward of SparseEncoderSASSD.
- Parameters
voxel_features (torch.Tensor) – Voxel features in shape (N, C).
coors (torch.Tensor) – Coordinates in shape (N, 4), the columns in the order of (batch_idx, z_idx, y_idx, x_idx).
batch_size (int) – Batch size.
test_mode (bool, optional) – Whether in test mode. Defaults to False.
- Returns
Backbone features (dict), and a tuple[torch.Tensor] of the mean feature value of the points, the classification result of the points, and the regression offsets of the points.
- Return type
dict
- get_auxiliary_targets(nxyz, gt_boxes3d, enlarge=1.0)[source]¶
Get auxiliary target.
- Parameters
nxyz (torch.Tensor) – Mean features of the points.
gt_boxes3d (torch.Tensor) – Ground truth 3D boxes.
enlarge (float, optional) – Enlargement scale. Defaults to 1.0.
- Returns
Label of the points and center offsets of the points.
- Return type
tuple[torch.Tensor]
- make_auxiliary_points(source_tensor, target, offset=(0.0, -40.0, -3.0), voxel_size=(0.05, 0.05, 0.1))[source]¶
Make auxiliary points for loss computation.
- Parameters
source_tensor (torch.Tensor) – (M, C) features to be propagated.
target (torch.Tensor) – (N, 4) bxyz positions of the target features.
offset (tuple[float], optional) – Voxelization offset. Defaults to (0., -40., -3.)
voxel_size (tuple[float], optional) – Voxelization size. Defaults to (.05, .05, .1)
- Returns
(N, C) tensor of features at the target positions.
- Return type
torch.Tensor
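One way to read the `offset` and `voxel_size` parameters is as the mapping from integer voxel indices to metric voxel centers, after which source features can be copied to the nearest target position. The sketch below assumes that reading, assumes source rows ordered (batch_idx, z_idx, y_idx, x_idx), and uses a single nearest neighbor as a simplification. The function names are hypothetical and this is not the library's implementation.

```python
def voxel_index_to_center(coor, offset=(0.0, -40.0, -3.0),
                          voxel_size=(0.05, 0.05, 0.1)):
    """Map a (batch_idx, z_idx, y_idx, x_idx) row to (batch_idx, x, y, z)."""
    batch_idx, z_idx, y_idx, x_idx = coor
    xyz = tuple(
        idx * size + lo + 0.5 * size
        for idx, size, lo in zip((x_idx, y_idx, z_idx), voxel_size, offset)
    )
    return (batch_idx, *xyz)


def nearest_source_features(source_coors, source_feats, target):
    """For each (b, x, y, z) target row, copy the nearest source voxel's features.

    Assumes every batch index in target has at least one source voxel.
    """
    centers = [voxel_index_to_center(c) for c in source_coors]
    out = []
    for tb, tx, ty, tz in target:
        same_batch = [i for i, c in enumerate(centers) if c[0] == tb]
        best = min(
            same_batch,
            key=lambda i: sum((a - b) ** 2
                              for a, b in zip(centers[i][1:], (tx, ty, tz))),
        )
        out.append(source_feats[best])
    return out


source_coors = [(0, 0, 0, 0), (0, 0, 0, 100), (1, 0, 0, 0)]
source_feats = [[1.0], [2.0], [3.0]]
target = [(0, 4.9, -40.0, -3.0), (1, 4.9, -40.0, -3.0), (0, 0.0, -40.0, -3.0)]
picked = nearest_source_features(source_coors, source_feats, target)
```

The output has one feature row per target row, which matches the documented (N, C) return shape.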
- class mmdet3d.models.middle_encoders.SparseUNet(in_channels, sparse_shape, order=('conv', 'norm', 'act'), norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, base_channels=16, output_channels=128, encoder_channels=((16), (32, 32, 32), (64, 64, 64), (64, 64, 64)), encoder_paddings=((1), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)), decoder_channels=((64, 64, 64), (64, 64, 32), (32, 32, 16), (16, 16, 16)), decoder_paddings=((1, 0), (1, 0), (0, 0), (0, 1)), init_cfg=None)[source]¶
SparseUNet for PartA^2.
See the paper for more details.
- Parameters
in_channels (int) – The number of input channels.
sparse_shape (list[int]) – The sparse shape of input tensor.
norm_cfg (dict) – Config of normalization layer.
base_channels (int) – Out channels for conv_input layer.
output_channels (int) – Out channels for conv_out layer.
encoder_channels (tuple[tuple[int]]) – Convolutional channels of each encode block.
encoder_paddings (tuple[tuple[int]]) – Paddings of each encode block.
decoder_channels (tuple[tuple[int]]) – Convolutional channels of each decode block.
decoder_paddings (tuple[tuple[int]]) – Paddings of each decode block.
- decoder_layer_forward(x_lateral, x_bottom, lateral_layer, merge_layer, upsample_layer)[source]¶
Forward of upsample and residual block.
- Parameters
x_lateral (SparseConvTensor) – Lateral tensor.
x_bottom (SparseConvTensor) – Feature from bottom layer.
lateral_layer (SparseBasicBlock) – Convolution for lateral tensor.
merge_layer (SparseSequential) – Convolution for merging features.
upsample_layer (SparseSequential) – Convolution for upsampling.
- Returns
Upsampled feature.
- Return type
SparseConvTensor
- forward(voxel_features, coors, batch_size)[source]¶
Forward of SparseUNet.
- Parameters
voxel_features (torch.float32) – Voxel features in shape [N, C].
coors (torch.int32) – Coordinates in shape [N, 4], the columns in the order of (batch_idx, z_idx, y_idx, x_idx).
batch_size (int) – Batch size.
- Returns
Backbone features.
- Return type
dict[str, torch.Tensor]
- make_decoder_layers(make_block, norm_cfg, in_channels)[source]¶
Make decoder layers using sparse convolutions.
- Parameters
make_block (method) – A bound method used to build blocks.
norm_cfg (dict[str]) – Config of normalization layer.
in_channels (int) – The number of encoder input channels.
- Returns
The number of encoder output channels.
- Return type
int
- make_encoder_layers(make_block, norm_cfg, in_channels)[source]¶
Make encoder layers using sparse convolutions.
- Parameters
make_block (method) – A bound method used to build blocks.
norm_cfg (dict[str]) – Config of normalization layer.
in_channels (int) – The number of encoder input channels.
- Returns
The number of encoder output channels.
- Return type
int
model_utils¶
- class mmdet3d.models.model_utils.EdgeFusionModule(out_channels, feat_channels, kernel_size=3, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN1d'})[source]¶
Edge Fusion Module for feature map.
- Parameters
out_channels (int) – The number of output channels.
feat_channels (int) – The number of channels in feature map during edge feature fusion.
kernel_size (int, optional) – Kernel size of convolution. Default: 3.
act_cfg (dict, optional) – Config of activation. Default: dict(type=’ReLU’).
norm_cfg (dict, optional) – Config of normalization. Default: dict(type=’BN1d’).
- forward(features, fused_features, edge_indices, edge_lens, output_h, output_w)[source]¶
Forward pass.
- Parameters
features (torch.Tensor) – Different representative features for fusion.
fused_features (torch.Tensor) – Different representative features to be fused.
edge_indices (torch.Tensor) – Batch image edge indices.
edge_lens (list[int]) – List of edge length of each image.
output_h (int) – Height of output feature map.
output_w (int) – Width of output feature map.
- Returns
Fused feature maps.
- Return type
torch.Tensor
- class mmdet3d.models.model_utils.GroupFree3DMHA(embed_dims, num_heads, attn_drop=0.0, proj_drop=0.0, dropout_layer={'drop_prob': 0.0, 'type': 'DropOut'}, init_cfg=None, batch_first=False, **kwargs)[source]¶
A wrapper for torch.nn.MultiheadAttention for GroupFree3D.
This module implements MultiheadAttention with identity connection, and positional encoding used in DETR is also passed as input.
- Parameters
embed_dims (int) – The embedding dimension.
num_heads (int) – Parallel attention heads. Same as nn.MultiheadAttention.
attn_drop (float, optional) – A Dropout layer on attn_output_weights. Defaults to 0.0.
proj_drop (float, optional) – A Dropout layer. Defaults to 0.0.
dropout_layer (ConfigDict, optional) – The dropout_layer used when adding the shortcut.
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
batch_first (bool, optional) – Key, Query and Value are shape of (batch, n, embed_dim) or (n, batch, embed_dim). Defaults to False.
- forward(query, key, value, identity, query_pos=None, key_pos=None, attn_mask=None, key_padding_mask=None, **kwargs)[source]¶
Forward function for GroupFree3DMHA.
**kwargs allow passing a more general data flow when combining with other operations in transformerlayer.
- Parameters
query (Tensor) – The input query with shape [num_queries, bs, embed_dims]. Same in nn.MultiheadAttention.forward.
key (Tensor) – The key tensor with shape [num_keys, bs, embed_dims]. Same in nn.MultiheadAttention.forward. If None, the query will be used.
value (Tensor) – The value tensor with same shape as key. Same in nn.MultiheadAttention.forward. If None, the key will be used.
identity (Tensor) – This tensor, with the same shape as query, will be used for the identity link. If None, query will be used.
query_pos (Tensor, optional) – The positional encoding for query, with the same shape as query. If not None, it will be added to query before the forward function. Defaults to None.
key_pos (Tensor, optional) – The positional encoding for key, with the same shape as key. If not None, it will be added to key before the forward function. If None, and query_pos has the same shape as key, then query_pos will be used for key_pos. Defaults to None.
attn_mask (Tensor, optional) – ByteTensor mask with shape [num_queries, num_keys]. Same in nn.MultiheadAttention.forward. Defaults to None.
key_padding_mask (Tensor, optional) – ByteTensor with shape [bs, num_keys]. Same in nn.MultiheadAttention.forward. Defaults to None.
- Returns
Forwarded results with shape [num_queries, bs, embed_dims].
- Return type
Tensor
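The fallback rules in the parameter list above interact in a specific order, which the following plain-Python sketch tries to make explicit. It uses flat lists of floats in place of tensors, so "same shape" becomes "same length". The function name `resolve_mha_inputs` is hypothetical, and whether a positional encoding is also added to `value` is not stated here, so the sketch leaves `value` untouched.

```python
def resolve_mha_inputs(query, key=None, value=None, identity=None,
                       query_pos=None, key_pos=None):
    """Apply the documented fallbacks, then add the positional encodings.

    All inputs are flat lists of floats standing in for tensors.
    """
    if key is None:
        key = query
    if value is None:
        value = key
    if identity is None:
        identity = query
    # Reuse query_pos for the key only when the lengths ("shapes") match.
    if key_pos is None and query_pos is not None \
            and len(query_pos) == len(key):
        key_pos = query_pos
    if query_pos is not None:
        query = [q + p for q, p in zip(query, query_pos)]
    if key_pos is not None:
        key = [k + p for k, p in zip(key, key_pos)]
    return query, key, value, identity


q, k, v, ident = resolve_mha_inputs([1.0, 2.0], query_pos=[0.5, 0.5])
```

Here `key` falls back to `query` and inherits `query_pos`, while `identity` keeps the original, unshifted query for the shortcut connection.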
- class mmdet3d.models.model_utils.VoteModule(in_channels, vote_per_seed=1, gt_per_seed=3, num_points=-1, conv_channels=(16, 16), conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, act_cfg={'type': 'ReLU'}, norm_feats=True, with_res_feat=True, vote_xyz_range=None, vote_loss=None)[source]¶
Vote module.
Generate votes from seed point features.
- Parameters
in_channels (int) – Number of channels of seed point features.
vote_per_seed (int, optional) – Number of votes generated from each seed point. Default: 1.
gt_per_seed (int, optional) – Number of ground truth votes generated from each seed point. Default: 3.
num_points (int, optional) – Number of points to be used for voting. Default: -1.
conv_channels (tuple[int], optional) – Out channels of vote generating convolution. Default: (16, 16).
conv_cfg (dict, optional) – Config of convolution. Default: dict(type=’Conv1d’).
norm_cfg (dict, optional) – Config of normalization. Default: dict(type=’BN1d’).
act_cfg (dict, optional) – Config of activation. Default: dict(type=’ReLU’).
norm_feats (bool, optional) – Whether to normalize features. Default: True.
with_res_feat (bool, optional) – Whether to predict residual features. Default: True.
vote_xyz_range (list[float], optional) – The range of points translation. Default: None.
vote_loss (dict, optional) – Config of vote loss. Default: None.
- forward(seed_points, seed_feats)[source]¶
Forward pass.
- Parameters
seed_points (torch.Tensor) – Coordinate of the seed points in shape (B, N, 3).
seed_feats (torch.Tensor) – Features of the seed points in shape (B, C, N).
- Returns
- vote_points: Voted xyz based on the seed points with shape (B, M, 3), M=num_seed*vote_per_seed.
- vote_features: Voted features based on the seed points with shape (B, C, M) where M=num_seed*vote_per_seed, C=vote_feature_dim.
- Return type
tuple[torch.Tensor]
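The relation M=num_seed*vote_per_seed can be illustrated with a small plain-Python sketch in which each seed point is shifted by one predicted offset per vote. The function name `generate_votes`, the nested-list layout, and the reading of `vote_xyz_range` as a symmetric per-axis clamp on the offsets are all assumptions of this sketch; the module itself predicts the offsets with convolutions.

```python
def generate_votes(seed_points, offsets, vote_xyz_range=None):
    """Shift every seed by each of its offsets.

    seed_points: B x N x 3 nested lists.
    offsets:     B x N x V x 3 nested lists (V votes per seed).
    Returns vote points as B x (N * V) x 3 nested lists.
    """
    votes = []
    for seeds, seed_offsets in zip(seed_points, offsets):
        batch_votes = []
        for seed, per_seed in zip(seeds, seed_offsets):
            for off in per_seed:
                if vote_xyz_range is not None:
                    # Assumption: clamp each axis of the offset to +/- range.
                    off = [max(-r, min(r, o))
                           for o, r in zip(off, vote_xyz_range)]
                batch_votes.append([s + o for s, o in zip(seed, off)])
        votes.append(batch_votes)
    return votes


seed_points = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]  # B=1, N=2
offsets = [[[[0.5, 0.0, 0.0], [0.0, 5.0, 0.0]],
            [[0.0, 0.0, -0.5], [1.0, 1.0, 1.0]]]]   # V=2
votes = generate_votes(seed_points, offsets)
clamped = generate_votes(seed_points, offsets, vote_xyz_range=[2.0, 2.0, 2.0])
```

With two seeds and two votes per seed, each batch entry holds four vote points, and the clamp only changes the one offset that exceeds the assumed range.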
- get_loss(seed_points, vote_points, seed_indices, vote_targets_mask, vote_targets)[source]¶
Calculate loss of voting module.
- Parameters
seed_points (torch.Tensor) – Coordinate of the seed points.
vote_points (torch.Tensor) – Coordinate of the vote points.
seed_indices (torch.Tensor) – Indices of seed points in raw points.
vote_targets_mask (torch.Tensor) – Mask of valid vote targets.
vote_targets (torch.Tensor) – Targets of votes.
- Returns
Weighted vote loss.
- Return type
torch.Tensor