Takeaways from the 3D Deep Learning Tutorial Keynote

Tasks of 3D Deep Learning

classification
segmentation
reconstruction

Representation

Form

rasterized form (regular)
geometric form (irregular)

Types

multi-view
volumetric
part assembly
point cloud
mesh
implicit (e.g. function space, occupancy network)

Datasets

ShapeNet
PartNet
SceneNet
ScanNet
KITTI

Classification

Multi-view CNN

2D CNN on rendered views + view pooling
needs projection of the 3D shape into 2D views
hard to apply to noisy or incomplete input
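View pooling can be sketched as an element-wise max over per-view feature vectors, a common choice (it is what the original MVCNN uses). The feature values below are made up, the 2D CNN that would produce them is omitted, and `view_pool` is a hypothetical helper name. Because max is symmetric, the pooled descriptor does not depend on the order in which the views are stacked.

```python
import numpy as np

def view_pool(view_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: element-wise max over the view axis.

    view_features: (num_views, feature_dim), one row per rendered view,
    e.g. the output of a shared 2D CNN applied to each view.
    """
    return view_features.max(axis=0)

# Made-up features for 3 views with 4 channels each.
feats = np.array([[0.1, 0.9, 0.0, 0.3],
                  [0.5, 0.2, 0.0, 0.8],
                  [0.4, 0.1, 0.7, 0.2]])
descriptor = view_pool(feats)

# Reordering the views leaves the shape descriptor unchanged.
assert np.array_equal(view_pool(feats[::-1]), descriptor)
```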

Volumetric CNN

memory complexity O($N^3$) for an $N \times N \times N$ grid
- remedy: octree (each node has 8 children; only occupied cells are subdivided)
information loss from voxelization
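To make the O($N^3$) cost concrete, the sketch below counts how many nodes an octree needs when only occupied cells are split into 8 children, and compares that with the $N^3$ cells of a dense grid at the same finest resolution. The sphere point cloud and the function name `octree_node_count` are illustrative assumptions, not taken from any specific octree library.

```python
import numpy as np

def octree_node_count(points: np.ndarray, depth: int) -> int:
    """Hypothetical helper: nodes in an octree over the unit cube that
    subdivides only occupied cells (root + 8 children per occupied
    internal node).

    points: (P, 3) coordinates in [0, 1); depth: number of subdivision
    levels, so the finest cells match a dense grid with N = 2**depth.
    """
    nodes = 1  # root
    for level in range(depth):
        res = 2 ** level
        cells = np.clip(np.floor(points * res).astype(np.int64), 0, res - 1)
        occupied = len(np.unique(cells, axis=0))
        nodes += 8 * occupied  # each occupied cell is split into 8 children
    return nodes

rng = np.random.default_rng(0)
dirs = rng.normal(size=(20000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
surface = 0.5 + 0.4 * dirs  # samples on a sphere inside the unit cube

depth = 6
dense_voxels = (2 ** depth) ** 3  # N = 64
print(dense_voxels, octree_node_count(surface, depth))
```

Since a surface occupies only a thin shell of cells, the octree grows roughly with the surface area rather than with the volume.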

Point Network

point cloud: unordered list of points
PointNet: shared-weight MLP applied to each point independently, followed by max pooling over points
- no local context
- depends on absolute coordinates
remedies:
- 3D kernels for convolution
- neighboring points
  - ball query
  - KNN
- point conv as graph conv
  - e.g. PointNet V2
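A minimal NumPy sketch of the PointNet idea and of the neighborhood remedies, assuming random (untrained) weights and hypothetical helper names (`shared_mlp`, `global_feature`, `knn`, `ball_query`). The same MLP is applied to every point, a max pool makes the global feature independent of point order, and a KNN or ball-query neighborhood expressed relative to its center point supplies local context without relying on absolute coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, weights, biases):
    """Hypothetical helper: the same MLP (shared weights) for every point."""
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # linear layer + ReLU
    return h

def global_feature(points, weights, biases):
    """Per-point features, then a symmetric max pool over the point axis."""
    return shared_mlp(points, weights, biases).max(axis=0)

def knn(points, query, k):
    """Indices of the k points closest to `query` (the query included)."""
    d2 = ((points - query) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

def ball_query(points, query, radius):
    """Indices of all points within `radius` of `query`."""
    d2 = ((points - query) ** 2).sum(axis=1)
    return np.nonzero(d2 <= radius ** 2)[0]

# Hypothetical untrained weights for a 3 -> 16 -> 32 shared MLP.
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 32))]
biases = [np.zeros(16), np.zeros(32)]

cloud = rng.uniform(-1.0, 1.0, size=(128, 3))
g1 = global_feature(cloud, weights, biases)
g2 = global_feature(cloud[rng.permutation(len(cloud))], weights, biases)

# Local context: group a neighborhood and use coordinates relative to
# its center, so the feature no longer depends on absolute position.
center = cloud[0]
local = global_feature(cloud[knn(cloud, center, k=8)] - center, weights, biases)
```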

Standard graph convolutional networks are not geometry-aware

points are sampled from an underlying surface
features should be invariant to how the surface is sampled (density, sampling pattern)
- remedies:
  - density estimation
  - continuous kernel
  - continuous conv
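Density estimation can be illustrated with inverse-density reweighting, the idea behind density-aware point convolutions such as PointConv. This sketch assumes a Gaussian kernel density estimate with a hand-picked bandwidth, a toy 1D "surface", and hypothetical helper names; it shows that an aggregate over unevenly sampled points moves back toward the value a uniform sampling would give.

```python
import numpy as np

def kernel_density(points: np.ndarray, bandwidth: float) -> np.ndarray:
    """Hypothetical helper: Gaussian KDE at each point (unnormalized)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

def density_weighted_mean(points, values, bandwidth):
    """Aggregate per-point values with inverse-density weights so that
    densely sampled regions do not dominate the result."""
    w = 1.0 / kernel_density(points, bandwidth)
    return float((w * values).sum() / w.sum())

# The segment [0, 1] on the x-axis, sampled unevenly:
# 200 points crowded into [0, 0.2], only 40 spread over (0.2, 1].
xs = np.concatenate([np.linspace(0.0, 0.2, 200), np.linspace(0.2, 1.0, 41)[1:]])
pts = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)
signal = xs  # the true average of f(x) = x over the segment is 0.5

plain = float(signal.mean())                         # pulled toward the dense end
weighted = density_weighted_mean(pts, signal, 0.05)  # much closer to 0.5
```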

Some other 3D point conv methods

deformable point-based kernel
tangent conv
lattice

Some other 3D conv methods

spectral conv
- isometry invariance (invariant to length-preserving deformations)
spherical CNN
- rotation invariance
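Spectral convolution filters a per-point signal in the eigenbasis of a graph Laplacian, $y = U\,g(\Lambda)\,U^\top x$. The sketch below assumes a simple kNN graph, a heat-kernel filter $g(\lambda) = e^{-t\lambda}$, and hypothetical helper names. Mesh-based spectral methods usually use an intrinsic Laplacian instead (e.g. the cotangent Laplacian, which depends only on the mesh's edge lengths), and that is where invariance to isometric deformations comes from; the kNN graph here is only invariant to rigid motions.

```python
import numpy as np

def knn_graph_laplacian(points: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: unnormalized Laplacian L = D - A of a
    symmetrized kNN graph."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip the point itself
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)  # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def spectral_filter(L, signal, g):
    """Filter a per-vertex signal in the Laplacian eigenbasis."""
    lam, U = np.linalg.eigh(L)  # L is symmetric
    return U @ (g(lam) * (U.T @ signal))

rng = np.random.default_rng(0)
pts = rng.uniform(size=(50, 3))
L = knn_graph_laplacian(pts, k=6)
x = rng.normal(size=50)  # an arbitrary per-point signal
smoothed = spectral_filter(L, x, lambda lam: np.exp(-0.5 * lam))  # low-pass
```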

Segmentation and Detection

Encoder-decoder
multi-modal fusion (project 2D image features into 3D voxels)
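The 2D-to-3D projection step can be sketched as follows, assuming a single pinhole camera with known intrinsics `K`, voxel centers already expressed in camera coordinates, and nearest-pixel sampling. Real fusion networks typically use bilinear sampling and combine several views, which this hypothetical `backproject_features` helper leaves out.

```python
import numpy as np

def backproject_features(feature_map, K, voxel_centers):
    """Hypothetical helper: give each voxel the 2D feature at the pixel
    its center projects to.

    feature_map: (H, W, C); K: 3x3 pinhole intrinsics; voxel_centers:
    (V, 3) in camera coordinates (z > 0 is in front of the camera).
    Returns (V, C); voxels that project outside the image get zeros.
    """
    H, W, C = feature_map.shape
    uvw = voxel_centers @ K.T
    z = uvw[:, 2]
    valid = z > 0
    u = np.full(len(voxel_centers), -1)
    v = np.full(len(voxel_centers), -1)
    u[valid] = np.round(uvw[valid, 0] / z[valid]).astype(int)
    v[valid] = np.round(uvw[valid, 1] / z[valid]).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((len(voxel_centers), C))
    out[valid] = feature_map[v[valid], u[valid]]
    return out

# Toy 5x5 single-channel feature map whose value encodes the pixel index.
fmap = np.arange(25, dtype=float).reshape(5, 5, 1)
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
voxels = np.array([[0.0, 0.0, 1.0],    # projects to pixel (u=2, v=2)
                   [1.0, -1.0, 1.0],   # projects to pixel (u=3, v=1)
                   [10.0, 0.0, 1.0],   # outside the image
                   [0.0, 0.0, -1.0]])  # behind the camera
vox_feats = backproject_features(fmap, K, voxels)
```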

Top-down methods (point cloud contains objects)

sliding shapes
view-based
point-based

proposal → refinement
proposals from views
foreground/background segmentation
3D proposals from voting (surface points → object centroid)
only the seed points chosen by farthest point sampling vote
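Farthest point sampling and voting can be sketched as below. The offsets are an oracle stand-in (true offset plus noise) for what a learned voting head would predict, and `farthest_point_sampling` and `vote` are hypothetical helper names. Seeds spread over the surface by FPS all vote near the object centroid.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int, start: int = 0) -> np.ndarray:
    """Hypothetical helper: greedily select m indices, each new one being
    the point farthest from all points selected so far."""
    chosen = [start]
    # Squared distance from every point to its nearest chosen point.
    dist = ((points - points[start]) ** 2).sum(axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, ((points - points[nxt]) ** 2).sum(axis=1))
    return np.array(chosen)

def vote(seeds: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Each seed casts a vote at its own position plus an offset."""
    return seeds + offsets

rng = np.random.default_rng(0)
center = np.array([2.0, -1.0, 0.5])
dirs = rng.normal(size=(500, 3))
surface = center + 0.3 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)

seed_idx = farthest_point_sampling(surface, m=32)
seeds = surface[seed_idx]
# Oracle stand-in for a learned offset head: exact offsets plus noise.
offsets = (center - seeds) + rng.normal(scale=0.02, size=seeds.shape)
votes = vote(seeds, offsets)
estimated_center = votes.mean(axis=0)
```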

Bottom-up methods

group points that belong to the same object
based on point similarity
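Grouping by point similarity can be sketched with a greedy threshold over per-point embeddings. The embeddings here are hand-made stand-ins for a learned associative embedding, `group_by_embedding` is a hypothetical helper, and greedy thresholding is only one simple choice; other clustering methods (e.g. mean shift) can be used instead.

```python
import numpy as np

def group_by_embedding(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical helper: each still-unassigned point seeds a new group
    that absorbs every unassigned point whose embedding lies within
    `threshold` of the seed's embedding."""
    labels = np.full(len(embeddings), -1)
    next_label = 0
    for i in range(len(embeddings)):
        if labels[i] != -1:
            continue
        d = np.linalg.norm(embeddings - embeddings[i], axis=1)
        labels[(d < threshold) & (labels == -1)] = next_label
        next_label += 1
    return labels

# Hand-made embeddings for two well-separated instances.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]])
labels = group_by_embedding(emb, threshold=1.0)
```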

associative embedding (point embedding)
BEV (bird's-eye view)
feature encoder

3D points serve as the intermediate representation between camera images and the BEV.
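Once 3D points are available (from LiDAR, or lifted from a predicted camera depth map), turning them into a BEV grid is a scatter operation. This sketch assumes a simple max-height encoding; the grid extent, cell size, and the `points_to_bev` name are illustrative, and real BEV encoders usually store several channels per cell or learn the per-cell features.

```python
import numpy as np

def points_to_bev(points, x_range, y_range, cell):
    """Hypothetical helper: rasterize (x, y, z) points into a BEV
    max-height map.

    Cells are `cell` units wide; each stores the highest z that falls
    into it, empty cells are 0, and points outside the ranges are dropped.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.full((nx, ny), -np.inf)
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # Unbuffered scatter-max: correct when several points share a cell.
    np.maximum.at(bev, (ix[keep], iy[keep]), points[keep, 2])
    return np.where(np.isneginf(bev), 0.0, bev)

pts = np.array([[0.5, 0.5, 1.0],
                [0.6, 0.4, 2.0],
                [1.5, 0.5, -0.5],
                [10.0, 10.0, 3.0]])  # outside the grid, dropped
bev = points_to_bev(pts, x_range=(0.0, 2.0), y_range=(0.0, 2.0), cell=1.0)
```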

Few-shot/zero-shot methods

merging 3D shapes/parts

Reconstruction

conditional generation
free generation

metrics:

Chamfer distance
Earth mover’s distance
F-score
normal consistency
light-field descriptor
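The first and third metrics can be sketched in NumPy. Conventions vary between papers (squared or unsquared distances, sum or mean), so this assumes the symmetric mean of squared nearest-neighbor distances for Chamfer distance, and Euclidean distances with a threshold `tau` for the F-score; the helper names are hypothetical. Earth mover’s distance is omitted because it needs an optimal one-to-one assignment between the two sets.

```python
import numpy as np

def squared_dists(a, b):
    """Pairwise squared Euclidean distances between (n, 3) and (m, 3)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance
    from a to b plus from b to a."""
    d2 = squared_dists(a, b)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def f_score(pred, gt, tau):
    """Harmonic mean of precision (pred points within tau of gt) and
    recall (gt points within tau of pred)."""
    d = np.sqrt(squared_dists(pred, gt))
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
cd = chamfer_distance(a, b)  # the extra point in b costs 4 / 3
fs = f_score(a, b, tau=0.5)  # precision 1, recall 2/3
```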

Conditional generation

Image → volume
- octree generated level by level
Image → point cloud
- trained with a point set distance
Image → shape
Image → surface
- deforming a polygon mesh
- part deformation
- modified topology
implicit surface reconstruction
- surface represented by a function (e.g. a level set of an occupancy or distance function)
- input → shape embedding
- reconstruct by decoding
visually grounded prediction
- 2.5D representations (e.g. depth, normals) as a bridge between image and 3D shape
structured predictions: part-based
- hierarchical graph
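Reconstruction by decoding an implicit surface can be sketched as querying a decoder $f(z, x)$ on a grid and thresholding the predicted occupancy at 0.5. The `toy_decoder` below is a hypothetical analytic stand-in for a learned network, with the shape embedding $z$ reduced to a sphere's center and radius; a real pipeline would obtain $z$ from an encoder of the input and extract a mesh from the grid, for example with marching cubes.

```python
import numpy as np

def toy_decoder(z, query_points):
    """Hypothetical stand-in for a learned occupancy network f(z, x) -> [0, 1].

    The 'shape embedding' z is just (cx, cy, cz, r) of a sphere; a steep
    sigmoid of the negated signed distance gives occupancy probabilities.
    """
    center, radius = z[:3], z[3]
    sdf = np.linalg.norm(query_points - center, axis=1) - radius
    return 1.0 / (1.0 + np.exp(sdf / 0.01))

def decode_to_grid(z, resolution):
    """Query the decoder on a dense grid over [-1, 1]^3, threshold at 0.5."""
    axis = np.linspace(-1.0, 1.0, resolution)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    queries = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    occupied = toy_decoder(z, queries) > 0.5
    return occupied.reshape(resolution, resolution, resolution)

volume = decode_to_grid(np.array([0.0, 0.0, 0.0, 0.5]), resolution=33)
```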

Free generation (GAN)

compare two sets of point clouds: generated set A and reference set B

metrics: Chamfer or Earth mover’s distance
coverage: fraction of shapes in B matched by shapes in A
perceptually correct
- feature distribution distance (e.g. Fréchet Point Cloud Distance)
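Coverage can be computed by matching every generated shape in A to its nearest reference shape in B and counting how many reference shapes are matched at least once. This sketch assumes Chamfer distance as the matching distance and uses toy shapes; `chamfer` and `coverage` are hypothetical helper names.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance (mean squared nearest-neighbor distances)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def coverage(generated, reference):
    """Fraction of reference shapes (B) that are the nearest neighbor of at
    least one generated shape (A) under Chamfer distance."""
    matched = set()
    for g in generated:
        dists = [chamfer(g, r) for r in reference]
        matched.add(int(np.argmin(dists)))
    return len(matched) / len(reference)

rng = np.random.default_rng(0)
base = rng.uniform(size=(64, 3))
reference = [base, base + 5.0, base + 10.0]  # B: three distinct toy shapes
collapsed = [base + 0.01, base + 0.02]       # A: both samples near one shape
cov = coverage(collapsed, reference)         # only 1 of 3 reference shapes covered
```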

volumetric generation

point cloud generation

FC layers generate the point coordinates directly
PointNet as the discriminator
lacks high-quality local details
making the discriminator a strong classifier is tricky
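FC-layer generation can be sketched as a decoder whose last fully connected layer outputs `num_points * 3` values that are reshaped into a point cloud. The layer sizes, random weights, and the `fc_point_decoder` name are illustrative assumptions; in a GAN this would be the generator, trained against a discriminator such as PointNet.

```python
import numpy as np

def fc_point_decoder(latent, weights, biases, num_points):
    """Hypothetical FC generator: latent -> hidden -> num_points * 3 values,
    reshaped into a (num_points, 3) point cloud."""
    h = latent
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h.reshape(num_points, 3)

rng = np.random.default_rng(0)
num_points = 256
sizes = [64, 128, num_points * 3]  # illustrative layer widths
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
cloud = fc_point_decoder(rng.normal(size=64), weights, biases, num_points)
```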

multi-view stereo

voxel occupancy predictions
learning-based stereopsis

Reference

3D deep learning tutorial (Hao Su et al.)
