Tasks of 3D Deep Learning
- classification
- segmentation
- reconstruction
Representation
Form
- rasterized form (regular)
- geometric form (irregular)
Types
- multi-view
- volumetric
- part assembly
- point cloud
- mesh
- implicit (e.g. function space, occupancy network)
Datasets
- ShapeNet
- PartNet
- SceneNet
- ScanNet
- KITTI
Classification
Multi-view CNN
- 2D CNN + view pooling
- needs projection of the 3D shape to 2D views
- sensitive to noisy or incomplete input
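The "2D CNN + view pooling" idea above can be sketched as follows. This is a minimal illustration, assuming the per-view features have already been produced by some 2D CNN; the random features and the name `view_pool` are placeholders, not part of any specific library.

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over the view axis.

    view_features: array of shape (num_views, feature_dim), one row per view.
    Returns a single (feature_dim,) shape descriptor.
    """
    return np.max(view_features, axis=0)

# Hypothetical per-view features standing in for the output of a 2D CNN.
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 8))        # 12 rendered views, 8-dim feature each
descriptor = view_pool(features)

# The pooled descriptor does not depend on the order of the views.
shuffled = features[rng.permutation(12)]
assert np.array_equal(descriptor, view_pool(shuffled))
```

Max pooling is only one possible choice here; any symmetric reduction over the view axis would give the same order independence.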
Volumetric CNN
- memory complexity O($N^3$)
- remedy: octree (each node has 8 children)
- information loss
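To make the cubic memory cost concrete, here is a small sketch that voxelizes a toy point set at several resolutions. The helper `voxelize` and the sphere example are illustrative assumptions; the point is that the number of cells grows as $N^3$ while the fraction of occupied cells shrinks, which is the sparsity an octree exploits.

```python
import numpy as np

def voxelize(points, resolution):
    """Convert an (M, 3) point array into a dense boolean occupancy grid.

    Points are first normalized into the unit cube spanned by their
    bounding box, so the grid always covers the whole input.
    """
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    extent = extent if extent > 0 else 1.0
    normalized = (points - mins) / extent                  # values in [0, 1]
    idx = np.minimum((normalized * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Toy shape: points sampled on the surface of a unit sphere.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

for n in (16, 32, 64):
    grid = voxelize(pts, n)
    # Total cell count is n**3; the occupied fraction drops as n grows.
    print(n, grid.size, float(grid.mean()))
```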
Point Network
- point cloud: unordered list
- PointNet: shared-weight MLP applied to each point individually
- no local context
- depends on absolute coordinates
- remedies:
- 3D kernels for convolution
- neighboring points
- ball
- KNN
- point conv as graph conv
- e.g. PointNet++ (PointNet V2)
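The points in this list can be illustrated with a rough NumPy sketch. It assumes a two-layer MLP with random, untrained weights, and the names `shared_mlp`, `global_feature`, and `knn_indices` are hypothetical. It shows that a shared MLP followed by max pooling ignores point order, that the result still depends on absolute coordinates, and that KNN grouping with relative coordinates is one way to add local context.

```python
import numpy as np

def shared_mlp(points, w1, b1, w2, b2):
    """Apply the same two-layer MLP (with ReLU) to every point independently."""
    hidden = np.maximum(points @ w1 + b1, 0.0)
    return np.maximum(hidden @ w2 + b2, 0.0)

def global_feature(points, params):
    """Per-point features followed by a symmetric max pool over all points."""
    return shared_mlp(points, *params).max(axis=0)

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point, excluding itself."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(0)
# Random weights as placeholders for trained parameters.
params = (rng.normal(size=(3, 16)), np.zeros(16),
          rng.normal(size=(16, 32)), np.zeros(32))
cloud = rng.normal(size=(128, 3))

feat = global_feature(cloud, params)
feat_shuffled = global_feature(cloud[rng.permutation(128)], params)
assert np.allclose(feat, feat_shuffled)      # invariant to point order

feat_shifted = global_feature(cloud + 5.0, params)
print(np.allclose(feat, feat_shifted))       # translation changes the feature

# Local context: express each neighborhood relative to its center point.
nbrs = knn_indices(cloud, k=8)               # shape (128, 8)
local = cloud[nbrs] - cloud[:, None, :]      # shape (128, 8, 3)
```

A ball query would replace the fixed `k` with a distance threshold; the relative-coordinate step stays the same.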
Standard graph convolutional networks are not geometry-aware
- points from surface
- features should be sample-invariant
- remedies:
- density estimation
- continuous kernel
- continuous conv
some other 3D point conv methods
- deformable point-based kernel
- tangent conv
- lattice
some other 3D conv methods
- spectral conv
- isometry invariance (length-preserving)
- spherical CNN
- rotation invariance
Segmentation and Detection
- Encoder-decoder
- multi-modal fusion (project 2D features onto 3D voxels)
Top-down methods (point cloud contains objects)
- sliding shape
- view-based
- point-based
- proposal –> refinement
- proposal from views
- foreground/background segmentation
- 3D proposal from voting (surface points vote for the object centroid)
- only the points selected by farthest point sampling vote
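Farthest point sampling, mentioned in the voting item above, can be sketched as a greedy loop. This is a minimal version assuming squared Euclidean distance and a fixed starting index; the function name is illustrative.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set.

    points: (N, 3) array; num_samples must be between 1 and N.
    Returns the indices of the selected points in selection order.
    """
    n = points.shape[0]
    selected = np.empty(num_samples, dtype=int)
    selected[0] = start
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.full(n, np.inf)
    for i in range(1, num_samples):
        last = points[selected[i - 1]]
        dist = np.sum((points - last) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        selected[i] = int(np.argmax(min_dist))
    return selected

# Eleven points on a line at x = 0, 1, ..., 10.
line = np.stack([np.arange(11.0), np.zeros(11), np.zeros(11)], axis=1)
print(farthest_point_sampling(line, 3).tolist())   # [0, 10, 5]
```

In a voting pipeline the selected points would act as seeds, each predicting an offset toward an object centroid; that learned part is not shown here.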
Bottom-up methods
- points belong to the same object
- points similarity
- associative embedding (point embedding)
- BEV (bird's-eye view)
- feature encoder
3D points serve as the intermediate representation between camera images and BEV.
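The second half of that path, from 3D points to a BEV grid, can be sketched as a simple rasterization. This assumes the points have already been lifted from the camera into a frame with x and y on the ground plane and z up; the function name, ranges, and the max-height encoding are illustrative choices.

```python
import numpy as np

def points_to_bev(points, x_range, y_range, cell_size):
    """Rasterize (M, 3) points into a top-down max-height map.

    Cells that receive no point are set to NaN.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    inside = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
              (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[inside]
    ix = np.minimum(((pts[:, 0] - x_range[0]) / cell_size).astype(int), nx - 1)
    iy = np.minimum(((pts[:, 1] - y_range[0]) / cell_size).astype(int), ny - 1)
    height = np.full((nx, ny), -np.inf)
    # Unbuffered maximum so that several points in one cell are all considered.
    np.maximum.at(height, (ix, iy), pts[:, 2])
    height[np.isinf(height)] = np.nan
    return height

cloud = np.array([[0.5, 0.5, 1.0],
                  [0.6, 0.4, 2.0],    # same cell as the first point, but higher
                  [1.5, 0.5, 0.3],
                  [5.0, 5.0, 9.0]])   # outside the range, ignored
bev = points_to_bev(cloud, x_range=(0.0, 3.0), y_range=(0.0, 1.0), cell_size=1.0)
print(bev)   # heights 2.0 and 0.3, then one empty (NaN) cell
```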
Few-shot/zero-shot methods
3D shapes/part merging
Reconstruction
- conditional generation
- free generation
metrics:
- Chamfer distance
- Earth mover’s distance
- F-score
- normal consistency
- light-field descriptor
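Two of the metrics above, Chamfer distance and F-score, are easy to write down for small point sets. The sketch below uses one common convention (squared distances, mean over points, sum of both directions); papers differ on these details, so treat the exact normalization as an assumption.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between all points of a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    d = pairwise_sq_dists(a, b)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    d = np.sqrt(pairwise_sq_dists(pred, gt))
    precision = (d.min(axis=1) < threshold).mean()   # predicted points near the ground truth
    recall = (d.min(axis=0) < threshold).mean()      # ground-truth points near the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = gt + np.array([0.0, 0.0, 0.1])                # prediction offset by 0.1 in z

print(chamfer_distance(pred, gt))                    # approximately 0.02
print(f_score(pred, gt, threshold=0.2))              # 1.0
print(f_score(pred, gt, threshold=0.05))             # 0.0
```

The brute-force pairwise matrix is fine for a few thousand points; larger sets would normally use a spatial index instead.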
Conditional generation
- Image –> volume
- Octree layer by layer
- Image –> point cloud
- point set distance
- Image –> shape
- Image –> surface
- deformable polygon
- part deformation
- modified topology
- implicit surface reconstruction
- function-surface
- input –> shape embedding
- reconstruct by decoding
- visually grounded prediction
- 2.5D to bridge
- structured predictions: part-based
- hierarchical graph
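For the implicit surface item above (input to shape embedding, then reconstruction by decoding), the decoding step amounts to querying a function at arbitrary 3D positions. The sketch below uses an analytic sphere as a stand-in for a learned decoder; `toy_decoder` and its four-number "embedding" are purely hypothetical and only show the query pattern.

```python
import numpy as np

def toy_decoder(query_points, embedding):
    """Stand-in for a learned occupancy decoder f(p, z) with values in [0, 1].

    Here the 'embedding' is just (cx, cy, cz, radius) of a sphere; a real
    model would use a learned latent vector and a neural network.
    """
    center, radius = embedding[:3], embedding[3]
    signed = np.linalg.norm(query_points - center, axis=-1) - radius
    return 1.0 / (1.0 + np.exp(40.0 * signed))    # close to 1 inside, close to 0 outside

def decode_occupancy_grid(embedding, resolution, bound=1.0):
    """Query the decoder on a regular grid and threshold at 0.5."""
    axis = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    occ = toy_decoder(grid.reshape(-1, 3), embedding)
    return occ.reshape(resolution, resolution, resolution) > 0.5

embedding = np.array([0.0, 0.0, 0.0, 0.5])        # sphere of radius 0.5 at the origin
occupancy = decode_occupancy_grid(embedding, resolution=32)

# The same decoder can be sampled at any resolution without retraining.
spacing = 2.0 / 31
print(occupancy.sum() * spacing ** 3)             # rough volume estimate of the sphere
```

Because the shape lives in the function rather than in a fixed grid, resolution becomes a choice made at query time; a surface mesh would typically be extracted from such a grid with an isosurface method.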
Free generation (GAN)
Comparing two sets of point clouds (A: generated, B: reference)
- metrics: Chamfer or earth mover’s distance
- coverage: fraction of shapes in B matched to shapes in A
- perceptually correct
- feature distribution distance (e.g. Fréchet Point Cloud Distance)
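The matching-based measure above (the fraction of shapes in B matched in A, commonly called coverage) can be sketched as follows. This assumes each shape in A is matched to its nearest shape in B under Chamfer distance; the helper names and the toy data are illustrative.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance with squared Euclidean distances."""
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def coverage(set_a, set_b):
    """Fraction of shapes in B that are the nearest neighbor of some shape in A."""
    matched = set()
    for a in set_a:
        dists = [chamfer(a, b) for b in set_b]
        matched.add(int(np.argmin(dists)))
    return len(matched) / len(set_b)

rng = np.random.default_rng(0)
base = rng.normal(size=(64, 3))

# Reference set B: three well-separated shapes.
set_b = [base, base + 10.0, base + 20.0]
# Generated set A: two near-copies of the first shape, one of the second.
set_a = [base + 0.01, base - 0.01, base + 10.01]

cov = coverage(set_a, set_b)
print(cov)   # 2 of the 3 reference shapes are matched
```

A low value here indicates that the generated set misses part of the reference distribution, even if every generated shape looks plausible on its own.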
volumetric generation
point cloud generation
- FC layer generation
- PointNet as discriminator
- no high quality local details
- training against a strong discriminator is tricky
multi-view stereo
- Voxel occupancy predictions
- Learning-based stereopsis