Takeaways of 3D Deep Learning Tutorial Keynote

 

Task of 3D Deep Learning

  • classification
  • segmentation
  • reconstruction

Representation

Form

  • rasterized form (regular)
  • geometric form (irregular)

Types

  • multi-view
  • volumetric
  • part assembly
  • point cloud
  • mesh
  • implicit (e.g. function space, occupancy network)

Datasets

  • ShapeNet
  • PartNet
  • SceneNet
  • ScanNet
  • KITTI

Classification

Multi-view CNN

  • 2D CNN + view pooling
  • need projection
  • noisy or incomplete input

Volumetric CNN

  • memory complexity O($N^3$)
    • remedy: Octree (1 node, 8 children)
  • information loss

Point Network

  • point cloud: unordered list
  • PointNet: share-weights MLP for individual points
    • no local context
    • depends on absolute coordinate
  • remedies:
    • 3D kernels for convolution
    • neighboring points
      • ball
      • KNN
    • point conv as graph conv
      • e.g. PointNet V2

Standard graph conv neural networks not geometry-aware

  • points from surface
  • features should be sample-invariant
    • remedies:
      • density estimation
      • continuous kernel
      • continuous conv

some other 3D point conv methods

  • deformable point-based kernel
  • tangent conv
  • lattice

some other 3D conv methods

  • spectral conv
    • isometrics invariance (length-preserving)
  • spherical CNN
    • rotation invariance

Segmentation and Detection

  • Encoder-decoder
  • multi-modal (project 2D to 3D voxels fusion)

Top-down methods (point cloud contains objects)

  1. sliding shape
  2. view-based
  3. point-based
  • proposal –> refinement
  • proposal from views
  • foreground/background segmentation
  • 3D proposal from voting (surface–>centroid)
  • Only the furthest point samples vote

Bottom-up methods

  • points belong to the same object
  • points similarity
  1. associative embedding (point embedding)
  2. BEV (Bird-eye-view)
  3. feature encoder

3D points are the intermediate media from camera to BEV.

Few-show/Zero-shot method

3D shapes/part merging

Reconstruction

  • conditional generation
  • free generation

metrics:

  • Chamfer distance
  • Earth mover’s distance
  • F-score
  • normal consistency
  • light-field descriptor

Conditional generation

  • Image –> volume
    • Octree layer by layer
  • Image –> point cloud
    • point set distance
  • Image –> shape
  • Image –> surface
    • deformable polygon
    • part deformation
    • modified topology
  • implicit surface reconstruction
    • function-surface
    • input –>shape embedding
    • reconstruct by decoding
  • visually grounded prediction
    • 2.5D to bridge
  • structured predictions: part-based
    • hierarchical graph

Free generation (GAN)

2 sets of point cloud

  • metrics: Chamfer or earth mover distance
  • convergence: fractions of shapes in B matched in A
  • perceptually correct
    • Feature distribution distance (e.g. Frechet Point Cloud Distance)

volumetric generation

point cloud generation

  • FC layer generation
  • PointNet as discriminator
  • no high quality local details
  • strong classifier is tricky

    multi-view stereo

  • Voxel occupancy predictions
  • Learning-based stereopsis

Reference

3D deep learning tutorial (Hao Su et al.)