Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks

Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar





Figure: Segmentation. (a) Input viewpoints with the 2D segmentation masks. (b) The result after extracting the segmented regions. (c) The result after deleting the segmented regions. Note the see-through effect achieved by 3D segmentation.

Figure: Few-Shot Affordance Transfer. (a) Some annotated source images. (b) 2D-2D affordance transfer. (c) 2D-3D affordance transfer. Note that the source images are different from the target frames but belong to the same category.

Abstract

In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we found that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics.


Theory

Consider the color \(C\) of a pixel at \((x, y)\) in a 3DGS rendering,

\begin{align} C &= \sum_{n\leq N} c_n \alpha_n \prod_{m < n} ( 1 - \alpha_m) \label{eqn:3dgs-alpha-blending} \\ &=\sum_{n\leq N} c_n \alpha_n T_n \end{align}

where \(N\) is the total number of Gaussians, each indexed by its position in depth-sorted order, \(c_n\) is the color associated with the \(n\)th Gaussian, \(\alpha_n\) is the opacity of the \(n\)th Gaussian at \((x,y)\) after applying its exponential falloff, and \(T_n=\prod_{m < n} ( 1 - \alpha_m)\) is the transmittance of the \(n\)th Gaussian at \((x,y)\).
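As an illustration of the blending equation above, the following is a minimal NumPy sketch for a single pixel. The function name `composite` and the toy colors and opacities are assumptions made for this example and are not taken from the authors' code; the Gaussians are assumed to be already sorted front to back.

```python
import numpy as np

def composite(colors, alphas):
    """Alpha-blend N depth-sorted Gaussians at one pixel (illustrative only)."""
    colors = np.asarray(colors, dtype=float)   # shape (N, 3), RGB per Gaussian
    alphas = np.asarray(alphas, dtype=float)   # shape (N,), opacity at this pixel
    # T_n = prod_{m<n} (1 - alpha_m); the first Gaussian sees T = 1 (empty product)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance           # alpha_n * T_n
    pixel = (weights[:, None] * colors).sum(axis=0)
    return pixel, transmittance, weights

# Toy pixel covered by a red, a green, and a fully opaque blue Gaussian
colors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
alphas = [0.5, 0.5, 1.0]
pixel, T, w = composite(colors, alphas)
print("transmittance:", T)
print("pixel color:  ", pixel)
```

Each Gaussian's share of the pixel is \(\alpha_n T_n\), so Gaussians behind an opaque one receive zero weight.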

Taking the derivative with respect to the color \(c_k\) of the \(k\)th Gaussian,

\begin{align} \frac{ \partial{C}} {\partial{c_k}} = \alpha_k T_k \label{eqn:gradient} \end{align}

This derivative is zero only if either the transmittance \(T_k\) or the opacity \(\alpha_k\) is zero, in which case the Gaussian does not contribute to the final color. Conversely, a non-zero gradient indicates that the Gaussian influences the pixel color.
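The gradient expression can be checked numerically. The sketch below compares \(\alpha_k T_k\) against a finite-difference estimate for one color channel of one pixel; the helper `blend` and the sample values are hypothetical and chosen only so that one Gaussian has zero opacity and another is hidden behind an opaque one.

```python
import numpy as np

def blend(c, alpha):
    """Blend one color channel of depth-sorted Gaussians; also return T_n."""
    T = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return float(np.sum(c * alpha * T)), T

alpha = np.array([0.3, 0.0, 0.6, 1.0, 0.8])  # Gaussian 2 is transparent, Gaussian 4 is opaque
c = np.array([0.9, 0.1, 0.4, 0.7, 0.2])

C, T = blend(c, alpha)
analytic = alpha * T                          # dC/dc_k = alpha_k * T_k

# Finite-difference estimate of dC/dc_k
eps = 1e-6
numeric = np.zeros_like(c)
for k in range(len(c)):
    bumped = c.copy()
    bumped[k] += eps
    numeric[k] = (blend(bumped, alpha)[0] - C) / eps

print("analytic:", analytic)
print("numeric: ", numeric)
```

In this toy setup the second Gaussian (zero opacity) and the fifth Gaussian (zero transmittance, because it sits behind an opaque one) both receive zero gradient, matching the statement above.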

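We can use this fact to segment the 3DGS scene by filtering the gradients with 2D masks. The filtered gradients act as votes: gradients from pixels inside a mask count in favor of a Gaussian, gradients from pixels outside the mask count against it, and a Gaussian is assigned to the segment when its votes accumulated over all viewpoints are positive.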


Pseudocode


import numpy as np

def get_3d_mask(gaussians, viewpoints, masks):
    # One signed vote total per Gaussian
    accumulated_grads = np.zeros(len(gaussians))

    for camera_params, mask_2d in zip(viewpoints, masks):
        # Forward pass with gradient tracking enabled (as in training), run at inference time
        frame = rasterize(gaussians, camera_params)

        # Backward pass: pixels inside the mask vote for a Gaussian,
        # pixels outside the mask vote against it
        accumulated_grads += get_masked_gradients(gaussians, frame, mask_2d)
        accumulated_grads -= get_masked_gradients(gaussians, frame, ~mask_2d)

    # Keep the Gaussians whose net vote is positive
    mask_3d = accumulated_grads > 0
    return mask_3d
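
To make the voting concrete, here is a small self-contained NumPy sketch of the same idea on toy data. It is not the authors' implementation: instead of a real rasterizer and backpropagation, each view is given directly as a hypothetical `(pixels, gaussians)` array of per-pixel opacities, the Gaussians are assumed to share one depth order in every view, and the masked gradient is computed in closed form as the sum of \(\alpha_n T_n\) over the selected pixels. The names `contribution_weights` and `get_3d_mask_toy` are made up for this example.

```python
import numpy as np

def contribution_weights(alphas):
    """alphas: (P, N) opacities of N depth-sorted Gaussians at P pixels.
    Returns alpha_n * T_n for every pixel, i.e. dC/dc_n per pixel."""
    T = np.cumprod(1.0 - alphas, axis=1)
    T = np.concatenate((np.ones((alphas.shape[0], 1)), T[:, :-1]), axis=1)
    return alphas * T

def get_3d_mask_toy(views, masks):
    votes = np.zeros(views[0].shape[1])
    for alphas, mask_2d in zip(views, masks):
        w = contribution_weights(alphas)
        votes += w[mask_2d].sum(axis=0)    # pixels inside the mask vote for
        votes -= w[~mask_2d].sum(axis=0)   # pixels outside the mask vote against
    return votes > 0, votes

# Two toy views with 4 pixels and 3 Gaussians each (all values are made up)
views = [
    np.array([[0.9, 0.0, 0.2],
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0],
              [0.1, 0.9, 0.0]]),
    np.array([[0.7, 0.0, 0.0],
              [0.0, 0.8, 0.5],
              [0.0, 0.6, 0.5],
              [0.9, 0.0, 0.0]]),
]
masks = [
    np.array([True, True, False, False]),
    np.array([True, False, False, True]),
]

mask_3d, votes = get_3d_mask_toy(views, masks)
print("votes:  ", votes)
print("mask_3d:", mask_3d)
```

In this toy scene the first Gaussian lies mostly under the masks and is kept, the second lies outside and is rejected, and the third collects a small positive vote in one view that is outweighed by negative votes in the other.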

Timing Analysis

Task | Time taken (per scene)
Segmentation (including SAM 2 mask generation, using all masks) | 27.72 s
Segmentation (using all masks) | 3.35 s
Segmentation (using 10% of the masks) | 0.35 s
Pruning/compression | 3.68 s

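Ignoring mask generation latency, the overall processing time is proportional to the number of frames, since each frame is processed at the speed of a training iteration even though the method runs at inference time. The measurements are consistent with this: using 10% of the masks takes 0.35 s, close to 10% of the 3.35 s needed with all masks. Performance is measured on an NVIDIA A6000 GPU.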


Gallery

Columns: sample frame with mask, extraction, deletion.

Example Downstream Applications

Augmented Reality

Reorganizing Objects in Real-Time


Citation

If you find this paper or the code helpful for your work, please consider citing the following preprint:


@article{joji2024gradient,
  title={Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks},
  author={Joji Joseph and Bharadwaj Amrutur and Shalabh Bhatnagar},
  journal={arXiv preprint arXiv:2409.11681},
  year={2024},
  url={https://arxiv.org/abs/2409.11681}
}