GANESH: Generalizable NeRF for Lensless Imaging

Shiv Nadar University Chennai, IIT Madras

WACV 2025

GANESH refines and renders 3D views from corrupted lensless captures

Abstract

Lensless imaging offers a significant opportunity to develop ultra-compact cameras by removing the conventional bulky lens system. However, without a focusing element, the sensor's output is no longer a direct image but a complex multiplexed scene representation.

Traditional methods have attempted to address this challenge with learnable inversion and refinement models, but these are primarily designed for 2D reconstruction and do not generalize well to 3D. We introduce GANESH, a novel framework that enables simultaneous refinement and novel view synthesis from multi-view lensless images. Unlike existing methods that require scene-specific training, our approach supports on-the-fly inference without retraining on each scene. Moreover, our framework can optionally be tuned to a specific scene, further enhancing rendering and refinement quality. To facilitate research in this area, we also present the first multi-view lensless dataset, LenslessScenes. Extensive experiments demonstrate that our method outperforms current approaches in reconstruction accuracy and refinement quality. The code and dataset will be released upon acceptance.


Contributions

  • We present a novel framework that simultaneously achieves refinement and rendering of lensless captures.
  • Our approach is generalizable, i.e., it can render views on-the-fly without any need for scene-specific training.
  • We present LenslessScenes, the first dataset of multi-view lensless captures.
  • Our experimental results demonstrate that the proposed method outperforms existing techniques that separately handle refinement and novel view synthesis.

Model Architecture

GANESH: Method

GANESH is a novel framework for generalizable novel view synthesis from lensless captures. The task is to generate refined novel views from N calibrated multi-view lensless images of a scene with known camera poses, while ensuring the model generalizes to unseen scenes. Our method builds upon the existing GNT architecture but conditions the scene representation and rendering processes on the captured multi-view lensless images. First, the lensless captures are passed through a simple Wiener deconvolution filter to obtain a coarse estimate of the scene. The deconvolved outputs are then passed to a generalizable view synthesis model, which performs refinement and rendering simultaneously. This pipeline can be trained end-to-end on synthetically generated scenes and transferred directly to any real scene without additional optimization.
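
For concreteness, a minimal Python sketch of this two-stage pipeline is given below. The names wiener_deconvolve (detailed in the next section) and gnt_model are illustrative placeholders for the coarse-estimation step and the GNT-based renderer, not the released API.

    def render_novel_view(captures, psf, src_poses, target_pose, gnt_model):
        # Stage 1: coarse per-view scene estimates via Wiener deconvolution
        coarse_views = [wiener_deconvolve(c, psf) for c in captures]
        # Stage 2: the generalizable, GNT-style model jointly refines the
        # coarse views and renders the target view; no per-scene training
        return gnt_model(coarse_views, src_poses, target_pose)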


Coarse Estimation

Because lensless captures are globally multiplexed, we cannot feed them directly into the radiance field model to render novel views. Hence, to reconstruct an RGB image from a lensless capture, it must be deconvolved with the lensless camera's point spread function (PSF) to obtain a coarse reconstruction. For this, we use Wiener deconvolution, which takes the lensless capture and the PSF as input and returns the reconstructed image.
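
As an illustration, here is a minimal NumPy sketch of FFT-based Wiener deconvolution. The noise-to-signal ratio nsr is a hypothetical regularization constant, and the PSF is assumed to be centered and pre-padded to the capture size.

    import numpy as np

    def wiener_deconvolve(capture, psf, nsr=1e-2):
        # normalize the PSF so its energy sums to one
        psf = psf / psf.sum()
        # transfer function of the (centered) PSF
        H = np.fft.fft2(np.fft.ifftshift(psf), axes=(0, 1))
        if capture.ndim == 3:
            H = H[..., None]  # broadcast over color channels
        Y = np.fft.fft2(capture, axes=(0, 1))
        # Wiener filter: conj(H) / (|H|^2 + NSR), applied in the Fourier domain
        X = np.conj(H) * Y / (np.abs(H) ** 2 + nsr)
        return np.real(np.fft.ifft2(X, axes=(0, 1)))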



Scene Specific Training Results




Generalized Training Results




Real-Time Inference Results



Supplementary

We first analyzed the influence of PSF structure by comparing the results of RGB-scale PSFs with those of binary PSFs containing only the values 0 and 1. The binary PSF was derived by converting the original grayscale PSF to binary values via thresholding. Interestingly, the outputs produced by the RGB-scale PSF closely resembled those from the binary PSF, indicating that color information in the PSF has minimal impact on image reconstruction. This observation highlights that the structural characteristics of the PSF play a more significant role in decoding the image than its color properties. Thus, focusing on the PSF's shape, rather than its color composition, may lead to more effective results in 3D image reconstruction.
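
A minimal sketch of this binarization is given below. The threshold rule (a fixed fraction of the PSF's maximum) is an assumption for illustration, not necessarily the exact rule used in our experiments.

    import numpy as np

    def binarize_psf(psf, frac=0.1):
        # collapse an RGB PSF to grayscale before thresholding
        gray = psf.mean(axis=-1) if psf.ndim == 3 else psf
        # assumed rule: values above frac * max become 1, the rest 0
        return (gray > frac * gray.max()).astype(np.float32)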



In addition to PSF structure, we also studied the effect of PSF size. Smaller PSFs were generated by cropping the original 1518×2012 PSF, and the resulting outputs were compared. These experiments provided insight into how PSF dimensions influence the quality of reconstructed images, highlighting the importance of optimizing both PSF structure and size for improved lensless imaging performance. By examining PSFs of various dimensions, we observed notable differences in how image information was processed and reconstructed.
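
The size ablation can be reproduced with a simple center crop of the full PSF, as sketched below; the crop sizes shown are illustrative, not the exact ones we evaluated.

    def center_crop(psf, out_h, out_w):
        # psf: NumPy array of shape (H, W) or (H, W, 3)
        # take a centered out_h x out_w window from the full PSF
        h, w = psf.shape[:2]
        top, left = (h - out_h) // 2, (w - out_w) // 2
        return psf[top:top + out_h, left:left + out_w]

    # e.g., given the full 1518x2012 PSF loaded as full_psf:
    # crops = [center_crop(full_psf, s, s) for s in (1024, 512, 256)]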




More Results



