Neural Radiance Field (NeRF) approaches learn the underlying 3D representation of a scene and generate photorealistic novel views with high fidelity. However, most proposed settings concentrate on modelling a single object or a single level of a scene. In the real world, a scene may be captured at multiple levels, resulting in a layered capture. For example, tourists usually capture a monument’s exterior structure before capturing its inner structure. Modelling such scenes in 3D with seamless switching between levels can drastically improve immersive experiences, yet most existing techniques struggle to model them. We propose Strata-NeRF, a single neural radiance field that implicitly captures a scene with multiple levels. Strata-NeRF achieves this by conditioning the NeRFs on Vector Quantized (VQ) latent representations, which allow sudden changes in scene structure. We evaluate the effectiveness of our approach on a multi-layered synthetic dataset comprising diverse scenes and then further validate its generalization on the real-world RealEstate10K dataset. We find that, compared to existing approaches, Strata-NeRF effectively captures stratified scenes, minimizes artifacts, and synthesizes high-fidelity views.
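To illustrate why a vector-quantized latent can express sudden changes in scene structure, the following is a minimal sketch of the generic VQ step only: a continuous latent is snapped to its nearest codebook entry, so a small change in the input can switch the output to an entirely different code. This is not the Strata-NeRF implementation. The function `quantize`, the 2-D codebook, and the idea of one entry per level are hypothetical choices made for illustration, and how the latent is produced and how it conditions the radiance field are not covered here.

```python
from typing import List, Sequence, Tuple


def quantize(latent: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (index, code) of the codebook entry nearest to `latent`.

    Nearest is measured by squared Euclidean distance.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda i: sq_dist(latent, codebook[i]))
    return index, list(codebook[index])


# Hypothetical codebook: three 2-D codes (assumed here, not from the paper).
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Two continuous latents that differ only slightly...
a_idx, a_code = quantize([0.49, 0.0], CODEBOOK)
b_idx, b_code = quantize([0.51, 0.0], CODEBOOK)

# ...are mapped to different discrete codes, i.e. an abrupt switch.
print(a_idx, a_code)
print(b_idx, b_code)
```

A network conditioned on the quantized code rather than on the continuous latent therefore receives a discrete signal, which is one plausible reading of how abrupt structural changes between levels could be represented.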
[Figure: side-by-side comparison of Mip-NeRF 360 and Strata-NeRF. Scenes and levels shown: Cube-Sphere-Monkey (Levels 0, 1, 2); Dragon in Pyramid (Levels 0, 1); Buddhist Temple (Levels 0, 1, 2); Coffee Shop (Levels 0, 1, 2).]
This work was supported by Samsung R&D Institute India, Bangalore, PMRF and Kotak IISc AIML Centre (KIAC). Srinath Sridhar was partly supported by NSF grant CNS-2038897.