Neural implicit representations (NIRs) have recently achieved great success in 3D reconstruction and novel view synthesis. However, they require all images of a scene, captured from different camera views, to be available at once for training. This is expensive, especially for large-scale scenes and settings with limited data storage. In view of this, we explore incremental learning for NIRs. We design a student-teacher framework to mitigate catastrophic forgetting. Specifically, at the end of each time step the trained student is used as the teacher, which then guides the training of the student in the next step; this process is repeated at every step. As a result, the student network can learn new information from the streaming data while retaining old knowledge from the teacher network. Although intuitive, naively applying the student-teacher pipeline does not work well for our task: not all information from the teacher network is helpful, since the teacher has been trained only on the old data. To alleviate this problem, we further introduce a random inquirer and an uncertainty-based filter to select useful information. Our method is general and can thus be adapted to different implicit representations, such as neural radiance fields (NeRF) and neural signed distance functions (SDFs). Extensive experiments on both 3D reconstruction and novel view synthesis demonstrate the effectiveness of our approach compared with various baselines.
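The abstract does not specify how the random inquirer samples queries, how the teacher's uncertainty is measured, or how the filtered distillation term is weighted. The following is a minimal sketch under hypothetical assumptions, not the authors' implementation: `random_inquirer` draws 3D points uniformly in an axis-aligned box (the actual inquirer may sample camera views or rays instead), the teacher returns a scalar prediction together with an uncertainty, teacher outputs whose uncertainty exceeds a threshold `tau` are discarded, and the distillation term is a mean squared error weighted by `lam`. All of these names and choices are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def random_inquirer(n: int, lo: Point, hi: Point, rng: random.Random) -> List[Point]:
    """Hypothetical inquirer: draw n query points uniformly inside the box [lo, hi]."""
    return [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(n)]


def uncertainty_filter(uncertainties: Sequence[float], tau: float) -> List[int]:
    """Return the indices of teacher outputs whose uncertainty is at most tau."""
    return [i for i, u in enumerate(uncertainties) if u <= tau]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error; an empty selection contributes zero loss."""
    if not pred:
        return 0.0
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def step_loss(
    student: Callable[[Point], float],
    teacher: Callable[[Point], Tuple[float, float]],
    new_inputs: Sequence[Point],
    new_targets: Sequence[float],
    queries: Sequence[Point],
    tau: float,
    lam: float,
) -> float:
    """Loss at one time step: fit the new data D^t and distill filtered teacher knowledge."""
    # Supervised term on the currently available data D^t.
    new_loss = mse([student(x) for x in new_inputs], new_targets)

    # The frozen teacher returns (prediction, uncertainty) for each random query.
    teacher_out = [teacher(q) for q in queries]
    keep = uncertainty_filter([u for _, u in teacher_out], tau)

    # Distill only the teacher outputs that passed the uncertainty filter.
    distill_loss = mse(
        [student(queries[i]) for i in keep],
        [teacher_out[i][0] for i in keep],
    )
    return new_loss + lam * distill_loss
```

In a real pipeline these losses would be computed on rendered colors or depths inside an autodiff framework, and at the end of each time step the teacher would be replaced by a frozen copy of the trained student (for example via `copy.deepcopy`), matching the iteration described in the abstract.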
The overall framework of our proposed student-teacher pipeline. At time step $t$, the student network learns simultaneously from the currently available data $\mathcal{D}^t$ and from the knowledge previously learned by the teacher network. The inputs to the teacher network are generated by the random inquirer, and its outputs are passed through an uncertainty-based filter to select useful information. $V$ denotes the differentiable volume renderer.
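The caption does not say which variant of the differentiable volume renderer $V$ is used. Assuming it follows the standard NeRF quadrature (neural-SDF methods typically first convert signed distances into densities), rendering one ray composites per-sample colors $c_i$ with weights $T_i \alpha_i$, where $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$ and $T_i = \prod_{j<i}(1 - \alpha_j)$. A minimal pure-Python sketch of that assumed renderer, with the hypothetical name `volume_render`:

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def volume_render(
    sigmas: Sequence[float], colors: Sequence[RGB], deltas: Sequence[float]
) -> RGB:
    """Composite samples along one ray, front to back, with the NeRF quadrature."""
    transmittance = 1.0  # T_i: fraction of light not yet absorbed before sample i
    out = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2])
```

Every operation is differentiable in the densities and colors, so a photometric loss on the rendered color can propagate gradients back to the network that predicts them.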
Qualitative comparison on the ICL-NUIM and Replica datasets. Both the 'MonoSDF' and 'Ours' models are incrementally trained on the 10-step training datasets. Red boxes mark previously learned views.
Qualitative comparison on the ScanNet and 360Capture datasets. Both the 'NeRF' and 'Ours' models are incrementally trained on the 10-step training datasets. $\mathcal{D}^0$, $\mathcal{D}^3$, and $\mathcal{D}^6$ denote results on previous views from the test sets of the corresponding time steps, and $\mathcal{D}^9$ denotes results on the current views from the latest test set.
@inproceedings{Guo2024UNIKD,
author = {Guo, Mengqi and Li, Chen and Chen, Hanlin and Lee, Gim Hee},
title = {UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2024},
}