Scalable Neural Video Representations with Learnable Positional Features (NVP)
Official PyTorch implementation of "Scalable Neural Video Representations with Learnable Positional Features" (NeurIPS 2022) by Subin Kim*1, Sihyun Yu*1, Jaeho Lee2, and Jinwoo Shin1.
1KAIST, 2POSTECH
TL;DR: We propose a novel neural representation for videos that achieves the best of both worlds: high-quality encoding and compute-/parameter-efficiency at the same time.
Project Page | Paper
1. Requirements
Environments
Required packages are listed in `environment.yaml`.
Also, you should install the following packages:
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
- This fork of tiny-cuda-nn is slightly different from the original implementation of tiny-cuda-nn.
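After installing the packages, a quick sanity check can save debugging time later. The snippet below is a minimal sketch (not part of the repository) that verifies PyTorch sees a CUDA device and that the tiny-cuda-nn bindings import correctly:

```python
# Minimal sanity check (illustrative, not part of the repository).
import torch
import tinycudann as tcnn  # installed from the fork above

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tinycudann loaded from", tcnn.__file__)
```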
Dataset
Download the UVG-HD dataset from the following link:
Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg. Here, `INPUT` is the input file name, and `OUTPUT` is the directory to save the decompressed RGB frames.
ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i INPUT.yuv OUTPUT/f%05d.png
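To prepare every UVG-HD video at once, the command above can be wrapped in a small loop. The sketch below is illustrative; the directory layout (`~/data/yuv` for the downloaded `*.yuv` files, `~/data/<VideoName>` for the extracted frames) is an assumption and should be adjusted to your setup:

```python
# Batch-extract RGB frames from all downloaded UVG-HD YUV files (illustrative sketch).
import pathlib
import subprocess

yuv_dir = pathlib.Path("~/data/yuv").expanduser()   # assumed location of the *.yuv files
out_root = pathlib.Path("~/data").expanduser()      # assumed output root

for yuv in sorted(yuv_dir.glob("*.yuv")):
    out_dir = out_root / yuv.stem                    # e.g., ~/data/Jockey
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-f", "rawvideo", "-vcodec", "rawvideo",
        "-s", "1920x1080", "-r", "120", "-pix_fmt", "yuv420p",
        "-i", str(yuv), str(out_dir / "f%05d.png"),
    ], check=True)
```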
2. Training
Run the following script with a single GPU.
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json
- Option `--logging_root` denotes the path to save the experiment logs.
- Option `--experiment_name` denotes the subdirectory under `--logging_root` where the log files (results, checkpoints, configuration, etc.) are saved.
- Option `--dataset` denotes the path of the RGB sequences (e.g., `~/data/Jockey`).
- Option `--num_frames` denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for the other videos in UVG-HD).
- To reconstruct videos with 300 frames, change the values of `t_resolution` in the configuration file to 300 (a minimal sketch follows below).
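For the 300-frame case, the `t_resolution` entries can be edited by hand or with a short script. The sketch below is a hedged helper; where exactly `t_resolution` sits inside `config_nvp_s.json` is not assumed, so it walks the whole JSON tree:

```python
# Set every "t_resolution" entry in the config to 300 (illustrative sketch).
import json

path = "./config/config_nvp_s.json"
with open(path) as f:
    cfg = json.load(f)

def set_t_resolution(node, value=300):
    """Recursively overwrite every 't_resolution' key, wherever it appears."""
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "t_resolution":
                node[key] = value
            else:
                set_t_resolution(child, value)
    elif isinstance(node, list):
        for child in node:
            set_t_resolution(child, value)

set_t_resolution(cfg)
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```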
3. Evaluation
Evaluation without compression of parameters (i.e., quantization only).
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json
- Option `--save` denotes whether to save the reconstructed frames.
- One can specify the option `--s_interp` for video super-resolution results; it denotes the super-resolution scale (e.g., 8).
- One can specify the option `--t_interp` for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
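If frames are written out with `--save`, a quick PSNR spot check against the ground-truth frames can be done outside the evaluation script. The sketch below is illustrative only; the reconstruction directory name is a hypothetical placeholder:

```python
# Spot-check PSNR between ground-truth and reconstructed frames (illustrative sketch).
import pathlib

import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

gt_dir = pathlib.Path("~/data/Jockey").expanduser()          # ground-truth frames
recon_dir = pathlib.Path("./logs_nvp/my_experiment/recon")   # hypothetical output directory

scores = []
for gt_path in sorted(gt_dir.glob("f*.png")):
    recon_path = recon_dir / gt_path.name
    if recon_path.exists():
        gt = np.asarray(Image.open(gt_path))
        recon = np.asarray(Image.open(recon_path))
        scores.append(psnr(gt, recon))

print(f"mean PSNR over {len(scores)} frames: {np.mean(scores):.2f} dB")
```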
Evaluation with compression of parameters using well-known image and video codecs.
- Save the quantized parameters.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json

- Compress the saved sparse positional image-/video-like features using codecs.
  - Execute compression.ipynb.
  - Please change the logging_root and experiment_name in compression.ipynb appropriately.
  - One can change `qscale`, `crf`, and `framerate`, which control the compression ratio of the sparse positional features (an illustrative sketch of this trade-off follows after these steps).
    - `qscale` ranges from 1 to 31, where larger values mean worse quality (2~5 recommended).
    - `crf` ranges from 0 to 51, where larger values mean worse quality (20~25 recommended).
    - `framerate` (25 or 40 recommended).
- Evaluation with the compressed parameters.
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
- Option `--save` denotes whether to save the reconstructed frames.
- Please specify the options `--qscale`, `--crf`, and `--framerate` with the same values used in compression.ipynb.
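As referenced above, here is an illustrative sketch of how `qscale`, `crf`, and `framerate` typically map onto ffmpeg's image and video codecs. The actual commands applied to the sparse positional features live in compression.ipynb; the file names below are placeholders:

```python
# Illustrative only: quality/size trade-off of qscale (image codec) and crf/framerate (video codec).
import subprocess

# JPEG-style compression of an image-like feature map (qscale 1-31, lower = better quality).
subprocess.run(["ffmpeg", "-y", "-i", "feature_image.png",
                "-qscale:v", "3", "feature_image.jpg"], check=True)

# HEVC compression of a video-like feature (crf 0-51, lower = better quality).
subprocess.run(["ffmpeg", "-y", "-framerate", "25", "-i", "feature_video_f%05d.png",
                "-c:v", "libx265", "-crf", "21", "feature_video.mp4"], check=True)
```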
4. Results
Reconstructed video results of NVP on UVG-HD, as well as on other 4K/long/temporally-dynamic videos, are available on the project page.
Our model achieves the following performance on UVG-HD with a single NVIDIA V100 32GB GPU:
| Encoding Time | BPP | PSNR (↑) | FLIP (↓) | LPIPS (↓) |
|---|---|---|---|---|
| ~5 minutes | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100 |
| ~10 minutes | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098 |
| ~1 hour | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106 |
| ~8 hours | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083 |
- The reported values are averaged over the Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, and Yachtride videos in UVG-HD and are measured using the LPIPS and FLIP repositories.
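For reference, BPP here is read as the usual bits-per-pixel ratio: total compressed bits divided by the number of video pixels (this definition and the directory name below are assumptions). A minimal sketch of the arithmetic for a 1920x1080, 600-frame video:

```python
# Compute bits per pixel (BPP) from the on-disk size of the compressed representation (illustrative sketch).
import pathlib

compressed_dir = pathlib.Path("./logs_nvp/my_experiment/compressed")  # hypothetical path
total_bits = 8 * sum(p.stat().st_size for p in compressed_dir.rglob("*") if p.is_file())

width, height, num_frames = 1920, 1080, 600
print(f"BPP: {total_bits / (width * height * num_frames):.3f}")
```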
One can download the pretrained checkpoints from the following link.
Citation
@inproceedings{
kim2022scalable,
title={Scalable Neural Video Representations with Learnable Positional Features},
author={Kim, Subin and Yu, Sihyun and Lee, Jaeho and Shin, Jinwoo},
booktitle={Advances in Neural Information Processing Systems},
year={2022},
}
References
We used code from the following repositories: SIREN, Modulation, and tiny-cuda-nn.