Few-Shot-Patch-Based-Training
a fork of the implementation of the SIGGRAPH 2020 paper Interactive Video Stylization Using Few-Shot Patch-Based Training
this implementation works on Python 3.9 and combines all the scripts into one script
[x] streamline everything of Few-Shot-Patch-Based-Training into one script
[x] automatically get frames to apply style on with --framegap
[x] gif support
[x] make a GUI
[x] mask support
[] optimize the movement tracking scripts to run with GPU or multiple CPU (in progress)
[] add linux support (in progress)
[] support videos over 1000 frames (in progress)
why this fork exists
the original repo was hard to comprehend and required a lot of work to get started; my goal with this repo is to make it as automated as possible.
to run this script
run terminal as administrator
cd C:/path/to/Few-Shot-Patch-Based-Training-master
python _tools\fewshot_UI.py
the terminal will pause after processing the frames and folders; take the frames from the folder it names, apply a style to them, and export the stylized frames to the folder it tells you to use, then press Enter a couple of times to resume the script :)
if you want to process a video that has scene changes, I recommend this tool for splitting the video into pieces:
http://scenedetect.com/en/latest/
install guide
make sure there are no spaces in the directories that lead to the Few-Shot-Patch-Based-Training-master folder
Download the prebuilt OpenCV-4.2.0 for Windows and
put the opencv-4.2.0 folder in Few-Shot-Patch-Based-Training-master\_tools\disflow
As disflow links against OpenCV-4.2.0,
it expects Few-Shot-Patch-Based-Training-master\_tools\disflow\opencv-4.2.0\bin in PATH.
add these to the PATH system variable:
put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\disflow" in PATH
put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\gauss" in PATH
put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\bilateralAdv" in PATH
put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\disflow\opencv-4.2.0\bin" in PATH
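To confirm that the four entries above are actually visible, a small check like the following can help. This is only a sketch: the helper name `missing_path_entries` and the example root are illustrative assumptions and not part of this repository; only the four sub-folders are taken from the list above.

```python
import os
from pathlib import PureWindowsPath

# Sub-folders listed above, relative to the repository root.
REQUIRED_SUBDIRS = [
    r"_tools\disflow",
    r"_tools\gauss",
    r"_tools\bilateralAdv",
    r"_tools\disflow\opencv-4.2.0\bin",
]


def missing_path_entries(root, path_value, sep=";"):
    """Return the required folders under `root` that are absent from `path_value`."""

    def norm(entry):
        # Compare case-insensitively; ignore surrounding quotes and trailing slashes.
        return str(PureWindowsPath(entry.strip().strip('"'))).lower()

    present = {norm(entry) for entry in path_value.split(sep) if entry.strip()}
    required = [str(PureWindowsPath(root) / sub) for sub in REQUIRED_SUBDIRS]
    return [entry for entry in required if norm(entry) not in present]


if __name__ == "__main__":
    # Hypothetical root; replace it with your own checkout location.
    root = r"C:\path\to\Few-Shot-Patch-Based-Training-master"
    for entry in missing_path_entries(root, os.environ.get("PATH", ""), os.pathsep):
        print("not in PATH:", entry)
```

Run it from a freshly opened terminal, since changes to the system variables are not picked up by terminals that were already open.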
pip installs
(venv should work now thanks to alpkabac)
pip install ruamel.yaml
pip install pysimplegui
pip install Gooey
pip install opencv-python
pip install scikit-build
pip install cython
pip install Pillow
pip install PyYAML==5.4
pip install scikit-image==0.18.1
pip install scipy==1.6.2
pip install tensorflow==2.7.0
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.21.2
pip install moviepy
pip install numba
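After installing, it may be worth checking that the pinned versions are the ones that ended up in the environment, because later installs can silently replace them. The sketch below uses the standard `importlib.metadata` module; the `PINNED` table is copied from the pip commands above, while `check_pins` is a hypothetical helper name, not something shipped with this repo.

```python
from importlib import metadata

# Pins copied from the pip commands above (unpinned packages are omitted).
PINNED = {
    "PyYAML": "5.4",
    "scikit-image": "0.18.1",
    "scipy": "1.6.2",
    "tensorflow": "2.7.0",
    "torch": "1.9.0+cu111",
    "torchvision": "0.10.0+cu111",
    "torchaudio": "0.9.0",
    "numpy": "1.21.2",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```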
imagemagick install
Download the latest 64-bit HDR version of ImageMagick from here
BTW, this works best if your original footage is super clean (no flickering, etc.)
just run the command; a guide on how to proceed is displayed in the terminal after you run the script (you'll see)
Interactive Video Stylization Using Few-Shot Patch-Based Training
The official implementation of
Interactive Video Stylization Using Few-Shot Patch-Based Training. O. Texler, D. Futschik, M. Kučera, O. Jamriška, Š. Sochorová, M. Chai, S. Tulyakov, and D. Sýkora
[WebPage], [Paper], [BiBTeX]

Run
Download testing-data.zip and unzip it. The _train folder is expected to be next to the _gen folder.
Pre-Trained Models
If you just want to quickly test the network, here are some pre-trained models:
pre-trained-models.zip.
Unzip it and continue with the Generate step. Be sure to set the correct --checkpoint path
when calling generate.py, e.g., _pre-trained-models/Zuzka2/model_00020.pth.
Train
To train the network, run train.py.
See the example command below:
train.py --config "_config/reference_P.yaml"
--data_root "Zuzka2_train"
--log_interval 1000
--log_folder logs_reference_P
Every 1000 (log_interval) epochs, train.py saves the current generator to
logs_reference_P (log_folder) and runs it on the _gen data for validation; the
result is saved in Zuzka2_gen/res__P.
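Because checkpoints accumulate in the log folder, picking the right --checkpoint path by hand gets tedious. The following sketch selects the highest-numbered checkpoint; it assumes the files follow the model_XXXXX.pth naming seen in the examples, and `latest_checkpoint` is a hypothetical helper, not part of train.py or generate.py.

```python
import re
from pathlib import Path

# Matches the checkpoint naming used in the examples, e.g. model_00020.pth.
CHECKPOINT_RE = re.compile(r"^model_(\d+)\.pth$")


def latest_checkpoint(log_folder):
    """Return the highest-numbered model_*.pth in `log_folder`, or None if there is none."""
    best = None  # (number, path) of the best candidate seen so far
    for path in Path(log_folder).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None
```

For example, `latest_checkpoint("Zuzka2_train/logs_reference_P")` returns the path to pass as --checkpoint, provided the folder exists and contains at least one checkpoint.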
Generate
To generate the results, run generate.py.
generate.py --checkpoint "Zuzka2_train/logs_reference_P/model_00020.pth"
--data_root "Zuzka2_gen"
--dir_input "input_filtered"
--outdir "Zuzka2_gen/res_00020"
--device "cuda:0"
To generate the results on live webcam footage, run generate_webcam.py. To stop the generation, press q while the preview window is active.
generate_webcam.py --checkpoint "Zuzka2_train/logs_reference_P/model_00020.pth"
--device "cuda:0"
--resolution 1280 720
--show_original 1
--resize 256
The resolution argument is optional. Regardless of its value, the images are always cropped to a square and resized to resize x resize pixels to keep the delay short.
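To make the geometry concrete, here is a minimal sketch of the crop computation. It assumes the square is taken from the center of the frame, which the text above does not state, and `center_square_crop` is an illustrative name rather than a function from generate_webcam.py.

```python
def center_square_crop(width, height):
    """Return (x0, y0, side) of the largest centered square in a width x height frame.

    Assumption: the crop is centered; the actual script may crop differently.
    """
    side = min(width, height)
    x0 = (width - side) // 2
    y0 = (height - side) // 2
    return x0, y0, side


if __name__ == "__main__":
    x0, y0, side = center_square_crop(1280, 720)
    print(f"crop {side}x{side} at ({x0}, {y0})")
```

With --resolution 1280 720 this yields a 720 x 720 crop starting at x = 280, which --resize 256 would then scale down to 256 x 256.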
Installation
Tested on Windows 10, Python 3.7.8, CUDA 10.2.
With the following python packages:
numpy 1.19.1
opencv-python 4.4.0.40
Pillow 7.2.0
PyYAML 5.3.1
scikit-image 0.17.2
scipy 1.5.2
tensorflow 1.15.3 (tensorflow is used only in the logger.py, I will remove this not-necessary dependency soon)
torch 1.6.0
torchvision 0.7.0
Temporal Consistency [Optional]
This section is optional. It describes steps that can help maintain the temporal coherency of the resulting video sequence. All example commands and build scripts in this section assume Windows; however, building and running them on Linux/MacOS should be straightforward.
Temporal consistency is not explicitly enforced in our technique. This gives us many advantages, e.g., parallel processing and fast training, but the resulting stylized sequence may contain a disturbing amount of flickering. While temporal inconsistency can be caused by various factors, below we discuss how to deal with the two most crucial ones.
Noise in the Input Sequence
The input video sequence captured by a camera usually contains some amount of temporal noise. While this noise might not be visible to the naked eye or might seem negligible, the network tends to amplify it. To deal with this issue, we propose to filter the input sequence using a time-aware bilateral filter.
First, optical flow has to be computed. Use the optical flow tool in _tools/disflow.
See section Build disflow below on how to build the tool.
Once disflow.exe is built and present in the PATH, review and modify the first few lines of
_tools/tool_disflow.py, and run it. It reads PNGs from the input folder and stores optical flow in
the flow_fwd and flow_bwd folders.
Once the optical flow is computed, use the time-aware bilateral filter tool _tools/bilateralAdv to
filter the sequence. See section Build bilateralAdv below on how to build the tool.
Once bilateralAdv.exe is built and present in the PATH, review and modify the first few lines of
_tools/tool_bilateralAdv.py, and run it. It reads PNGs from the input folder and optical flow data
from flow_fwd and flow_bwd; it stores the filtered sequence in input_filtered.
Note: feel free to parallelize the for loop in _tools/tool_bilateralAdv.py; bilateralAdv.exe
uses optical flow and can be run on each frame independently. Also, feel free to optimize
bilateralAdv.exe so that it uses multiple CPU cores or even a GPU ... I am thrilled to see
your pull request :-)
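One possible way to parallelize such a per-frame loop is sketched below. It is not the code from _tools/tool_bilateralAdv.py: `run_frames_in_parallel` and `build_command` are hypothetical names, and the actual bilateralAdv.exe arguments are not listed in this README, so they have to be taken from the existing loop in that script.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_frames_in_parallel(frames, build_command, max_workers=4):
    """Run one external process per frame and return {frame: exit_code}.

    `build_command(frame)` must return the argument list for that frame,
    e.g. the same list the sequential loop already passes to the tool.
    Threads are sufficient because each worker only waits on a subprocess.
    """

    def run_one(frame):
        return frame, subprocess.run(build_command(frame)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, frames))
```

A non-zero exit code in the returned dictionary points at a frame that needs to be re-run. Keep max_workers close to the number of CPU cores, since each process is itself CPU-bound.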
Finally, to do the training and inference, use the filtered images from input_filtered instead
of the original noisy input images. Hopefully, the results will be more stable in time.
Ambiguity in the Training Data
As the network is trained on small patches, 32x32 px by default, it is likely that multiple 32x32 px
patches from the input RGB frame will be very similar. For instance, if there is sky in the background of the input
image, patches from the left and right parts of the sky will likely be very similar. The problem is that in the
stylized exemplar, these patches might be stylized slightly differently. And that is the ambiguity:
multiple similar input patches will be mapped to different stylized patches during training.
To deal with this, we propose to use auxiliary RGB input images that make all input
patches unique.
First, optical flow has to be computed. Use the optical flow tool in _tools/disflow.
See section Build disflow below on how to build the tool.
Once disflow.exe is built and present in the PATH, review and modify the first few lines of
_tools/tool_disflow.py, and run it. It reads PNGs from the input folder and stores optical flow in
the flow_fwd and flow_bwd folders.
Once the optical flow is computed, use _tools/gauss to compute auxiliary gaussian mixture images.
See section Build gauss below on how to build the tool.
Once gauss.exe is built and present in the PATH, review and modify the first few lines of
_tools/tool_gauss.py, and run it. It reads mask images from the mask folder
(these masks can but do not need to match the masks you use during training,
see the section below for more info) and optical flow data
from flow_fwd and flow_bwd; it outputs two different gaussian mixtures in
input_gdisko_gauss_r10_s10 (smaller circles) and input_gdisko_gauss_r10_s15 (larger circles).
Pick one of them, e.g., input_gdisko_gauss_r10_s10; if it does not work well, try the other one.
Place the folder input_gdisko_gauss_r10_s10 next to your input folder in both the _gen and
the _train folder. In the _train folder, input_gdisko_gauss_r10_s10 will contain only the
frames corresponding to the stylized keyframes, e.g., 001.png for the Maruska640 sequence or
000.png, 030.png, 070.png, and 103.png for the Zuzka2 sequence. To train, do not forget to use
the correct config file, e.g., --config "_config/reference_P_disco1010.yaml", when running the
train.py script. To run the generate.py inference script, use the optional
argument --dir_x1 input_gdisko_gauss_r10_s10, which tells generate.py
to load images from input_gdisko_gauss_r10_s10.
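Populating the _train copy of the auxiliary folder with only the keyframe images can be scripted. The sketch below is one way to do it under the folder layout described above; `copy_keyframe_images` is a hypothetical helper and is not part of the repository's tools.

```python
import shutil
from pathlib import Path


def copy_keyframe_images(gen_aux_dir, train_aux_dir, keyframes):
    """Copy only the listed keyframe files from the _gen auxiliary folder to _train.

    Returns the keyframe names that were not found in `gen_aux_dir`.
    """
    src, dst = Path(gen_aux_dir), Path(train_aux_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in keyframes:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
        else:
            missing.append(name)
    return missing
```

For the Zuzka2 sequence, the call would be `copy_keyframe_images("Zuzka2_gen/input_gdisko_gauss_r10_s10", "Zuzka2_train/input_gdisko_gauss_r10_s10", ["000.png", "030.png", "070.png", "103.png"])`; an empty return value means every keyframe was copied.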
Masks for Gauss
When gauss.exe runs, gaussian mixtures are generated for every mask image
and propagated through the sequence using optical flow. If multiple mask images are
provided, the resulting gaussian circles are stacked on top of each other (and they cover
potential holes). The masks can be (and in most cases will be) fully white
images. If you are not sure which frames to pick as masks, pick the same frames as your keyframes
and/or the first and last frame of the sequence. Inspect the gaussian mixture results, e.g.,
input_gdisko_gauss_r10_s10; if there are large black holes (larger than 100x100 px),
add one more mask image for the frame where the black holes are the largest.
Build Temporal Consistency Tools
Build disflow
On Windows, try to use the prebuilt disflow.exe. Otherwise, use _tools/disflow/build_win.bat
to build disflow.exe yourself (on Linux/MacOS, get inspired by
the build script; it should be really easy to build it). As it links against OpenCV-4.2.0,
it expects opencv_world420.dll in PATH. Download OpenCV-4.2.0;
they offer a prebuilt Win pack.
Feel free to modify the build script to use a different version of OpenCV. Note that the OpenCV includes are
provided and located at _tools\disflow\opencv-4.2.0\include, and the Windows .lib files are provided and
located at _tools\disflow\opencv-4.2.0\lib.
Build bilateralAdv
On Windows, try to use the prebuilt bilateralAdv.exe. Otherwise, use _tools/bilateralAdv/build_win.bat
to build bilateralAdv.exe yourself (on Linux/MacOS, get inspired by
the build script; it should be really easy to build it).
Build gauss
On Windows, try to use the prebuilt gauss.exe. Otherwise, use _tools/gauss/build_win.bat
to build gauss.exe yourself (on Linux/MacOS, get inspired by
the build script; it should be really easy to build it).
Other Implementations
- Thank you, Midas, for reimplementing this repo in PyTorch Lightning, see https://github.com/rnwzd/FSPBT-Image-Translation
Credits
- This project started when Ondrej Texler was an intern at Snap Inc., and it was funded by Snap Inc. and Czech Technical University in Prague
- The main engineering forces behind this repository are Ondrej Texler, David Futschik, and Michal Kučera.
- The main engineering forces behind temporal consistency tools are Ondrej Jamriska and Sarka Sochorova
License
- The Patch-Based Training method is not patented, and we do not plan on patenting.
- However, you should be aware that certain parts of the code in this repository were written when Ondrej Texler and David Futschik were employed by Snap Inc. If you find this project useful for your commercial interests, please reimplement it.
Citing
If you find Interactive Video Stylization Using Few-Shot Patch-Based Training useful for your research or work, please use the following BibTeX entry.
@Article{Texler20-SIG,
author = "Ond\v{r}ej Texler and David Futschik and Michal Ku\v{c}era and Ond\v{r}ej Jamri\v{s}ka and \v{S}\'{a}rka Sochorov\'{a} and Menglei Chai and Sergey Tulyakov and Daniel S\'{y}kora",
title = "Interactive Video Stylization Using Few-Shot Patch-Based Training",
journal = "ACM Transactions on Graphics",
volume = "39",
number = "4",
pages = "73",
year = "2020",
}