kaggle-camera-model-identification
Code for reproducing the 2nd place solution for the Kaggle competition IEEE's Signal Processing Society - Camera Model Identification
Kaggle IEEE's Signal Processing Society - Camera Model Identification
Implementation of the camera model identification system by team "[ods.ai] GPU_muscles" (2nd place overall in the Kaggle competition IEEE's Signal Processing Society - Camera Model Identification and 1st place among student-eligible teams).
Should any questions arise regarding the solution, please do not hesitate to contact me on Telegram or via e-mail [email protected]
Our team
- Artur Kuzin [linkedin]
- Valeriy Babushkin [linkedin]
- Artur Fattakhov [kaggle]
- Ilya Kibardin [linkedin]
- Andrey Kiselev [linkedin]
Requirements
To train models and get predictions, the following is required:
- OS: Ubuntu 16.04
- Python 3.6
- Hardware:
  - Any decent modern computer with an x86-64 CPU
  - 32 GB RAM
  - 4 x Nvidia GeForce GTX 1080 Ti
Installation
- Install the required OS and Python.
- Install packages with pip install -r requirements.txt
- Create a data folder at the root of the repository. Place the train dataset from the Kaggle competition into data/train, the test dataset from the Kaggle competition into data/test, and the additional validation images into data/val_images.
- Place se_resnet50.pth and se_resnext50.pth into the imagenet_pretrain folder.
- Place the following final weights into the final_weights folder:
  - densenet161_28_0.08377413648371115.pth
  - densenet161_55_0.08159203971706519.pth
  - densenet161_45_0.0813179751742137.pth
  - dpn92_tune_11_0.1398952918197271.pth
  - dpn92_tune_23_0.12260739478774665.pth
  - dpn92_tune_29_0.14363511492280367.pth
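Before running any script, it can help to confirm that the folders and weight files from the installation steps are in place. The following minimal sketch is not part of the repository. The helper name missing_paths is hypothetical, and the path list is simply copied from the steps above.

```python
from pathlib import Path

# Folders and files listed in the installation steps (paths relative to the repo root).
REQUIRED_DIRS = ["data/train", "data/test", "data/val_images"]
REQUIRED_FILES = [
    "imagenet_pretrain/se_resnet50.pth",
    "imagenet_pretrain/se_resnext50.pth",
    "final_weights/densenet161_28_0.08377413648371115.pth",
    "final_weights/densenet161_55_0.08159203971706519.pth",
    "final_weights/densenet161_45_0.0813179751742137.pth",
    "final_weights/dpn92_tune_11_0.1398952918197271.pth",
    "final_weights/dpn92_tune_23_0.12260739478774665.pth",
    "final_weights/dpn92_tune_29_0.14363511492280367.pth",
]


def missing_paths(repo_root="."):
    """Return every expected directory or file that is absent under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

For example, calling missing_paths(".") from the repository root returns an empty list once everything is in place.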
Producing the final submission
Run bash final_submit.sh -d <folder with test images> -o <output .csv filename>
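A quick consistency check on the resulting .csv file can catch a wrong -d folder or a truncated run. The sketch below is hypothetical and not part of the repository. It assumes that the CSV has a header row whose image-name column is called fname (an assumption: check the header that final_submit.sh actually writes). It also assumes that the test folder contains only image files.

```python
import csv
from collections import Counter
from pathlib import Path


def check_submission(csv_path, test_dir, name_column="fname"):
    """Compare the image names in a submission CSV with the files in test_dir.

    name_column is an assumed header name. Returns lists of missing, duplicated,
    and unexpected names; all three are empty for a consistent submission.
    """
    images = {p.name for p in Path(test_dir).iterdir() if p.is_file()}
    with open(csv_path, newline="") as f:
        counts = Counter(row[name_column] for row in csv.DictReader(f))
    listed = set(counts)
    return {
        "missing": sorted(images - listed),
        "duplicated": sorted(name for name, n in counts.items() if n > 1),
        "unexpected": sorted(listed - images),
    }
```

If the submission is consistent with the test folder, all three lists in the returned dict are empty.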
Training ensemble from scratch
This section describes the steps required to train our ensemble.
1. Download external dataset
Images from both Yandex.Fotki and Flickr are essential for reproducing our solution.
Downloading images from Yandex.Fotki
Run bash download_from_yandex.sh
Downloading images from Flickr
Unfortunately, this step involves some manual actions.
- cd into downloader/flickr.
- For every phone model, open the corresponding group page listed in flickr_groups.txt. Scroll each gallery page to the end and save it as an .html file in the corresponding folder. As a result, the html_pages folder will contain one folder of .html files per phone model.
- Run python pages_to_image_links.py. The script creates a links folder with one .csv file of photo links per phone model.
- Run python download_from_links.py to download the images from the links obtained in the previous step. (The previous two steps can be skipped, because the links folder already contains the necessary files.)
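The repository's pages_to_image_links.py is not reproduced here. The following is only a sketch of how links could be pulled out of saved gallery pages with Python's standard html.parser. It makes three assumptions that the source does not confirm: photo URLs appear as img src attributes on hosts containing staticflickr.com, pages are saved as html_pages/<model>/*.html, and each output CSV has a single url column. The names extract_image_links and pages_to_csv are hypothetical.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class ImageLinkParser(HTMLParser):
    """Collect unique <img src=...> URLs whose value contains host_substring."""

    def __init__(self, host_substring):
        super().__init__()
        self.host_substring = host_substring
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src or self.host_substring not in src:
            return
        if src.startswith("//"):  # protocol-relative URL
            src = "https:" + src
        if src not in self._seen:
            self._seen.add(src)
            self.links.append(src)


def extract_image_links(html_text, host_substring="staticflickr.com"):
    """Return photo URLs found in one saved gallery page, in page order (assumed host filter)."""
    parser = ImageLinkParser(host_substring)
    parser.feed(html_text)
    parser.close()
    return parser.links


def pages_to_csv(html_root, links_root):
    """Write links_root/<model>.csv for each html_root/<model>/ folder of saved pages."""
    links_root = Path(links_root)
    links_root.mkdir(parents=True, exist_ok=True)
    for model_dir in sorted(p for p in Path(html_root).iterdir() if p.is_dir()):
        urls, seen = [], set()
        for page in sorted(model_dir.glob("*.html")):
            text = page.read_text(encoding="utf-8", errors="ignore")
            for url in extract_image_links(text):
                if url not in seen:
                    seen.add(url)
                    urls.append(url)
        with open(links_root / (model_dir.name + ".csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["url"])  # hypothetical single-column layout
            writer.writerows([url] for url in urls)
```

Deduplicating across all pages of a model matters because scrolling a gallery and saving it more than once tends to capture the same photos repeatedly.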
2. Filter external dataset
Run bash filter.sh
3. Train the ensemble
- Download and filter the external dataset as described above.
- Run bash init_train.sh to train 9 models.
- Run bash make_pseudo.sh to get predictions from these models for the images in data/test and create pseudo labels.
- Run bash final_train.sh to train the same 9 models again, this time using the pseudo labels.
- Run bash predict.sh -d <folder with test images> -o <output .csv filename> to get predictions from the ensemble.