ConSSL
Code improvements
Hi @sally20921
As discussed before, I'm creating an issue here to simplify the code:
- Metrics. In Ignite we already have all necessary metrics. Please, take a look here:
- https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/references/classification/imagenet/code/scripts/training.py#L160-L163
- https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/contrib/cifar10/main.py#L77-L80
Thus, this part of the code can be simplified (no need to introduce StatMetric):
https://github.com/sally20921/BYOL/blob/455e15e9c9375f8cb99fe133860dc91838092440/code/train.py#L29-L32
Another point: computing metrics during training, i.e. while the model is being updated on each batch, can be misleading, as the metric is averaged over predictions from different model states. Please also see the footnote here: https://pytorch.org/ignite/quickstart.html#id1
- Code structure. It would be nice to simplify the codebase structure and have maximum 4-5 files in the repository:
- train.py = main script to train a model
- evaluate.py = main script to run evaluation of a trained model
- utils.py = various helper methods used by train.py and evaluate.py
- dataflow.py = helper module to setup training/validation/test dataloaders
- (optional) models.py = implementation of the models if necessary
- (optional) losses.py = implementation of loss functions if necessary
What do you think?
- Distributed training.
Using the latest API (https://pytorch.org/ignite/distributed.html#distributed-launcher-and-auto-helpers) it is possible to write code that can run on any number of devices.
Hello! Hope you are having a nice day~ I just wanted to mention that I am almost done with the BYOL implementation (and, accidentally, SimSiam as well). I just need to change a few lines to make it distributed. It would be wonderful if I could test my code on the GPU server! I will be waiting to hear from you. Bye~
Seri Lee
Thanks for the update @sally20921 !
I would like to test the code on my side. Which dataset are you using for debugging, and which datasets do you think we'll need to check implementation correctness? I suppose we'll need ImageNet-1k for large-scale pretraining, as they do in the paper...
I'm still hesitant about the repository's structure and unfortunately find it very complicated to understand what is done and how it is implemented. Would you like to improve it?
I have implemented CIFAR10, CIFAR100, STL10, and ImageNet, so you can test using any of the above! I will try to improve the code as you mentioned in the issue =) I will be contacting you soon 😀