ConSSL icon indicating copy to clipboard operation
ConSSL copied to clipboard

Code improvements

Open vfdev-5 opened this issue 4 years ago • 3 comments

Hi @sally20921

As discussed before, I create an issue here to simplify the code:

  1. Metrics. In Ignite we already have all necessary metrics. Please, take a look here:
  • https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/references/classification/imagenet/code/scripts/training.py#L160-L163
  • https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/contrib/cifar10/main.py#L77-L80

Thus, this part of the code can be simplified (no need to introduce StatMetric): https://github.com/sally20921/BYOL/blob/455e15e9c9375f8cb99fe133860dc91838092440/code/train.py#L29-L32

Another point is that, computing metrics during the training, i.e. model is updated on each batch, can be misleading as metric is averaged on predictions of different models. Please, see also here the footnote : https://pytorch.org/ignite/quickstart.html#id1

  1. Code structure. It would be nice to simplify the codebase structure and have maximum 4-5 files in the repository:
  • train.py = main script to train a model
  • evaluate.py = main script to run evaluation of a trained model
  • utils.py = various helper methods used by train.py and evaluate.py
  • dataflow.py = helper module to setup training/validation/test dataloaders
  • (optional) models.py = implementation of the models if necessary
  • (optional) losses.py = implementation of loss functions if necessary

what do you think ?

  1. Distributed training.

Using the latest API it is possible to write a code that can run on any number of devices.

vfdev-5 avatar Jan 15 '21 10:01 vfdev-5

Hello! Hope you are having a nice day~ I just wanted to mention I am almost done with the BYOL implementation (accidentally SimSiam as well). I just need to change a few lines to make it distributed. It would be wonderful if I could test my code on the gpu server! I will be waiting to hear from you. Bye~

Seri Lee

On Jan 15, 2021, at 7:21 PM, vfdev [email protected] wrote:

Hi @sally20921 https://github.com/sally20921 As discussed before, I create an issue here to simplify the code:

Metrics. In Ignite we already have all necessary metrics. Please, take a look here: https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/references/classification/imagenet/code/scripts/training.py#L160-L163 https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/references/classification/imagenet/code/scripts/training.py#L160-L163 https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/contrib/cifar10/main.py#L77-L80 https://github.com/pytorch/ignite/blob/55d8cd84f7b823559ec8b4c565814f6f4ad2bff6/examples/contrib/cifar10/main.py#L77-L80 Thus, this part of the code can be simplified (no need to introduce StatMetric): https://github.com/sally20921/BYOL/blob/455e15e9c9375f8cb99fe133860dc91838092440/code/train.py#L29-L32 https://github.com/sally20921/BYOL/blob/455e15e9c9375f8cb99fe133860dc91838092440/code/train.py#L29-L32 Another point is that, computing metrics during the training, i.e. model is updated on each batch, can be misleading as metric is averaged on predictions of different models. Please, see also here the footnote : https://pytorch.org/ignite/quickstart.html#id1 https://pytorch.org/ignite/quickstart.html#id1 Code structure. It would be nice to simplify the codebase structure and have maximum 4-5 files in the repository: train.py = main script to train a model evaluate.py = main script to run evaluation of a trained model utils.py = various helper methods used by train.py and evaluate.py dataflow.py = helper module to setup training/validation/test dataloaders (optional) models.py = implementation of the models if necessary (optional) losses.py = implementation of loss functions if necessary what do you think ?

Distributed training. Using the latest API https://pytorch.org/ignite/distributed.html#distributed-launcher-and-auto-helpers it is possible to write a code that can run on any number of devices.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sally20921/BYOL/issues/1, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJEC5CGGOR7ANKAV2WGBPFLS2AJMNANCNFSM4WD2YPMQ.

sally20921 avatar Jan 26 '21 11:01 sally20921

Thanks for the update @sally20921 !

I would like to test the code on my side. Which dataset you are using for debuging and which datasets you think we'll need for checking implementation correctness ? I suppose that we'll need ImageNet-1k for a large scale pretraining as they do in the paper...

I'm still hesitating about the repository's structure and unfortunately find it very complicated to understand what is done and how it is implemented. Would you like to improve it ?

vfdev-5 avatar Jan 26 '21 22:01 vfdev-5

I have implemented CIFAR10, CIFAR100, STL10, ImageNet so you can test using any of the mentioned above! I will try to improve the code like you mentioned on the issue=) I will be contacting you soon😀

sally20921 avatar Jan 26 '21 22:01 sally20921