
Using tqdm in the cpu train setup

Open Demirrr opened this issue 1 year ago • 7 comments

If --trainer torchCPUTrainer is passed (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/scripts/run.py#L56), a TorchTrainer is initialized to train the embedding model (https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L9).

However, TorchTrainer does not display a progress bar. We should integrate tqdm (https://github.com/tqdm/tqdm) into the TorchTrainer that is initialized here. To do so, we should probably focus on this particular for loop (my guess): https://github.com/dice-group/dice-embeddings/blob/0849d2c46ea4bf06d07db6f3d0d2ad01fc230af5/dicee/trainer/torch_trainer.py#L75
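For illustration, here is a minimal sketch of wrapping a per-epoch batch loop in tqdm. It assumes a plain Python training loop. `train_one_batch`, `train`, and the toy `batches` list are hypothetical stand-ins, not TorchTrainer's actual code. In the real trainer, the iterable would be the data loader.

```python
from tqdm import tqdm


def train_one_batch(batch):
    """Hypothetical stand-in for a forward/backward/step; returns a fake loss."""
    return sum(batch) / len(batch)


def train(batches, num_epochs):
    losses = []
    for epoch in range(num_epochs):
        # Wrapping the batch iterable in tqdm draws a bar that advances once per batch.
        for batch in tqdm(batches, desc=f"Epoch {epoch + 1}/{num_epochs}"):
            losses.append(train_one_batch(batch))
    return losses


if __name__ == "__main__":
    toy_batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(train(toy_batches, num_epochs=2))
```

The `desc=` argument sets the label shown to the left of the bar, which is where a tag such as "Training progress" would go.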

Demirrr • Aug 15 '24 07:08

We need a description tag for the progress bar. Is "Training progress" a suitable description?

sapkotaruz11 • Aug 15 '24 07:08

Please train a model with the PL trainer to see what information its progress bar displays. Ideally, we should use the same description tag, but it is not a must.

Demirrr • Aug 15 '24 08:08

The PL trainer does not seem to use a description tag. The two trainers also display progress differently: the PL trainer shows a single persistent progress bar, whereas TorchCPUTrainer prints separate logs for each epoch and batch. The PL trainer does use a tqdm progress bar too, but a customized one.
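To approximate the single persistent bar described for the PL trainer, one option is to create one tqdm bar whose `total` covers every batch of every epoch and advance it manually. This is a sketch under the assumption of a plain Python loop. `train_one_batch` and `train_single_bar` are hypothetical, and this is not PL's actual (customized) implementation.

```python
from tqdm import tqdm


def train_one_batch(batch):
    """Hypothetical stand-in for one optimisation step; returns a fake loss."""
    return sum(batch) / len(batch)


def train_single_bar(batches, num_epochs):
    """Advance one persistent bar across all epochs instead of one bar per epoch."""
    total_steps = num_epochs * len(batches)
    with tqdm(total=total_steps, desc="Training progress") as pbar:
        for _ in range(num_epochs):
            for batch in batches:
                train_one_batch(batch)
                pbar.update(1)  # redraws the same bar in place
    return pbar.n  # number of steps the bar recorded


if __name__ == "__main__":
    print(train_single_bar([[1.0, 2.0], [3.0, 4.0]], num_epochs=3))
```

Because the total is known up front, tqdm can estimate the remaining time for the whole run. The trade-off is that epoch boundaries are no longer visible unless they are written into the bar's description.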

sapkotaruz11 • Aug 15 '24 09:08

Alrighty. Perhaps for the time being we can settle for two different tqdm progress bars for the two trainers.

Demirrr • Aug 15 '24 10:08

The main issue with TorchCPUTrainer is that the logs (epoch no., batch no., batch loss, etc.) are printed after every batch of every epoch, which forces the tqdm progress bar onto a new line. As far as I can tell, the PL trainer doesn't print these logs. Instead, it shows the epoch number and the loss as a prefix and postfix within the progress bar itself. If we keep printing the batch logs after each batch, the progress bar will be redrawn on a new line after every batch log.
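One way to reconcile the two approaches is sketched below, assuming a plain Python loop. It keeps a single bar, puts the epoch number in the bar's description (prefix), and puts the latest loss in its postfix. Any per-batch log lines go through `tqdm.write`, which prints above the active bar instead of pushing it onto a new line. `train_one_batch`, `train_with_inline_logs`, and the `log_batches` flag are hypothetical, and this is not how PL or TorchTrainer actually implement it.

```python
from tqdm import tqdm


def train_one_batch(batch):
    """Hypothetical stand-in for one optimisation step; returns a fake loss."""
    return sum(batch) / len(batch)


def train_with_inline_logs(batches, num_epochs, log_batches=False):
    history = []
    with tqdm(total=num_epochs * len(batches)) as pbar:
        for epoch in range(num_epochs):
            # Prefix: the epoch number lives in the bar's description.
            pbar.set_description(f"Epoch {epoch + 1}/{num_epochs}")
            for i, batch in enumerate(batches):
                loss = train_one_batch(batch)
                history.append((epoch, i, loss))
                # Postfix: the latest loss is shown to the right of the bar.
                pbar.set_postfix(loss=f"{loss:.4f}")
                if log_batches:
                    # tqdm.write prints above the bar instead of breaking it.
                    tqdm.write(f"epoch={epoch} batch={i} loss={loss:.4f}")
                pbar.update(1)
    return history


if __name__ == "__main__":
    train_with_inline_logs([[1.0, 3.0], [2.0, 4.0]], num_epochs=2, log_batches=True)
```

With `log_batches=False`, the output is a single bar whose prefix and postfix update in place, which is close to what the PL trainer shows.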

sapkotaruz11 • Aug 15 '24 12:08

No worries. Could you please create a feature branch from the dev branch and push the changes, so that I can take a look at it? :)

Demirrr • Aug 15 '24 12:08

Can you check https://github.com/dice-group/dice-embeddings/tree/tqdm-support? For now, I have only added the progress bar to the loop that runs over all batches and epochs.

sapkotaruz11 • Aug 15 '24 13:08

Completed with https://github.com/dice-group/dice-embeddings/pull/258

Demirrr • Aug 26 '24 18:08