Surgan Jandial

Results 9 comments of Surgan Jandial

@bionicles I think it is self-explanatory, except that this line https://github.com/pytorch/examples/blob/e0929a4253f9ae6ccdde24e787788a9955fdfe1c/dcgan/main.py#L232 might cause trouble.

How about putting the value of best_acc in shared memory during multiprocessing?
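For reference, a minimal sketch of what I mean, assuming torch.multiprocessing; the train function, process count, and accuracy computation are placeholders, not the actual example code:

```python
import torch.multiprocessing as mp

def train(rank, best_acc):
    # ... hypothetical training loop; suppose it produces an accuracy `acc` ...
    acc = 0.0  # placeholder for the accuracy computed in this process
    with best_acc.get_lock():       # Value carries its own lock
        if acc > best_acc.value:
            best_acc.value = acc    # update the shared best accuracy

if __name__ == '__main__':
    # 'd' = double-precision float stored in shared memory
    best_acc = mp.Value('d', 0.0)
    processes = []
    for rank in range(4):
        p = mp.Process(target=train, args=(rank, best_acc))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print(f'best accuracy across workers: {best_acc.value}')
```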

@FedericOldani https://github.com/pytorch/pytorch/blob/44a607b90c9bba0cf268f833bae4715221346709/torch/jit/annotations.py#L33 Check for this in the annotations.py of your PyTorch version; this line is probably missing.

Did you try increasing num_workers? Maybe something like 16?

What is the batch size that you are using?

I had more or less the same problem, but increasing the batch size and num_workers did the trick for me.

I set the batch size to around 500 and num_workers to 16.
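Something along these lines; the dummy dataset and the exact numbers are only illustrative, tune them for your data and hardware:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder dataset; substitute the Dataset you are already using
dataset = TensorDataset(torch.randn(10000, 3, 32, 32),
                        torch.randint(0, 10, (10000,)))

loader = DataLoader(
    dataset,
    batch_size=500,   # larger batches reduced the per-batch overhead for me
    num_workers=16,   # more workers keep the GPU fed with preprocessed batches
    shuffle=True,
    pin_memory=True,  # optional: faster host-to-GPU copies
)

for images, labels in loader:
    pass  # training step goes here
```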

Is this being worked on?

convert_checkpoint.py for MPT is not synced with the latest llm_foundry MPT model.