Jan Lasek

Results: 9 comments by Jan Lasek

> There are also many odd choices, many torch.nn imports have been removed and arbitrarily using the longer torch.nn.xyz - why ?

The suggestions come from the Flake8 tool, I just...

> This will create merge conflicts in practically every open pr. Why is this PR necessary ? If it's aesthetics, it's not sufficient reason to merge this.
>
> @artbataev...

Is it fine, and really required, to introduce the extra param `--input_tokenizer`? If so, should we update the relevant docs? https://github.com/NVIDIA/NeMo/blob/main/docs/source/ckpt_converters/dev_guide.rst CC @yaoyu-33 - please have a look, maybe you will have...

Hello, I have two questions first that hopefully will help us to figure this out more effectively:

1. Could you share the exact command you tried?
2. What are "global_batch_size"...

Regarding

> I am not sure how to interpret the saved directory and how to read all the embedding tables from the output shown above.

There are 26 categories /...

Right, the development and testing involved only A100. To achieve this, you would need at least CUDA 12 and would have to compile FBGEMM for the Hopper architecture (SM90). But I have never tried...