Stas Bekman

Results: 664 comments by Stas Bekman

Thank you for validating that it is a good read, @pengzhangzhi

It probably depends on what's the most recommended way of running things. And it's in flux most of the time as ML evolves. Normally, torchrun (and previously torch.launch) was the...

Amazing! Thank you, @lhoestq. Does it work with non-iterable datasets as well? The docs only mention iterable datasets.

Thank you very much for clarifying that, Andrew.

@Muennighoff, what's inside `train.py`? The way you're launching it in many frameworks leads to DP and not DDP (but perhaps not in PTL). So you might not be comparing apples to...

Thank you for sharing your insights, Konstantin. In general the problem is that https://github.com/nviDIA/nemo, which is what we use, abstracts all PTL bits away, giving the user an API that...

That's an interesting discovery, @schlabrendorff, though I'm not sure this is always the case. At least it doesn't seem to impact my setup. I checked: I'm using SLURM 22.05.09...

Ah, that's a possibility since perhaps they have planned to switch in 22.05 but didn't do it until 23.x. When I get a chance I can try the reverse -...

OK, I tested that w/ or w/o `--exclusive` I get all the available cores in the setup [above](https://github.com/Lightning-AI/pytorch-lightning/issues/18650#issuecomment-1872637017):
```
$ scontrol show -d job 3182_1 | grep CPUs/Task
NumNodes=1 NumCPUs=48...
```