Stas Bekman

664 comments by Stas Bekman

Yeah, I get that too when I try to use too large a batch size. But if you're running my script, its default is bs=1, so that shouldn't really be...

@asaparov, please run the following 2 experiments:

1. Same setup as yours, but add `CUDA_LAUNCH_BLOCKING=1`, as in:

```
CUDA_LAUNCH_BLOCKING=1 deepspeed --hostfile=$hostfile Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py --name bigscience/bloom
```

and let's see if...
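CUDA kernel launches are asynchronous, so an error can surface at a later, unrelated call; `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, so the traceback points at the op that actually failed. If you drive runs from Python rather than the shell, a minimal sketch of the same launch might look like this. `hostfile.txt` is a placeholder for your own `$hostfile`, `build_debug_launch` is a hypothetical helper, and the actual `subprocess.run` call is left commented out since it needs DeepSpeed and GPUs.

```python
import os
import shlex


def build_debug_launch(hostfile, script, model_name):
    """Return (env, argv) to run `script` under the deepspeed launcher with
    CUDA_LAUNCH_BLOCKING=1 set in the launcher's environment."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")  # a copy; os.environ is untouched
    argv = ["deepspeed", f"--hostfile={hostfile}", script, "--name", model_name]
    return env, argv


env, argv = build_debug_launch(
    "hostfile.txt",  # placeholder for $hostfile
    "Megatron-DeepSpeed/scripts/inference/bloom-ds-inference.py",
    "bigscience/bloom",
)
# Print the equivalent shell command for reference.
print("CUDA_LAUNCH_BLOCKING=1", shlex.join(argv))
# import subprocess; subprocess.run(argv, env=env, check=True)  # needs DeepSpeed + GPUs
```

This only mirrors the shell one-liner above; it does not change how the launcher forwards environment variables to other nodes.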

Thank you for reporting back, @asaparov! You can use this approach for now (it will be just a tad slower) until the underlying issue is resolved. The difficulty is in...

I suspect the bug is intermittent, as it pops up inconsistently and in various situations. But if it works for you at the moment, that's great! Yes, the `save_mp_checkpoint_path`...

Very neat, @jaketae! I think I/we didn't think it through: these are 2 totally different ranges, one in samples and the other in iterations. And they have no linear...
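To illustrate why a samples range and an iterations range can't be converted with a single multiplication, here is a minimal sketch assuming a hypothetical batch-size rampup, where the global batch size grows in fixed increments as samples are consumed. The schedule and its parameters (`start_bs`, `increment`, `samples_per_increment`, `final_bs`) are made up for illustration and are not taken from any actual training config.

```python
def batch_size_for(consumed_samples, start_bs, increment, samples_per_increment, final_bs):
    """Hypothetical rampup: grow the global batch size by `increment` every
    `samples_per_increment` consumed samples, capped at `final_bs`."""
    steps = consumed_samples // samples_per_increment
    return min(start_bs + steps * increment, final_bs)


def samples_to_iterations(target_samples, start_bs=16, increment=16,
                          samples_per_increment=1000, final_bs=64):
    """Count the iterations needed to consume at least `target_samples`."""
    consumed, iterations = 0, 0
    while consumed < target_samples:
        consumed += batch_size_for(consumed, start_bs, increment,
                                   samples_per_increment, final_bs)
        iterations += 1
    return iterations


print(samples_to_iterations(1000), samples_to_iterations(2000))
```

With these made-up defaults, 1,000 samples take 63 iterations while 2,000 samples take 94 rather than 126, because the second 1,000 samples are consumed at a larger batch size.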

Thinking more about it: given the transient nature of this project (I don't see Megatron-LM integrating our improvements) and that our attempts to study data at the points of...

Oh, there is no need to close it. We can just keep it open in case we want to resume it later. Thank you for taking this outcome in such a kind manner,...

If all the source files could be easily identified, then perhaps the cloning could be done with a few Perl one-liners. Here is a very rough outline:

1. find the...
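As a rough illustration of the substitution idea only (not a reconstruction of the elided outline), a Python equivalent of such one-liners might copy each identified file while renaming a model identifier in both the file name and its contents. The helper `clone_sources` and the names `Foo`/`Bar` are hypothetical.

```python
import re
from pathlib import Path


def clone_sources(src_files, old, new, dest_dir):
    """Copy each source file into dest_dir, replacing `old` with `new` in the
    file name and contents, for the CamelCase, lower and UPPER variants."""
    variants = {old: new, old.lower(): new.lower(), old.upper(): new.upper()}
    # A single alternation regex, so each match is replaced exactly once
    # and replaced text is never substituted again.
    pattern = re.compile("|".join(re.escape(k) for k in variants))

    def repl(match):
        return variants[match.group(0)]

    created = []
    for src in map(Path, src_files):
        target = Path(dest_dir) / pattern.sub(repl, src.name)
        target.write_text(pattern.sub(repl, src.read_text()))
        created.append(target)
    return created
```

For example, `modeling_foo.py` containing `class FooModel` would become `modeling_bar.py` containing `class BarModel`.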

We already have a PR https://github.com/huggingface/transformers/pull/14084 - nothing is holding us back from merging it, other than making sure it does the right thing.

OK, so the 2 LayerNorm implementations diverge a lot under fp16 (while only very little under fp32):

![snapshot_9](https://user-images.githubusercontent.com/10676103/136615452-d3069555-5765-4456-90b2-8962f529f7f2.png)

I have some outstanding changes to the test where I switched to using...
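To get a feel for why two LayerNorm implementations can agree closely in fp32 yet drift apart in fp16, here is a minimal NumPy sketch. The two functions are hypothetical stand-ins, not the implementations from the test: one keeps every step in the input dtype, the other computes in float64 and casts the result back.

```python
import numpy as np


def layer_norm_native(x, eps=1e-5):
    """Stand-in A: LayerNorm over the last axis with each step kept in x's
    dtype (NumPy may still use wider intermediates inside `mean`)."""
    eps = x.dtype.type(eps)
    mean = x.mean(axis=-1, keepdims=True)
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm_upcast(x, eps=1e-5):
    """Stand-in B: compute in float64, then cast back to x's dtype."""
    return layer_norm_native(x.astype(np.float64), eps).astype(x.dtype)


def max_divergence(x):
    """Largest absolute elementwise difference between the two stand-ins."""
    a = layer_norm_native(x).astype(np.float64)
    b = layer_norm_upcast(x).astype(np.float64)
    return float(np.abs(a - b).max())


rng = np.random.default_rng(0)
# A non-zero mean makes the rounding in (x - mean) more visible in fp16.
x = rng.normal(loc=5.0, scale=1.0, size=(4, 1024))
print("fp32:", max_divergence(x.astype(np.float32)))
print("fp16:", max_divergence(x.astype(np.float16)))
```

The fp16 gap comes out much larger than the fp32 one, since each intermediate rounding in fp16 is far coarser.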