Stas Bekman

Results: 664 comments by Stas Bekman

> Also worth mentioning that whether a BOS token is added automatically depends on the particular model (tokenizer). But it's the case for e.g. llama and mistral tokenizers. Once the...

Ah, thank you so much, Nick - for some reason I was drawing a blank on finding how to use it. Now I was able to tell vllm not to add BOS...

Should the [`LLM`](https://docs.vllm.ai/en/stable/dev/offline_inference/llm.html) API be consistent with the client API and offer `add_special_tokens=False`? I guess it'd belong to `SamplingParams` and not `LLM`. `SamplingParams` already has `skip_special_tokens` - but that refers to...
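
For reference, a minimal sketch of the workaround (assuming a vLLM version that exposes `TokensPrompt`; the model name is just an example): pre-tokenize with `add_special_tokens=False` so no BOS gets prepended - `SamplingParams.skip_special_tokens` only affects detokenization of the output, not the input.

```python
# Sketch: avoid the automatic BOS by passing pre-tokenized input to vLLM.
# Assumes a vLLM version whose LLM.generate() accepts TokensPrompt.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

model_id = "mistralai/Mistral-7B-v0.1"  # example model, not from the thread
tok = AutoTokenizer.from_pretrained(model_id)

# add_special_tokens=False stops the tokenizer from prepending BOS
ids = tok("Hello, world", add_special_tokens=False)["input_ids"]

llm = LLM(model=model_id)
# skip_special_tokens only controls decoding of the generated text
params = SamplingParams(max_tokens=16, skip_special_tokens=True)
out = llm.generate(TokensPrompt(prompt_token_ids=ids), params)
print(out[0].outputs[0].text)
```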

I have a question about `lower_precision_safe_modules`: https://pytorch.org/docs/stable/amp.html#torch.autocast doesn't have an option to specify `lower_precision_safe_modules` - so why do we then put the onus on the DeepSpeed user? (and I'm aware that...
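
For context, a minimal sketch of plain `torch.autocast` usage - it is scoped only by device type, dtype and an enabled flag, with no per-module allow-list analogous to `lower_precision_safe_modules`:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# torch.autocast takes device_type/dtype/enabled; there is no argument for
# choosing which module classes may run in lower precision.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)  # matmul-backed ops run in bf16 per autocast's internal op lists

print(y.dtype)  # torch.bfloat16
```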

> One confusing aspect is that it's valid to use PyTorch’s torch.autocast while disabling "torch_autocast" in the DeepSpeed config. In this case, DeepSpeed treats it as a pure FP32 setting....
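
To make the quoted scenario concrete, a hedged config sketch - only `torch_autocast` and `lower_precision_safe_modules` appear in this thread; the other keys and defaults are assumptions about the proposed feature:

```python
# Hypothetical DeepSpeed config fragment; exact keys may differ in the final PR.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "torch_autocast": {
        "enabled": False,  # disabled: DeepSpeed treats the run as pure FP32
        # "lower_precision_safe_modules": ["torch.nn.Linear"],
    },
}

# The confusing part: user code can still wrap the forward pass in
# torch.autocast even though the DeepSpeed-side integration is disabled:
#
#   with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#       loss = model_engine(batch)
```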

I'm talking about peak memory usage. If you run the torch memory profiler around the reduction call, you will see that it'll use `reduce_bucket_size * 4 bytes * 2 copies` -...
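
A minimal sketch of measuring that peak with PyTorch's CUDA memory stats (the `all_reduce` here is just a stand-in for whatever reduction call the thread refers to, and `reduce_bucket_size` is the config value in elements):

```python
import torch
import torch.distributed as dist

def peak_mem_around(fn, *args, **kwargs):
    """Return peak CUDA memory (bytes) above the pre-call allocation."""
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    before = torch.cuda.memory_allocated()
    fn(*args, **kwargs)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() - before

# Usage (inside an initialized process group):
# bucket = torch.empty(reduce_bucket_size, dtype=torch.float32, device="cuda")
# extra = peak_mem_around(dist.all_reduce, bucket)
# print(f"transient peak above steady state: {extra / 2**20:.1f} MiB")
```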

Then an assert is the way to go, Masahiro.

Hmm, I can't just hit approve; that would defeat the purpose of doing the review. We have only discussed one small aspect of this PR, which has been resolved,...

BTW, an alternative solution has just been posted here: https://github.com/huggingface/transformers/pull/32305 - so perhaps wait until the maintainers decide which version is the most fitting before you invest more time into improving...

That's a wonderfully written follow-up, Elsa. @samadejacobs, could you please have a look at Elsa's comments above and see if anything still needs to be added to your PR -...