Sebastian Raschka
Referencing #501 because there seems to be a similar issue when using larger microbatch sizes.
Hi there and sorry for the very late follow-up. Thanks for generously offering to contribute this tutorial, but I would say it's a bit out of scope for now. If...
Hi there. Based on the message you are getting, it looks like all the memory is used. According to your nvidia-smi output, you have 97871 MiB, which is 97871 /...
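In case it helps with debugging, here's a minimal sketch (assuming a CUDA-capable setup) for cross-checking the nvidia-smi numbers from within PyTorch:

```python
import torch

# Quick sanity check of free vs. total GPU memory, analogous to the
# numbers nvidia-smi reports (values returned are in bytes).
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"Free : {free_bytes / 1024**2:,.0f} MiB")
    print(f"Total: {total_bytes / 1024**2:,.0f} MiB")
    # Memory currently held by PyTorch tensors on this device
    print(f"Allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**2:,.0f} MiB")
```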
Oh I see now, I didn't realize it was unified memory between the CPU and GPU (I thought it was somehow shared between multiple GPUs). In that case, I am actually...
Looks like unified memory support is still a work in progress on the PyTorch side: https://github.com/vllm-project/vllm/issues/10267 & https://github.com/pytorch/pytorch/issues/124807
Thanks for the note, I appreciate it. I am currently out with an injury but will bookmark this and revisit it in the upcoming weeks.
Thanks for reporting! @Andrei-Aksionov and I will take a look
@Andrei-Aksionov The mask still needs to be diagonal, otherwise a given token would attend to future tokens. But that part is already handled via triu / tril in the code...
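To illustrate what I mean (just a toy sketch, not the code from this repo):

```python
import torch

# Causal mask built with triu: entries above the diagonal mark the
# "future" positions, which get set to -inf before the softmax so a
# token can only attend to itself and earlier tokens.
context_length = 5
scores = torch.randn(context_length, context_length)  # dummy attention scores

mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

attn_weights = torch.softmax(scores, dim=-1)
print(attn_weights)  # each row only has nonzero weights for itself and earlier tokens
```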
For the context tokens it would, though. I.e., in a generation step, the first generated output token would depend on itself and all future tokens. However, since we chop those...
That kv cache ... I still need to wrap my head around it. I probably should code it from scratch for myself some time to get a better grasp on how to...
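(If it's useful in the meantime, here's the rough toy version of what I have in mind; `SimpleKVCache` and the shapes are just placeholders for illustration, not code from this repo.)

```python
import torch

# Toy sketch of the idea behind a KV cache: during generation, the keys
# and values of previous tokens are stored so each step only computes
# attention for the newest token instead of reprocessing the whole sequence.
class SimpleKVCache:
    def __init__(self):
        self.k = None  # (batch, seq_so_far, head_dim)
        self.v = None

    def update(self, k_new, v_new):
        # Append the newest token's key/value to the cached ones
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=1)
            self.v = torch.cat([self.v, v_new], dim=1)
        return self.k, self.v


# Usage: one generation step per new token
cache = SimpleKVCache()
batch, head_dim = 1, 8
for step in range(4):
    q_new = torch.randn(batch, 1, head_dim)  # query for the newest token only
    k_new = torch.randn(batch, 1, head_dim)
    v_new = torch.randn(batch, 1, head_dim)
    k_all, v_all = cache.update(k_new, v_new)

    # Attention of the new token over all cached positions; no causal mask
    # is needed here because only past and current tokens are in the cache.
    attn = torch.softmax(q_new @ k_all.transpose(1, 2) / head_dim**0.5, dim=-1)
    context = attn @ v_all  # (batch, 1, head_dim)
    print(step, context.shape)
```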