Sebastian Raschka
Referencing #501 because there seems to be a similar issue when using larger microbatch sizes.
Hi there and sorry for the very late follow-up. Thanks for generously offering to contribute this tutorial, but I would say it's a bit out of scope for now. If...
Hi there. Based on the message you are getting, it looks like all the memory is used. According to your nvidia-smi output, you have 97871 MiB, which is 97871 /...
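In case it helps with debugging, here's a minimal sketch (assuming a CUDA-capable setup) for cross-checking the nvidia-smi numbers from within PyTorch:

```python
import torch

# Quick sanity check of free vs. total GPU memory, analogous to the
# numbers nvidia-smi reports (values returned are in bytes).
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"Free : {free_bytes / 1024**2:,.0f} MiB")
    print(f"Total: {total_bytes / 1024**2:,.0f} MiB")
    # Memory currently held by PyTorch tensors on this device
    print(f"Allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**2:,.0f} MiB")
```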
Oh I see now, I didn't realize it was unified memory between the CPU and GPU (I thought it was somehow shared between multiple GPUs). In that case, I am actually...
Looks like unified memory support is still a work in progress on the PyTorch side: https://github.com/vllm-project/vllm/issues/10267 & https://github.com/pytorch/pytorch/issues/124807
Thanks for the note, I appreciate it. I am currently out with an injury but will bookmark this and revisit it in the upcoming weeks.
Thanks for reporting! @Andrei-Aksionov and I will take a look
@Andrei-Aksionov The mask still needs to be diagonal, otherwise a given token would attend to future tokens. But that part is already handled via triu / tril in the code...
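To illustrate what I mean (just a toy sketch, not the code from this repo):

```python
import torch

# Causal mask built with triu: entries above the diagonal mark the
# "future" positions, which get set to -inf before the softmax so a
# token can only attend to itself and earlier tokens.
context_length = 5
scores = torch.randn(context_length, context_length)  # dummy attention scores

mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

attn_weights = torch.softmax(scores, dim=-1)
print(attn_weights)  # each row only has nonzero weights for itself and earlier tokens
```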
For the context tokens it would, though. I.e., in a generation step, the first generated output token would depend on itself and all future tokens. However, since we chop those...
That kv cache ... I still need to wrap my head around it. I probably should code it from scratch for myself some time to get a better grasp on how to...
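(If it's useful in the meantime, here's the rough toy version of what I have in mind; `SimpleKVCache` and the shapes are just placeholders for illustration, not code from this repo.)

```python
import torch

# Toy sketch of the idea behind a KV cache: during generation, the keys
# and values of previous tokens are stored so each step only computes
# attention for the newest token instead of reprocessing the whole sequence.
class SimpleKVCache:
    def __init__(self):
        self.k = None  # (batch, seq_so_far, head_dim)
        self.v = None

    def update(self, k_new, v_new):
        # Append the newest token's key/value to the cached ones
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=1)
            self.v = torch.cat([self.v, v_new], dim=1)
        return self.k, self.v


# Usage: one generation step per new token
cache = SimpleKVCache()
batch, head_dim = 1, 8
for step in range(4):
    q_new = torch.randn(batch, 1, head_dim)  # query for the newest token only
    k_new = torch.randn(batch, 1, head_dim)
    v_new = torch.randn(batch, 1, head_dim)
    k_all, v_all = cache.update(k_new, v_new)

    # Attention of the new token over all cached positions; no causal mask
    # is needed here because only past and current tokens are in the cache.
    attn = torch.softmax(q_new @ k_all.transpose(1, 2) / head_dim**0.5, dim=-1)
    context = attn @ v_all  # (batch, 1, head_dim)
    print(step, context.shape)
```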