49 comments of Srinivas Billa

Yeah, I think first we need to solve batch inference. It's implemented in babyllama, but I haven't tried to port it over to the main llama yet.

That's fair. Batch inference would be useful for me to use this at scale, for example if I want to do sentiment analysis or summarisation over a large dataset.

And, in this case, having a server that can handle multiple users at the same time.

@danielhanchen I'm happy to port the implementation over if you want to include it in unsloth. It would look like a separate training script with the necessary files being included...

Actually on second thought I'll work on it anyway since I also need this pretty bad lol. Since I do a lot of training runs every day it would save...

MeetKai said in the PR that it's okay with positional encoding? https://github.com/huggingface/trl/pull/1235#issuecomment-1900632280 He also said it could be implemented without FA, but I'm not sure how to do that. And yeah...

Contamination is definitely an issue. I've tested it on the same dataset, which is heavily correlated (aspect-based sentiment), and the difference between the packed and non-packed runs is big.

Following on from @vrdn-23, #3466 would be great too. I already use Ray for scaling across multiple nodes, and this is the only solution that works when using models...

Thanks @mgoin, yes, the performance isn't as good as INT4. However, the model quality is nearly indistinguishable from fp16, which is really nice. I hope that fp6 becomes the...

@mgoin I'm a bit confused: why does fp6 not save VRAM? Even if the activations are in fp16, surely storing the weights in fp6 saves memory, right?
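For intuition on that question, here's a back-of-envelope sketch (my own illustrative numbers, not from the thread): weight storage scales with bits per parameter regardless of the activation dtype, so a 6-bit format should cut the weight footprint to roughly 6/16 of fp16. Whether this shows up as VRAM savings in practice depends on how the kernels store and unpack the weights.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB.

    Ignores activations, KV cache, and any per-group quantization
    metadata (scales/zeros), which add a small overhead in practice.
    """
    return num_params * bits_per_param / 8 / 1024**3

# Hypothetical 7B-parameter model as an example.
params = 7e9
fp16_gib = weight_memory_gib(params, 16)  # ~13.0 GiB
fp6_gib = weight_memory_gib(params, 6)    # ~4.9 GiB
print(f"fp16 weights: {fp16_gib:.1f} GiB, fp6 weights: {fp6_gib:.1f} GiB")
```

So in raw storage terms the fp6 weights are about 2.7x smaller; any gap from that ideal would come from padding, metadata, or the runtime keeping an fp16 copy around.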