Casper

295 comments by Casper

The very reason you want the GPL, i.e. so that it stays open, is also the reason that many people cannot or will not contribute to the project. This is...

Any modification must be made public. So even if you only want to modify the work to adapt it for internal use, it must still be made public. This is incompatible...

> Only if you need to distribute it to others do you need to release the code under GPL. This is the issue that I am referring to. Any distribution means...

> @maximegmd any chance you could provide an example config file on how to use this? Set the `optimizer` argument in the axolotl config to one of `galore_adamw`, `galore_adamw8bit`, or `galore_ada_factor`...
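(For illustration, a minimal sketch of such a config fragment, written as a Python dict and dumped to axolotl's YAML format. Only the `optimizer` values come from the comment above; the `base_model` and filename are assumed placeholders, not part of the original answer.)

```python
import yaml

# Hypothetical minimal axolotl config fragment. Only the `optimizer`
# values are taken from the comment; base_model is an assumed placeholder.
config = {
    "base_model": "meta-llama/Llama-2-7b-hf",  # assumption, not from the comment
    "optimizer": "galore_adamw",  # or "galore_adamw8bit" / "galore_ada_factor"
}

with open("config.yml", "w") as f:
    yaml.safe_dump(config, f)
```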

This is interesting work! I was planning to implement INT8 in AutoAWQ over time, as the authors of SmoothQuant (this PR) and AWQ are the same. My best guesstimate is...
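(For context, a minimal sketch of the SmoothQuant idea referenced above: migrate quantization difficulty from activations to weights with a per-input-channel scale. The function name is illustrative, and `alpha=0.5` plus calibration-collected activation maxima are assumptions.)

```python
import torch

def smoothquant_scales(act_absmax: torch.Tensor, weight: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Per-input-channel smoothing scales s_j = max|X_j|^a / max|W_j|^(1-a).

    act_absmax: [in_features], abs-max of activations per input channel
    (collected from calibration data); weight: [out_features, in_features].
    """
    w_absmax = weight.abs().amax(dim=0)  # abs-max per input channel
    return (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

# Usage: x_smooth = x / s and w_smooth = w * s (broadcast over in_features),
# so x_smooth @ w_smooth.T equals x @ w.T, but the activations now have a
# flatter range and are easier to quantize to INT8.
```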

@AniZpZ @zhyncs This is great work! My understanding is that this PR converts FP16 -> INT8 dynamically without computing a loss function to optimize perplexity. Have you evaluated perplexity on...
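(To make the terminology concrete: a sketch of what "converts FP16 -> INT8 dynamically without computing a loss function" typically means, i.e. a symmetric per-tensor scale derived from the tensor itself at runtime rather than optimized against calibration data. This is an illustration, not the PR's actual kernel.)

```python
import torch

def dynamic_int8(x: torch.Tensor):
    # The scale comes straight from the tensor's abs-max at runtime:
    # no calibration pass, no loss-based search over scales.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = (x / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

def dequant_fp16(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an FP16 approximation of the original tensor.
    return q.to(torch.float16) * scale.to(torch.float16)
```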

> > @AniZpZ @zhyncs This is great work! My understanding is that this PR converts FP16 -> INT8 dynamically without computing a loss function to optimize perplexity. Have you evaluated...
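(A sketch of the kind of perplexity check being asked about, using non-overlapping chunks over the WikiText-2 test split. The model ID is a placeholder to swap for the quantized model, and the chunking strategy is an assumption.)

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder; swap in the quantized model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

stride, nll_sum, n_tokens = 2048, 0.0, 0
with torch.no_grad():
    for i in range(0, ids.size(1), stride):
        chunk = ids[:, i : i + stride]
        if chunk.size(1) < 2:
            continue  # need at least one predicted token
        loss = model(chunk, labels=chunk).loss  # mean NLL over the chunk
        nll_sum += loss.item() * (chunk.size(1) - 1)
        n_tokens += chunk.size(1) - 1

# Perplexity = exp(total NLL / total predicted tokens)
print("perplexity:", torch.exp(torch.tensor(nll_sum / n_tokens)).item())
```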

Hi @AniZpZ @zhyncs, thank you for your great work on this PR. I have now had more time to explore your fast implementation and found that NVIDIA only has support...

I agree that the RMSNorm and Linear layers run in INT8, making it W8. Running A8 means running your activations in INT8 as well, and this is not implemented (see...
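(To illustrate the distinction: in a W8A8 path the activations themselves must be quantized on the fly so the matmul can run in INT8. A rough sketch under assumed symmetric per-tensor activation scales; the INT8 GEMM is emulated in int32 here for clarity, whereas a real kernel would use INT8 tensor cores.)

```python
import torch

def w8a8_linear(x_fp16: torch.Tensor, w_int8: torch.Tensor,
                w_scale: torch.Tensor) -> torch.Tensor:
    # A8: quantize the *activations* at runtime (symmetric, per-tensor).
    a_scale = x_fp16.abs().max().clamp(min=1e-8) / 127.0
    x_int8 = (x_fp16 / a_scale).round().clamp(-128, 127).to(torch.int8)
    # INT8 x INT8 -> INT32 accumulate, emulated with an int32 matmul.
    acc = x_int8.to(torch.int32) @ w_int8.to(torch.int32).T
    # Rescale the integer accumulator back to FP16.
    return acc.to(torch.float16) * (a_scale.to(torch.float16) * w_scale)
```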

@viktor-ferenczi I am much in favor of W8A16, which is currently implemented. Quantized models will be easier to create in the W8A16 format without accuracy degradation. --- @AniZpZ @zhyncs I...
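(For contrast, a W8A16 sketch under the same assumptions as above: only the weights are INT8, so they are dequantized to FP16 and the GEMM runs in FP16; the activations never need to be quantized at all, which is why accuracy is easier to preserve.)

```python
import torch

def w8a16_linear(x_fp16: torch.Tensor, w_int8: torch.Tensor,
                 w_scale: torch.Tensor) -> torch.Tensor:
    # W8A16: dequantize INT8 weights to FP16, then run the matmul in FP16.
    w_fp16 = w_int8.to(torch.float16) * w_scale
    return x_fp16 @ w_fp16.T
```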