Casper

295 comments by Casper

The very reason you want the GPL, i.e. so that it stays open, is also the reason that many people cannot or will not contribute to the project. This is...

Any modification must be made public. So even if you only want to modify the work to adapt it for internal use, it must still be made public. This is incompatible...

> Only if you need to distribute it to others do you need to release the code under GPL. This is the issue that I am referring to. Any distribution means...

> @maximegmd any chance you could provide an example config file on how to use this? Set the `optimizer` argument in the axolotl config to one of `galore_adamw`, `galore_adamw8bit`, or `galore_ada_factor`...
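(For illustration, a minimal sketch of such a config fragment, written as a Python dict and dumped to axolotl's YAML format. Only the `optimizer` values come from the comment above; the `base_model` and filename are assumed placeholders, not part of the original answer.)

```python
import yaml

# Hypothetical minimal axolotl config fragment. Only the `optimizer`
# values are taken from the comment; base_model is an assumed placeholder.
config = {
    "base_model": "meta-llama/Llama-2-7b-hf",  # assumption, not from the comment
    "optimizer": "galore_adamw",  # or "galore_adamw8bit" / "galore_ada_factor"
}

with open("config.yml", "w") as f:
    yaml.safe_dump(config, f)
```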

This is interesting work! I was planning to implement INT8 in AutoAWQ over time, as the authors of SmoothQuant (this PR) and AWQ are the same. My best guesstimate is...
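(For context, a minimal sketch of the SmoothQuant idea referenced above: migrate quantization difficulty from activations to weights with a per-input-channel scale. The function name is illustrative, and `alpha=0.5` plus calibration-collected activation maxima are assumptions.)

```python
import torch

def smoothquant_scales(act_absmax: torch.Tensor, weight: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Per-input-channel smoothing scales s_j = max|X_j|^a / max|W_j|^(1-a).

    act_absmax: [in_features], abs-max of activations per input channel
    (collected from calibration data); weight: [out_features, in_features].
    """
    w_absmax = weight.abs().amax(dim=0)  # abs-max per input channel
    return (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

# Usage: x_smooth = x / s and w_smooth = w * s (broadcast over in_features),
# so x_smooth @ w_smooth.T equals x @ w.T, but the activations now have a
# flatter range and are easier to quantize to INT8.
```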

@AniZpZ @zhyncs This is great work! My understanding is that this PR converts FP16 -> INT8 dynamically without computing a loss function to optimize perplexity. Have you evaluated perplexity on...
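(To make the terminology concrete: a sketch of what "converts FP16 -> INT8 dynamically without computing a loss function" typically means, i.e. a symmetric per-tensor scale derived from the tensor itself at runtime rather than optimized against calibration data. This is an illustration, not the PR's actual kernel.)

```python
import torch

def dynamic_int8(x: torch.Tensor):
    # The scale comes straight from the tensor's abs-max at runtime:
    # no calibration pass, no loss-based search over scales.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = (x / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

def dequant_fp16(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an FP16 approximation of the original tensor.
    return q.to(torch.float16) * scale.to(torch.float16)
```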

> > @AniZpZ @zhyncs This is great work! My understanding is that this PR converts FP16 -> INT8 dynamically without computing a loss function to optimize perplexity. Have you evaluated...
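(A sketch of the kind of perplexity check being asked about, using non-overlapping chunks over the WikiText-2 test split. The model ID is a placeholder to swap for the quantized model, and the chunking strategy is an assumption.)

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # placeholder; swap in the quantized model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

stride, nll_sum, n_tokens = 2048, 0.0, 0
with torch.no_grad():
    for i in range(0, ids.size(1), stride):
        chunk = ids[:, i : i + stride]
        if chunk.size(1) < 2:
            continue  # need at least one predicted token
        loss = model(chunk, labels=chunk).loss  # mean NLL over the chunk
        nll_sum += loss.item() * (chunk.size(1) - 1)
        n_tokens += chunk.size(1) - 1

# Perplexity = exp(total NLL / total predicted tokens)
print("perplexity:", torch.exp(torch.tensor(nll_sum / n_tokens)).item())
```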

Hi @AniZpZ @zhyncs, thank you for your great work on this PR. I have now had more time to explore your fast implementation and found that NVIDIA only has support...

I agree that the RMSNorm and Linear layers run in INT8, making it W8. Running A8 means running your activations in INT8 as well, and this is not implemented (see...
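(To illustrate the distinction: in a W8A8 path the activations themselves must be quantized on the fly so the matmul can run in INT8. A rough sketch under assumed symmetric per-tensor activation scales; the INT8 GEMM is emulated in int32 here for clarity, whereas a real kernel would use INT8 tensor cores.)

```python
import torch

def w8a8_linear(x_fp16: torch.Tensor, w_int8: torch.Tensor,
                w_scale: torch.Tensor) -> torch.Tensor:
    # A8: quantize the *activations* at runtime (symmetric, per-tensor).
    a_scale = x_fp16.abs().max().clamp(min=1e-8) / 127.0
    x_int8 = (x_fp16 / a_scale).round().clamp(-128, 127).to(torch.int8)
    # INT8 x INT8 -> INT32 accumulate, emulated with an int32 matmul.
    acc = x_int8.to(torch.int32) @ w_int8.to(torch.int32).T
    # Rescale the integer accumulator back to FP16.
    return acc.to(torch.float16) * (a_scale.to(torch.float16) * w_scale)
```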

@viktor-ferenczi I am much in favor of W8A16, which is currently implemented. Quantized models will be easier to create in the W8A16 format without accuracy degradation. --- @AniZpZ @zhyncs I...
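(For contrast, a W8A16 sketch under the same assumptions as above: only the weights are INT8, so they are dequantized to FP16 and the GEMM runs in FP16; the activations never need to be quantized at all, which is why accuracy is easier to preserve.)

```python
import torch

def w8a16_linear(x_fp16: torch.Tensor, w_int8: torch.Tensor,
                 w_scale: torch.Tensor) -> torch.Tensor:
    # W8A16: dequantize INT8 weights to FP16, then run the matmul in FP16.
    w_fp16 = w_int8.to(torch.float16) * w_scale
    return x_fp16 @ w_fp16.T
```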