Younes B

518 comments of Younes B

cc-ing @michaelbenayoun as well, in case you want to have a look ;)

Hi all! Just to summarise a bit what is happening and the solution we came up with to implement this! In the previous version, we found two major bugs:...

Thank you very much for your comments! `has_fp16_weights` comes from the class `bnb.Int8Params` that is currently being developed in a WIP branch that should be merged soon on the main...
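To make the role of a flag like `has_fp16_weights` concrete, here is a minimal, hypothetical sketch of absmax int8 quantization with an optional high-precision copy. All names here (`Int8ParamSketch`, `absmax_quantize`) are illustrative stand-ins, not the actual `bitsandbytes` API: the idea is simply that when `has_fp16_weights` is set, the full-precision weights are kept alongside the int8 codes (useful for mixed int8 training), and when it is off, only the int8 codes and scale are stored, which is the memory-saving inference path.

```python
# Hypothetical sketch, NOT the real bitsandbytes Int8Params class.

def absmax_quantize(weights):
    """Quantize a list of floats to int8 codes with absmax scaling."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

class Int8ParamSketch:
    def __init__(self, data, has_fp16_weights=False):
        # When has_fp16_weights is True, keep the high-precision copy
        # (mixed int8 training needs it); otherwise store only the
        # int8 codes plus one scale per tensor.
        self.has_fp16_weights = has_fp16_weights
        self.fp_weights = list(data) if has_fp16_weights else None
        self.codes, self.scale = absmax_quantize(data)

w = [0.5, -1.27, 0.03, 1.0]
p = Int8ParamSketch(w, has_fp16_weights=False)
print(p.codes)                        # int8 codes in [-127, 127]
print(dequantize(p.codes, p.scale))   # approximate reconstruction of w
```

The reconstruction error per weight is bounded by half the scale, which is why absmax quantization degrades gracefully as long as no single outlier blows up the scale.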

I think before merging we need:
- [x] Memory footprint benchmarking
- [x] Inference speed benchmarking
- [x] `lm-eval` benchmarking for large models (it has been done for small models)...

Added another PR to support int8 quantization + `accelerate` on multi-GPU setup here: https://github.com/huggingface/accelerate/pull/539 !
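To illustrate the kind of work the multi-GPU dispatch has to do, here is a naive, self-contained sketch of assigning layers to devices by memory budget. This is only a conceptual toy: `make_device_map`, the layer names, and the capacities are all made up for illustration, and the real `accelerate` logic is considerably more careful (tied weights, buffers, CPU/disk offload).

```python
# Toy sketch of layer-to-device assignment, NOT the accelerate implementation.

def make_device_map(layer_sizes, device_capacities):
    """Greedily assign layers (name -> size) to devices (name -> capacity)."""
    device_map = {}
    remaining = dict(device_capacities)
    devices = list(device_capacities)
    idx = 0
    for name, size in layer_sizes.items():
        # Move on to the next device once the current one is full.
        while idx < len(devices) and remaining[devices[idx]] < size:
            idx += 1
        if idx == len(devices):
            raise RuntimeError(f"not enough device memory for {name}")
        device_map[name] = devices[idx]
        remaining[devices[idx]] -= size
    return device_map

layers = {"embed": 4, "block.0": 6, "block.1": 6, "lm_head": 4}
caps = {"cuda:0": 10, "cuda:1": 12}
print(make_device_map(layers, caps))
# fills cuda:0 first, then spills the remaining layers onto cuda:1
```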

Thanks @sgugger for your review! Fixed the suggestions ;) I think we are good to merge https://github.com/huggingface/accelerate/pull/539 if you don't mind 🙏 I just need to...

TODOs:
- [x] Have a working colab demo for inference
- [x] Add more documentation
- [x] Implement tests

Before moving forward, I would like to have comments from @michaelbenayoun @mfuntowicz and @echarlaix.

## About this PR

We replace all the `nn.Linear` modules with the `bnb.Linear8bitLt` modules from...
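The replacement strategy described above can be sketched as a recursive walk over the module tree. The classes below (`TinyModule`, `Linear`, `Linear8bitStandIn`) are mocks so the sketch stays self-contained; the PR itself operates on torch's `nn.Linear` and swaps in `bnb.Linear8bitLt`.

```python
# Mock module tree standing in for torch modules, for illustration only.
class TinyModule:
    def __init__(self, **children):
        self.children = dict(children)

    def named_children(self):
        return self.children.items()

class Linear:
    """Stand-in for nn.Linear."""

class Linear8bitStandIn:
    """Stand-in for bnb.Linear8bitLt; wraps the module it replaces."""
    def __init__(self, replaced):
        self.replaced = replaced

def replace_linears(module):
    """Recursively swap every Linear child for its 8-bit stand-in."""
    for name, child in module.named_children():
        if isinstance(child, Linear):
            module.children[name] = Linear8bitStandIn(child)
        elif isinstance(child, TinyModule):
            replace_linears(child)  # recurse into submodules
    return module

model = TinyModule(attn=TinyModule(q=Linear(), k=Linear()), head=Linear())
replace_linears(model)
print(type(model.children["head"]).__name__)  # the head was swapped
```

The same pattern works on real torch models by iterating `named_children()` and using `setattr` to install the replacement, which is why the swap can be applied to any architecture without touching its definition.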

I can confirm that the slow tests I designed pass on my testing machine (2x Tesla T4, 15GB). However, it is not yet possible to load saved int8...

Hi @cnbeining! Thanks for your interest in this feature, and happy to see that you are already excited to run it on Codegen! 🚀 To begin with, your problem is related...