Denis Mazur

30 comments by Denis Mazur

Hey, @NJannasch! Thanks for the PR, and sorry for the slow reply. We'd like to make sure the script works before merging it. Could you maybe make a Colab...

Hey, @freQuensy23-coder! The code in this repo is quite specific to transformer-style MoE models. I'm not too familiar with Mamba-like architectures, but AFAIK @lavawolfiee has plans for adapting Jamba to work with our...

What hardware do you plan to run the model on? Running it without quantization requires a substantial amount of combined RAM + VRAM.
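
For a rough sense of scale (my numbers, not from the thread): Mixtral-8x7B has on the order of 47B parameters, so an unquantized fp16 copy of the weights alone needs close to 90 GB. A back-of-the-envelope estimate:

```python
# Back-of-the-envelope memory estimate for an unquantized Mixtral-8x7B.
n_params = 46.7e9        # ~46.7B total parameters (approximate)
bytes_per_param = 2      # fp16 / bf16

total_gb = n_params * bytes_per_param / 1024**3
print(f"~{total_gb:.0f} GB for the weights alone")  # roughly 87 GB, before activations and KV cache
```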

Yeah, sounds like it'll fit :D The current codebase doesn't support running the model without quantization, but you could try rewriting the [expert wrapper class](https://github.com/dvmazur/mixtral-offloading/blob/ce545188b804238f0b23a59fc45e6a8f8b390c40/src/expert_wrapper.py#L9). This class moves the expert's...
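
For a rough idea, an fp16 rewrite of that wrapper could look something like the sketch below. The class name, constructor, and the simple move-to-GPU/move-back logic are illustrative; they don't mirror the repo's quantized wrapper exactly.

```python
import torch
from torch import nn


class FP16ExpertWrapper(nn.Module):
    """Hypothetical fp16 variant of the expert wrapper (names are illustrative).

    Keeps the expert's weights in CPU RAM and copies them to the GPU only for
    the duration of a forward call: the same offloading idea, no quantization.
    """

    def __init__(self, expert: nn.Module, device: str = "cuda"):
        super().__init__()
        self.device = device
        self.expert = expert.half().to("cpu")

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        self.expert.to(self.device)                       # upload this expert's weights
        out = self.expert(hidden_states.to(self.device))  # run the expert on GPU
        self.expert.to("cpu")                             # free VRAM for the next expert
        return out
```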

@freQuensy23-coder, yes, you are right - @lavawolfiee must have misunderstood you.

> I've tried to rewrite your code to add fp16 support using your tips, but I faced some difficulties: I don't understand where exactly in replace_layer_storage we use quantization?...
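
To illustrate the general idea of dropping quantization: you'd walk the module tree and swap every quantized linear for a plain fp16 `nn.Linear`. This is my sketch, not the repo's `replace_layer_storage`, and the `quantized_cls` / `dequantize()` names are placeholders for whatever the actual quantized layer class exposes.

```python
import torch
from torch import nn


def replace_with_fp16_linear(module: nn.Module, quantized_cls) -> None:
    """Illustrative helper: replace every quantized linear layer (quantized_cls
    is a placeholder) with a plain fp16 nn.Linear holding dequantized weights."""
    for name, child in module.named_children():
        if isinstance(child, quantized_cls):
            fp16 = nn.Linear(child.in_features, child.out_features,
                             bias=child.bias is not None, dtype=torch.float16)
            # `dequantize()` stands in for however the quantized class
            # exposes its full-precision weights.
            fp16.weight.data.copy_(child.dequantize().half())
            if child.bias is not None:
                fp16.bias.data.copy_(child.bias.half())
            setattr(module, name, fp16)
        else:
            replace_with_fp16_linear(child, quantized_cls)
```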

Hey, everyone! Thanks for your interest and comments! 1. I'd like to discuss whether we actually need LoRA adapters in the possible implementation. As I see it, they are not...
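
To make the trade-off concrete: the question is whether the 8-bit layers should ship frozen (inference only) or also carry small trainable adapters so the quantized model can be fine-tuned. A rough sketch of the second option, with a made-up quantization scheme and class names that are not part of any proposed patch:

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class LoRAAdapter(nn.Module):
    """Tiny low-rank adapter; all names here are illustrative."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class FrozenInt8Linear(nn.Module):
    """Frozen row-wise int8 weight plus an optional trainable LoRA adapter."""

    def __init__(self, weight_int8: torch.Tensor, scale: torch.Tensor,
                 adapter: Optional[LoRAAdapter] = None):
        super().__init__()
        self.register_buffer("weight_int8", weight_int8)  # shape (out, in), dtype int8
        self.register_buffer("scale", scale)              # per-output-row scale, shape (out,)
        self.adapter = adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; the base weight itself is never trained.
        w = self.weight_int8.to(x.dtype) * self.scale.unsqueeze(1)
        out = F.linear(x, w)
        if self.adapter is not None:
            out = out + self.adapter(x)
        return out
```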

I like your suggestion with the policy map. I think I'll wait for the other maintainers' opinions before opening the PR. Thanks!
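
For readers of the thread, the policy-map idea is roughly a mapping from module-name patterns to how each layer should be handled. A tiny hypothetical sketch (module names and policies here are made up for illustration):

```python
# Hypothetical policy map: which conversion to apply to which submodules.
QUANTIZATION_POLICY = {
    "attn.q_proj": "int8",
    "attn.k_proj": "int8",
    "attn.v_proj": "int8",
    "attn.out_proj": "int8",
    "mlp.fc_in": "int8",
    "mlp.fc_out": "int8",
    "lm_head": "fp16",  # keep the output head in full precision
}


def policy_for(module_name: str, policy_map=QUANTIZATION_POLICY) -> str:
    """Return the conversion policy for a module, defaulting to fp16."""
    for pattern, policy in policy_map.items():
        if module_name.endswith(pattern):
            return policy
    return "fp16"
```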

Hi, everyone! Thank you for your suggestions. I'm currently busy with my uni exams, but I'll be back with a PR in a couple of weeks.

Hey, everyone! I've [implemented](http://github.com/deniskamazur/transformers/tree/gpt-j-8bit) the "hardcoded" version of this issue. You can verify it's functional over [here](https://colab.research.google.com/drive/1m3KQYva980cQnRoycCMAMEEcAyeallZJ?usp=sharing). Should I add any tests before opening a PR? I'd also be glad...
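
On the testing question, a minimal smoke test could just check that the model loads and extends a prompt. Something along these lines, where the checkpoint id is a placeholder since the final loading path isn't settled here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id; substitute whatever name the 8-bit GPT-J
# weights end up being published under.
MODEL_NAME = "gpt-j-6b-8bit"


def test_8bit_gptj_generates():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)

    # Generation should extend the prompt by at least one token.
    assert out.shape[1] > inputs["input_ids"].shape[1]
```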