Denis Mazur

30 comments by Denis Mazur

Hey, @NJannasch! Thanks for the PR, and sorry for the slow reply. We'd like to make sure the script works before merging it. Could you maybe make a Colab...

Hey, @freQuensy23-coder! The code in this repo is quite specific to transformer-style MoE models. I'm not too familiar with Mamba-like architectures, but AFAIK @lavawolfiee has plans for adapting Jamba to work with our...

What hardware do you plan to run the model on? Running it without quantization requires a substantial amount of combined RAM + VRAM.
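
For a rough sense of scale (my numbers, not from the thread): Mixtral-8x7B has on the order of 47B parameters, so an unquantized fp16 copy of the weights alone needs close to 90 GB. A back-of-the-envelope estimate:

```python
# Back-of-the-envelope memory estimate for an unquantized Mixtral-8x7B.
n_params = 46.7e9        # ~46.7B total parameters (approximate)
bytes_per_param = 2      # fp16 / bf16

total_gb = n_params * bytes_per_param / 1024**3
print(f"~{total_gb:.0f} GB for the weights alone")  # roughly 87 GB, before activations and KV cache
```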

Yeah, sounds like it'll fit :D The current codebase doesn't support running the model without quantization, but you could try rewriting the [expert wrapper class](https://github.com/dvmazur/mixtral-offloading/blob/ce545188b804238f0b23a59fc45e6a8f8b390c40/src/expert_wrapper.py#L9). This class moves the expert's...
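
For a rough idea, an fp16 rewrite of that wrapper could look something like the sketch below. The class name, constructor, and the simple move-to-GPU/move-back logic are illustrative; they don't mirror the repo's quantized wrapper exactly.

```python
import torch
from torch import nn


class FP16ExpertWrapper(nn.Module):
    """Hypothetical fp16 variant of the expert wrapper (names are illustrative).

    Keeps the expert's weights in CPU RAM and copies them to the GPU only for
    the duration of a forward call: the same offloading idea, no quantization.
    """

    def __init__(self, expert: nn.Module, device: str = "cuda"):
        super().__init__()
        self.device = device
        self.expert = expert.half().to("cpu")

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        self.expert.to(self.device)                       # upload this expert's weights
        out = self.expert(hidden_states.to(self.device))  # run the expert on GPU
        self.expert.to("cpu")                             # free VRAM for the next expert
        return out
```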

@freQuensy23-coder, yes, you are right - @lavawolfiee must have misunderstood you.

> I've tried to rewrite your code to add fp16 support using your tips, but I faced some difficulties: I don't understand where exactly in replace_layer_storage we use quantization?...
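
To illustrate the general idea of dropping quantization: you'd walk the module tree and swap every quantized linear for a plain fp16 `nn.Linear`. This is my sketch, not the repo's `replace_layer_storage`, and the `quantized_cls` / `dequantize()` names are placeholders for whatever the actual quantized layer class exposes.

```python
import torch
from torch import nn


def replace_with_fp16_linear(module: nn.Module, quantized_cls) -> None:
    """Illustrative helper: replace every quantized linear layer (quantized_cls
    is a placeholder) with a plain fp16 nn.Linear holding dequantized weights."""
    for name, child in module.named_children():
        if isinstance(child, quantized_cls):
            fp16 = nn.Linear(child.in_features, child.out_features,
                             bias=child.bias is not None, dtype=torch.float16)
            # `dequantize()` stands in for however the quantized class
            # exposes its full-precision weights.
            fp16.weight.data.copy_(child.dequantize().half())
            if child.bias is not None:
                fp16.bias.data.copy_(child.bias.half())
            setattr(module, name, fp16)
        else:
            replace_with_fp16_linear(child, quantized_cls)
```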

Hey, everyone! Thanks for your interest and comments! 1. I'd like to discuss whether we actually need LoRA adapters in the possible implementation. As I see it, they are not...
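
To make the trade-off concrete: the question is whether the 8-bit layers should ship frozen (inference only) or also carry small trainable adapters so the quantized model can be fine-tuned. A rough sketch of the second option, with a made-up quantization scheme and class names that are not part of any proposed patch:

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class LoRAAdapter(nn.Module):
    """Tiny low-rank adapter; all names here are illustrative."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)
        self.up = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class FrozenInt8Linear(nn.Module):
    """Frozen row-wise int8 weight plus an optional trainable LoRA adapter."""

    def __init__(self, weight_int8: torch.Tensor, scale: torch.Tensor,
                 adapter: Optional[LoRAAdapter] = None):
        super().__init__()
        self.register_buffer("weight_int8", weight_int8)  # shape (out, in), dtype int8
        self.register_buffer("scale", scale)              # per-output-row scale, shape (out,)
        self.adapter = adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; the base weight itself is never trained.
        w = self.weight_int8.to(x.dtype) * self.scale.unsqueeze(1)
        out = F.linear(x, w)
        if self.adapter is not None:
            out = out + self.adapter(x)
        return out
```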

I like your suggestion with the policy map. I think I'll wait for the other maintainers' opinions before opening the PR. Thanks!
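
For readers of the thread, the policy-map idea is roughly a mapping from module-name patterns to how each layer should be handled. A tiny hypothetical sketch (module names and policies here are made up for illustration):

```python
# Hypothetical policy map: which conversion to apply to which submodules.
QUANTIZATION_POLICY = {
    "attn.q_proj": "int8",
    "attn.k_proj": "int8",
    "attn.v_proj": "int8",
    "attn.out_proj": "int8",
    "mlp.fc_in": "int8",
    "mlp.fc_out": "int8",
    "lm_head": "fp16",  # keep the output head in full precision
}


def policy_for(module_name: str, policy_map=QUANTIZATION_POLICY) -> str:
    """Return the conversion policy for a module, defaulting to fp16."""
    for pattern, policy in policy_map.items():
        if module_name.endswith(pattern):
            return policy
    return "fp16"
```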

Hi, everyone! Thank you for your suggestions. I'm currently busy with my uni exams, but I'll be back with a PR in a couple of weeks.

Hey, everyone! I've [implemented](http://github.com/deniskamazur/transformers/tree/gpt-j-8bit) the "hardcoded" version of this issue. You can verify it's functional over [here](https://colab.research.google.com/drive/1m3KQYva980cQnRoycCMAMEEcAyeallZJ?usp=sharing). Should I add any tests before opening a PR? I'd also be glad...
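
On the testing question, a minimal smoke test could just check that the model loads and extends a prompt. Something along these lines, where the checkpoint id is a placeholder since the final loading path isn't settled here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id; substitute whatever name the 8-bit GPT-J
# weights end up being published under.
MODEL_NAME = "gpt-j-6b-8bit"


def test_8bit_gptj_generates():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)

    # Generation should extend the prompt by at least one token.
    assert out.shape[1] > inputs["input_ids"].shape[1]
```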