Marc Sun


Hi @amrothemich, you need to pass your custom `device_map` when you load your model:

```python
# Load 4-bit quantized model
model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    device_map=custom_device_map,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

LMK...
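For reference, a custom `device_map` is a plain dict mapping submodule names to devices. A minimal sketch, assuming a typical decoder-style architecture (the module names below are hypothetical; inspect `model.named_modules()` for the real ones):

```python
# Hypothetical layout: embeddings and the first block on GPU 0,
# the next block on GPU 1, and the LM head offloaded to CPU.
custom_device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": "cpu",
}
```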

Hi @hackyon, thanks for still working on this PR!

> Yea, I won't mind helping to add more no_split_modules if that is something you think is relatively high priority....

Hi @hackyon, I can indeed merge, but the code needs to be validated by a core maintainer. Could you have a look, @ArthurZucker? Thanks again @hackyon for your patience.

Check this [doc](https://huggingface.co/docs/accelerate/concept_guides/big_model_inference) from the accelerate library. You can use big model inference directly by passing `device_map` to `from_pretrained` if you are using the transformers library!
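A minimal sketch of that workflow (the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" triggers accelerate's big model inference:
# layers are dispatched across available GPUs, CPU RAM, and disk.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",  # example checkpoint
    device_map="auto",
)
```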

Hi @MrRobot2211, could you try running it without setting `os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"`? I want to check if this is the line that causes the issue. Thanks!

Thanks! I will investigate why this is happening. If you could share a minimal reproducer, that would help me a lot in fixing this issue.

Hi @kevinknights29, I see that in your script you are trying to load in 8-bit and in 4-bit at the same time. Please select only one option.

```py
return...
```
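A minimal sketch of picking a single quantization mode via `BitsAndBytesConfig` (the checkpoint and dtype are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Choose exactly one mode: load_in_4bit OR load_in_8bit, never both.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```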