Jianwei Li

Results: 2 issues by Jianwei Li

To save GPU memory, I want to load the multilingual model in 4-bit mode; the code is as follows:

```python
import torch
from transformers import AutoTokenizer
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
...
```

I have trained several LoRA models on top of the Flux base model, and I want to switch between them dynamically without reloading the base model. I saw in #1185...
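In `diffusers`, one way to get this behavior (a sketch under the assumption of a recent `diffusers` with PEFT-backed LoRA support; the checkpoint paths and adapter names below are placeholders) is to load each LoRA once under a distinct `adapter_name` and then activate one at a time with `set_adapters`, leaving the base pipeline in memory:

```python
def switch_adapter(pipe, name, weight=1.0):
    """Activate a single previously loaded LoRA by adapter name.

    Any other loaded adapters are deactivated; the base model is untouched.
    """
    pipe.set_adapters([name], adapter_weights=[weight])

def demo():
    """Illustrative only: needs a GPU and downloads the Flux weights."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Load each trained LoRA once, under its own adapter_name (placeholder paths).
    pipe.load_lora_weights("path/to/lora_a.safetensors", adapter_name="style_a")
    pipe.load_lora_weights("path/to/lora_b.safetensors", adapter_name="style_b")

    switch_adapter(pipe, "style_a")       # generate with the first LoRA
    switch_adapter(pipe, "style_b", 0.8)  # hot-swap to the second, scaled down
    pipe.disable_lora()                   # or run the plain base model again
```

Because the adapters stay resident after `load_lora_weights`, switching is just a bookkeeping call rather than a reload of the base Flux weights.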