Anima
33B Chinese LLM · DPO · QLoRA · 100K context · AirLLM: 70B inference on a single 4GB GPU
### main.py

```python
from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

# could use a Hugging Face model repo id:
model = AirLLMLlamaMlx("Qwen/Qwen-7B-Chat", layer_shards_saving_path='.cache')

input_text = [
    'I like',
    # ...
```
![image](https://github.com/lyogavin/Anima/assets/8924566/2a1013d1-291c-4066-8ba5-3d73fddedc85)
I use the following code to load microsoft-phi2: `from airllm import AutoModel`. It raises this error:

```
270, in split_and_save_layers
    if max(shards) > shard:
       ^^^^^^^^^^^
ValueError: max() arg is an empty sequence
```
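For context, the `ValueError` in the traceback above is what plain Python raises when `max()` is called on an empty sequence, e.g. when no saved layer shards are found in the cache directory. A minimal reproduction (the `shards` name mirrors the traceback; the `default=` workaround is an illustration, not the library's fix):

```python
shards = []  # empty, e.g. no layer-shard files were written yet

# mirrors the failing call in split_and_save_layers
try:
    max(shards)
except ValueError as e:
    print(e)  # -> max() arg is an empty sequence

# a defensive variant: supply a default so an empty list yields 0
latest = max(shards, default=0)
print(latest)  # -> 0
```

Clearing and regenerating `layer_shards_saving_path` is usually the practical remedy when the cache is in this half-written state.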
Hello, I can't help asking whether you have ever tried implementing any parallelism strategies in this program to speed up inference in general, as far as being able...
Is there a way to quantize on macOS? bitsandbytes is not supported on Apple Silicon. Can we use GGUF models?
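On the quantization question: bitsandbytes is CUDA-only, so a runtime platform check before enabling compression avoids a crash on M-series Macs. A small sketch (the helper name and the `"4bit"` choice are mine, used here purely for illustration):

```python
import platform

def is_apple_silicon() -> bool:
    """True on M-series Macs, where CUDA-based bitsandbytes cannot run."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

# pick bitsandbytes compression only where it is supported;
# on Apple Silicon fall back to no compression (or a GGUF/MLX path)
compression = None if is_apple_silicon() else "4bit"
print(compression)
```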
```python
from sys import platform
from airllm import AutoModel
import mlx.core as mx

assert platform == "darwin", "this example is supposed to be run on mac os"

# model = ...
```
Mac M1 Max 32GB user here, without the ability to quantize with bitsandbytes. Is there a way to configure the chunk size to make inference quicker? I think the 32GB...
I am attempting to run Llama13b using an NVIDIA GeForce RTX 3090, but the model never completes loading. ![image](https://github.com/lyogavin/Anima/assets/79256834/c1294013-83ef-42e9-a97f-38407dcd542c)