
33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU
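Many of the issues below concern the basic AirLLM inference flow, so a minimal sketch of that flow may help for orientation. It assumes the `AutoModel.from_pretrained` / `model.tokenizer` / `model.generate` interface described in the airllm README; the model repo id and the generation settings here are illustrative assumptions, not taken from any specific issue.

```
# Minimal AirLLM inference sketch (CUDA path), assuming the AutoModel
# interface from the airllm README; repo id and generation settings are
# illustrative assumptions.
from airllm import AutoModel

MAX_LENGTH = 128

# Layer shards are loaded from disk one at a time, which is how a 70B
# model can run on a single small GPU.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]

input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

# Assumes an HF-style generate output with a .sequences field.
print(model.tokenizer.decode(generation_output.sequences[0]))
```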

64 Anima issues

### main.py

```
from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

# could use hugging face model repo id:
model = AirLLMLlamaMlx("Qwen/Qwen-7B-Chat", layer_shards_saving_path='.cache')

input_text = ['I like', ...
```

future work
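The main.py snippet above is cut off in the preview. Below is a hedged sketch of how such an mlx-based run typically continues, assuming `AirLLMLlamaMlx` mirrors the CUDA `AutoModel` interface (a `tokenizer` attribute and an HF-style `generate()`); the repo id, prompt, and generation settings are only illustrative.

```
# Hedged sketch of a complete mlx-based AirLLM run on Apple Silicon.
# Assumes AirLLMLlamaMlx mirrors the CUDA AutoModel interface; the
# tokenizer/generate argument names below are assumptions.
from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

# Hugging Face repo id; layer shards are cached on disk and streamed
# through memory one at a time.
model = AirLLMLlamaMlx("Qwen/Qwen-7B-Chat", layer_shards_saving_path=".cache")

input_text = ["I like"]

input_tokens = model.tokenizer(
    input_text,
    return_tensors="np",           # numpy tensors, converted to mlx below
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    mx.array(input_tokens["input_ids"]),  # mlx array instead of a CUDA tensor
    max_new_tokens=3,
    use_cache=True,
    return_dict_in_generate=True,
)

print(generation_output)
```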

![image](https://github.com/lyogavin/Anima/assets/8924566/2a1013d1-291c-4066-8ba5-3d73fddedc85)

bug

I used the following code to load microsoft-phi2, `from airllm import AutoModel`, and got this error:

```
270, in split_and_save_layers
    if max(shards) > shard:
       ^^^^^^^^^^^
ValueError: max() arg is an empty sequence
```

future work
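For reference, the failing load above presumably boils down to something like the sketch below; the Hugging Face repo id `microsoft/phi-2` is an assumption based on the issue text, and the traceback shows `max()` being called on an empty `shards` sequence inside airllm's `split_and_save_layers` step.

```
# Hedged reconstruction of the load that appears to trigger the error above.
# The repo id "microsoft/phi-2" is an assumption based on the issue text.
from airllm import AutoModel

model = AutoModel.from_pretrained("microsoft/phi-2")
```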

Does this support the Flan-T5 model? Thanks

enhancement

Hello, I can't help but ask whether you have ever tried to implement any parallelism strategies in this program to help the inference in general, as far as being able...

question

Is there a way to quantize on macOS? bitsandbytes is not supported on Apple silicon. Can we use GGUF models?

question

```
from sys import platform

from airllm import AutoModel
import mlx.core as mx

assert platform == "darwin", "this example is supposed to be run on mac os"

# model =...
```

bug

Mac M1 Max 32GB user here, without the ability to quantize with bitsandbytes. Is there a way to configure the chunk size so the inference is quicker? I think the 32GB...

question

I am attempting to run Llama13b using an NVIDIA GeForce RTX 3090, but the model never completes loading. ![image](https://github.com/lyogavin/Anima/assets/79256834/c1294013-83ef-42e9-a97f-38407dcd542c)

bug
help wanted