Anima
33B Chinese LLM · DPO · QLoRA · 100K context · AirLLM: 70B inference on a single 4GB GPU
### main.py

```python
from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

# could use a Hugging Face model repo id:
model = AirLLMLlamaMlx("Qwen/Qwen-7B-Chat", layer_shards_saving_path='.cache')

input_text = [
    'I like',
    # ...
```
![image](https://github.com/lyogavin/Anima/assets/8924566/2a1013d1-291c-4066-8ba5-3d73fddedc85)
I use the following code to load microsoft-phi2: `from airllm import AutoModel`. It raises this error:

```
270, in split_and_save_layers
    if max(shards) > shard:
       ^^^^^^^^^^^
ValueError: max() arg is an empty sequence
```
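For context, the `ValueError` in the traceback above is what plain Python raises when `max()` is called on an empty sequence, e.g. when no saved layer shards are found in the cache directory. A minimal reproduction (the `shards` name mirrors the traceback; the `default=` workaround is an illustration, not the library's fix):

```python
shards = []  # empty, e.g. no layer-shard files were written yet

# mirrors the failing call in split_and_save_layers
try:
    max(shards)
except ValueError as e:
    print(e)  # -> max() arg is an empty sequence

# a defensive variant: supply a default so an empty list yields 0
latest = max(shards, default=0)
print(latest)  # -> 0
```

Clearing and regenerating `layer_shards_saving_path` is usually the practical remedy when the cache is in this half-written state.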
Hello, I can't help asking whether you have ever tried implementing any parallelism strategies in this program to speed up inference in general, as far as being able...
Is there a way to quantize on macOS? bitsandbytes is not supported on Apple Silicon. Can we use GGUF models?
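On the quantization question: bitsandbytes is CUDA-only, so a runtime platform check before enabling compression avoids a crash on M-series Macs. A small sketch (the helper name and the `"4bit"` choice are mine, used here purely for illustration):

```python
import platform

def is_apple_silicon() -> bool:
    """True on M-series Macs, where CUDA-based bitsandbytes cannot run."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

# pick bitsandbytes compression only where it is supported;
# on Apple Silicon fall back to no compression (or a GGUF/MLX path)
compression = None if is_apple_silicon() else "4bit"
print(compression)
```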
```python
from sys import platform
from airllm import AutoModel
import mlx.core as mx

assert platform == "darwin", "this example is supposed to be run on mac os"

# model = ...
```
Mac M1 Max 32GB user here, without the ability to quantize with bitsandbytes. Is there a way to configure the chunk size to make inference quicker? I think the 32GB...
I am attempting to run Llama13b using an NVIDIA GeForce RTX 3090, but the model never completes loading. ![image](https://github.com/lyogavin/Anima/assets/79256834/c1294013-83ef-42e9-a97f-38407dcd542c)