Anima
33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU
### Discussed in https://github.com/lyogavin/Anima/discussions/113

Originally posted by **janmartin** February 11, 2024

AirLLM is great. And it desperately needs a simple installer and UI like AUTOMATIC1111 (for stable diffusion) for Windows....
as title
When running the airllm code, the following error appears at the line `model = AirLLMLlama2("/home/user/models/Anima-7B-100K")`:

```
model = AirLLMLlama2("/home/user/models/Anima-7B-100K")
found index file...
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True,...
```
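For context, a minimal sketch of how `AirLLMLlama2` is typically driven, based on the pattern in the AirLLM README; the exact tokenizer and `generate` arguments here are assumptions, not taken from this report:

```python
from airllm import AirLLMLlama2

MAX_LENGTH = 128

# load from a local model path (layer shards are created on first run)
model = AirLLMLlama2("/home/user/models/Anima-7B-100K")

input_text = ["What is the capital of the United States?"]

# tokenize the prompt
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

# run layer-by-layer generation
generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))
```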
I'm not sure it makes sense to load more than one layer from a performance standpoint, but using 1.6GB out of the 11GB/16GB of a typical consumer GPU is not optimal (and super...
Does AirLLM support AMD GPUs?
```
Using `is_flash_attn_available` is deprecated and will be removed in v4.38. Please use `is_flash_attn_2_available` instead.
Traceback (most recent call last):
  File "/opt/ai/test/inference_example_test.py", line 8, in
    model = AirLLMLlama2("/root/autodl-tmp/ai/Yi-34B-Chat", layer_shards_saving_path="/root/autodl-tmp/ai/layerSave")
  File "/root/miniconda3/lib/python3.10/site-packages/airllm/airllm.py", line...
```
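The deprecation warning itself comes from newer transformers versions renaming the flash-attention check; the actual traceback is truncated above. A small compatibility sketch (an assumption about the environment, not a confirmed fix for this issue) that works on both older and newer transformers:

```python
# Newer transformers expose is_flash_attn_2_available; older versions
# only have the deprecated is_flash_attn_available helper.
try:
    from transformers.utils import is_flash_attn_2_available as flash_attn_check
except ImportError:
    from transformers.utils import is_flash_attn_available as flash_attn_check

print("flash attention available:", flash_attn_check())
```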
Will the airllm framework support streaming output for different models in the future?
The sample code (taken from the AirLLM examples):

```python
from airllm import AirLLMLlamaMlx
import mlx.core as mx

MAX_LENGTH = 128

model = AirLLMLlamaMlx("garage-bAInd/Platypus2-7B")

input_text = [
    'I like',
]

input_tokens =...
```
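The excerpt cuts off at the tokenization step. A sketch of how the MLX example typically continues, following the same README pattern; the tokenizer arguments and `generate` signature below are assumptions:

```python
# tokenize the prompt as numpy, then hand MLX arrays to generate
input_tokens = model.tokenizer(
    input_text,
    return_tensors="np",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False,
)

generation_output = model.generate(
    mx.array(input_tokens["input_ids"]),
    max_new_tokens=3,
    use_cache=True,
    return_dict_in_generate=True,
)

print(generation_output)
```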
If we can run 70B models with just a 4GB VRAM graphics card, does it mean it is also possible to finetune a 70B model on a single 4090 with...