
Cannot successfully pull VLMs into FastChat

Open PhilipAmadasun opened this issue 1 year ago • 2 comments

For anyone who has gotten VLMs to work in FastChat: how did you do so? I cannot even pull any LLaVA model from Hugging Face successfully. These have been my results so far:

python -m fastchat.serve.cli --model-path llava-hf/llava-v1.6-vicuna-13b-hf
/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.18k/1.18k [00:00<00:00, 249kB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.64MB/s]
added_tokens.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23.0/23.0 [00:00<00:00, 21.8kB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 552/552 [00:00<00:00, 366kB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.02MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.34k/1.34k [00:00<00:00, 284kB/s]
Traceback (most recent call last):
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/cli.py", line 304, in <module>
    main(args)
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/cli.py", line 227, in main
    chat_loop(
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/inference.py", line 361, in chat_loop
    model, tokenizer = load_model(
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/model/model_adapter.py", line 353, in load_model
    model, tokenizer = adapter.load_model(model_path, kwargs)
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/model/model_adapter.py", line 689, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.llava_next.configuration_llava_next.LlavaNextConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, JambaConfig, LlamaConfig, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, OlmoConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
python -m fastchat.serve.cli --model-path llava-hf/llava-1.5-13b-hf
Traceback (most recent call last):
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/cli.py", line 304, in <module>
    main(args)
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/cli.py", line 227, in main
    chat_loop(
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/serve/inference.py", line 361, in chat_loop
    model, tokenizer = load_model(
  File "/home/philip/miniconda3/envs/myfastchat/lib/python3.9/site-packages/fastchat/model/model_adapter.py", line 353, in load_model
    model, tokenizer = adapter.load_model(model_path, kwa

I am using a conda environment (Python 3.9.19), if that is relevant.
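
From the traceback, the failure looks like it comes from FastChat's adapter loading every model with AutoModelForCausalLM, while these checkpoints declare a LlavaNextConfig. As a sanity check (not a FastChat fix, and only a sketch on my part), the checkpoint itself is meant to be loaded with transformers' LLaVA-NeXT classes, along these lines:

# Hypothetical sanity check outside FastChat, assuming a recent transformers release:
# the llava-hf checkpoints use a LlavaNextConfig, which the LLaVA-NeXT
# vision-language classes accept but AutoModelForCausalLM does not.
import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-13b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map="auto" needs accelerate
)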

PhilipAmadasun avatar May 20 '24 04:05 PhilipAmadasun

There is one merge request going around which provides support for multimodal models. Are you using the version from pip or a git clone of the main repo? I recommend the latter, and keep it up to date.
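
Roughly, the from-source setup I mean is something like this (I believe these match the FastChat README's install-from-source steps, but double-check the extras for your setup):

pip uninstall fschat
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install --upgrade pip
pip3 install -e ".[model_worker,webui]"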

surak avatar May 20 '24 09:05 surak

@surak I just pip-installed FastChat and didn't look into which repo or branch the pip install pulls from. What branch do I clone to get VLM support? And does said branch support all VLMs on Hugging Face? I would assume not.

PhilipAmadasun avatar May 20 '24 21:05 PhilipAmadasun

I would recommend removing the one installed with pip and using the one cloned directly from the repo, keeping it up to date with git pull. Things change too fast for versions :-)

surak avatar May 21 '24 09:05 surak

@surak I did all that and it still doesn't work. I realize I can actually only pull Vicuna; nothing else works. Could you please provide the steps you took to get FastChat working and list which models from Hugging Face you have successfully pulled?

PhilipAmadasun avatar Jun 01 '24 18:06 PhilipAmadasun

I have used dozens of models from Hugging Face.

These are the pip requirements that I install in the venv (a rough install sketch follows below the list):

pip
torch
sglang[srt]
accelerate
flash-attn
transformers @ git+https://github.com/huggingface/transformers.git
mamba-ssm
causal-conv1d>=1.2.0

Before those, I already have my own CUDA, PyTorch, Python, NumPy, etc. available as supercomputer modules.

That's it, really.
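
If it helps, putting those lines in a requirements.txt and installing them into a fresh venv looks roughly like this (names and paths are only illustrative):

python3 -m venv venv                      # illustrative venv name
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt           # the list above; flash-attn may need torch installed first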

Then I activate the venv, do a git clone https://github.com/lm-sys/FastChat.git, enter the directory, and this is how I run it:

The variables only set the host I run on and the port, which are pretty much the defaults:

export BLABLADOR_DIR="/home/strube/FastChat"
source $BLABLADOR_DIR/config-blablador.sh

cd $BLABLADOR_DIR
source $BLABLADOR_DIR/sc_venv_sglang2/activate.sh

# Avoid the ModuleNotFoundError
export PATH=$BLABLADOR_DIR:$PATH
export PYTHONPATH=$BLABLADOR_DIR:$PYTHONPATH
export NUMEXPR_MAX_THREADS=8

srun python3 $BLABLADOR_DIR/fastchat/serve/sglang_worker.py \
     --controller $BLABLADOR_CONTROLLER:$BLABLADOR_CONTROLLER_PORT \
     --port 31029 --worker-address http://$(hostname).fz-juelich.de:31029 \
     --num-gpus 1 \
     --host 0.0.0.0 \
     --model-path models/Mistral-7B-Instruct-v0.2
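
For a VLM, I would expect the same invocation to work with just the model path swapped, something like this (untested on my side; the llava-hf path is only an example, and depending on your sglang version you may need extra flags):

srun python3 $BLABLADOR_DIR/fastchat/serve/sglang_worker.py \
     --controller $BLABLADOR_CONTROLLER:$BLABLADOR_CONTROLLER_PORT \
     --port 31029 --worker-address http://$(hostname).fz-juelich.de:31029 \
     --num-gpus 1 \
     --host 0.0.0.0 \
     --model-path llava-hf/llava-v1.6-vicuna-13b-hf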

surak avatar Jun 02 '24 12:06 surak