GPT-J model
References issue https://github.com/vllm-project/vllm/issues/198.
@AndreSlavescu Awesome! Thanks for your contribution. Is this PR ready for review? Otherwise, please ping me when you are ready. Thanks again!
Can you merge this change, so we can test it out with our fine-tuned GPT-J model?
8-)
@AndreSlavescu What's going on with the PR? If you are not able to continue it, no worries, I can take it. Please let us know if you have any question.
@WoosukKwon Hi, sorry for the delayed reply; I had a busy schedule this past week. I won't have much time this coming week either, so please continue with it if you'd like. Thanks!
Is it just waiting for review, or does it require additional work? Is it expected to work? (If so, I can use it now.)
@ri938 This PR is not ready yet. I'll take this over and finish the PR soon.
~~The PR is currently blocked because GPT-J's rotary embedding requires a new kernel (IIUC, it's different from GPT-NeoX's rotary embedding). I will address it this weekend.~~ Turns out that this is not a problem.
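(For background on the rotary-embedding difference: GPT-NeoX's variant splits each head's dimensions into two halves and rotates dim i together with dim i + d/2, while GPT-J's variant rotates interleaved even/odd pairs. A minimal PyTorch sketch of just the rotation step, not vLLM's actual kernel:)

import torch

def rotate_neox(x: torch.Tensor) -> torch.Tensor:
    # GPT-NeoX style: split the last (head) dim into two halves,
    # pairing dim i with dim i + d/2.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotate_gptj(x: torch.Tensor) -> torch.Tensor:
    # GPT-J style: pair adjacent even/odd dims (0 with 1, 2 with 3, ...).
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)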
@zhuohan123 This PR is ready for review. Please take a look at it.
@silvacarl2 @ri938 We've just merged this PR. Please install vLLM from source and try it out!
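If it helps, here is a minimal offline example to try the new model with (adapted from the style of examples/offline_inference.py; the prompt and sampling values are just placeholders):

from vllm import LLM, SamplingParams

# Requires the freshly merged, from-source build of vLLM.
llm = LLM(model="EleutherAI/gpt-j-6b")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)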
Cool, will do!!
Got this error:
python offline_inference.py
INFO 07-09 10:47:10 llm_engine.py:59] Initializing an LLM engine with config: model='EleutherAI/gpt-j-6b', dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
Traceback (most recent call last):
File "offline_inference.py", line 14, in
Same with GPT-Neo:
python offline_inference.py
Downloading (…)lve/main/config.json: 100% 1.46k/1.46k [00:00<00:00, 1.09MB/s]
INFO 07-09 10:48:12 llm_engine.py:59] Initializing an LLM engine with config: model='EleutherAI/gpt-neo-2.7B', dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
Downloading (…)okenizer_config.json: 100% 200/200 [00:00<00:00, 173kB/s]
Downloading (…)olve/main/vocab.json: 100% 798k/798k [00:00<00:00, 5.96MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 38.1MB/s]
Downloading (…)cial_tokens_map.json: 100% 90.0/90.0 [00:00<00:00, 168kB/s]
Traceback (most recent call last):
File "offline_inference.py", line 14, in
@silvacarl2 Could you check again if you installed the latest vLLM from source?
BTW, GPT-Neo is not supported yet.
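One quick, generic way to confirm which vLLM installation Python is actually importing (nothing vLLM-specific beyond the module name):

import vllm

# If this prints a path under site-packages rather than your source
# checkout, an older pip-installed vLLM is shadowing the source build.
print(vllm.__file__)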
NP, trying out others.
I installed vLLM from source and encountered this problem:
(generator38) fsuser@recau5mvammeirzd3:~/chat_generator$ python -m vllm.entrypoints.openai.api_server \
    --model PygmalionAI/pygmalion-6b \
    --host 0.0.0.0
INFO 07-20 03:51:12 llm_engine.py:60] Initializing an LLM engine with config: model='PygmalionAI/pygmalion-6b', tokenizer='PygmalionAI/pygmalion-6b', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
Traceback (most recent call last):
  File "/home/fsuser/anaconda3/envs/generator38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/fsuser/anaconda3/envs/generator38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/fsuser/vllm/vllm/entrypoints/openai/api_server.py", line 583, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/fsuser/vllm/vllm/engine/async_llm_engine.py", line 232, in from_engine_args
    engine = cls(engine_args.worker_use_ray,
  File "/home/fsuser/vllm/vllm/engine/async_llm_engine.py", line 55, in __init__
    self.engine = engine_class(*args, **kwargs)
  File "/home/fsuser/vllm/vllm/engine/llm_engine.py", line 99, in __init__
    worker = worker_cls(
  File "/home/fsuser/vllm/vllm/worker/worker.py", line 45, in __init__
    self.model = get_model(model_config)
  File "/home/fsuser/vllm/vllm/model_executor/model_loader.py", line 43, in get_model
    model = model_class(model_config.hf_config)
  File "/home/fsuser/vllm/vllm/model_executor/models/gpt_j.py", line 192, in __init__
    self.transformer = GPTJModel(config)
  File "/home/fsuser/vllm/vllm/model_executor/models/gpt_j.py", line 157, in __init__
    [GPTJBlock(config) for _ in range(config.n_layer)])
  File "/home/fsuser/vllm/vllm/model_executor/models/gpt_j.py", line 157, in <listcomp>
    [GPTJBlock(config) for _ in range(config.n_layer)])
  File "/home/fsuser/vllm/vllm/model_executor/models/gpt_j.py", line 122, in __init__
    self.attn = GPTJAttention(config)
  File "/home/fsuser/vllm/vllm/model_executor/models/gpt_j.py", line 68, in __init__
    assert config.rotary
  File "/home/fsuser/anaconda3/envs/generator38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 260, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'GPTJConfig' object has no attribute 'rotary'
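For what it's worth, the failing assert reads config.rotary, and judging from the traceback, PygmalionAI/pygmalion-6b's config.json simply doesn't define that field (EleutherAI/gpt-j-6b's apparently does). A quick way to inspect what the Hugging Face config actually provides (illustrative snippet; attribute names taken from the traceback above):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("PygmalionAI/pygmalion-6b")
print(getattr(cfg, "rotary", None))      # None here reproduces the failing assert
print(getattr(cfg, "rotary_dim", None))  # GPT-J configs typically set rotary_dim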