FastChat
Error with nomic-ai/gpt4all-13b-snoozy model
Hello, I'm trying to deploy a server on an AWS machine and test the performance of the model mentioned in the title. I've launched the model worker with the following command:
python3 -m fastchat.serve.model_worker --model-name "text-embedding-ada-002" --model-path nomic-ai/gpt4all-13b-snoozy --cpu-offloading --load-8bit
as mentioned in the documentation, providing a faux model name so the OpenAI SDK can be used from the client. However, whenever I call the API I get an Internal Server Error that doesn't provide much information on the reason behind the crash:
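For reference, the request the OpenAI SDK ends up issuing can be sketched like this (the host address is a placeholder for your own openai_api_server; this mirrors the engines-style route visible in the log below):

```python
import json
import urllib.request

API_BASE = "http://<aws-host>:8000/v1"  # placeholder: your openai_api_server address

def embedding_request(model: str, text: str) -> urllib.request.Request:
    """Build the engines-style embeddings request the OpenAI SDK issues
    when api_base points at FastChat's openai_api_server."""
    url = f"{API_BASE}/engines/{model}/embeddings"
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# Sending this with urllib.request.urlopen(req) hits the same route as the log below.
req = embedding_request("text-embedding-ada-002", "hello world")
print(req.full_url)
```

The faux model name only selects the URL path; the worker behind it is still the real deployed model.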
INFO: 93.43.223.59:55983 - "POST /v1/engines/text-embedding-ada-002/embeddings HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
raw_response = await run_endpoint_function(
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastchat/serve/openai_api_server.py", line 658, in create_embeddings
for i, emb in enumerate(embedding["embedding"])
KeyError: 'embedding'
Any idea why this happens? I've tested the exact same client with lmsys/fastchat-t5-3b-v1.0 (mapped to the same faux name) and that one works.
@andy-yang-1 Could you take a look?
It might be caused by CUDA OOM, try with:
export WORKER_API_EMBEDDING_BATCH_SIZE=1
and restart the server & API?
It doesn't work for me.
I temporarily bypassed this error by changing CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) to chunk_size=400.
Didn't work for me either. The error stays the same and doesn't really tell much, so I can't give any more details at the moment.
I've managed to add a print of the error, and I can confirm that my specific issue is CUDA going out of memory. I'll dig further and let you know how to fix this, but in the meantime I'd suggest a small improvement to the logging to help investigate issues like this.
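In that spirit, here is a minimal sketch of the kind of check that would turn the bare KeyError into a readable message (a hypothetical helper, not FastChat's actual code):

```python
def extract_embedding(payload: dict) -> list:
    """Pull the embedding out of a worker response, surfacing the worker's
    own error report (e.g. a CUDA OOM message) instead of a bare KeyError."""
    if payload.get("error_code"):
        # The worker returned an error object rather than an embedding.
        raise RuntimeError(
            f"worker error {payload['error_code']}: {payload.get('text', '')}"
        )
    if "embedding" not in payload:
        raise RuntimeError(f"malformed worker response, keys: {sorted(payload)}")
    return payload["embedding"]
```

With a check like this, the API server could return the worker's message in the 500 body instead of crashing on the missing key.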
I have found where the problem lies. The max_seq_length of the fake model we specified differs from that of the actually deployed model, so langchain did not call 'get safe len' when splitting into chunks. I will figure out a solution to this problem. For now, you can set chunk_size=400 to avoid it. @hyunkelw
I suppose you're talking about the chunk_size variable in get_model_answer.py? If not, please provide further details on where to set this value.
Thanks!
@hyunkelw You are right. You can change the CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
to CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
I appreciate your very fast response, but as far as I can see, the only place where that class is mentioned is the Jupyter notebook "twitter_algo_analysis.ipynb". What I'm trying to achieve is calling openai_api_server.py on an AWS machine as mentioned here, and then testing it from my local machine by simulating the code in the aforementioned documentation and changing the API base to the IP of the AWS machine. So you probably want me to pass a custom text splitter to the VectorstoreIndexCreator constructor mentioned there. In other words, instead of
...
loader = TextLoader('state_of_the_union.txt')
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
I should change it to
myTextSplitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)
loader = TextLoader('Path/to/my/document.txt')
index = VectorstoreIndexCreator(embedding=embedding, text_splitter=myTextSplitter).from_loaders([loader])
I'll let you know if this works
Nope, didn't work.
INFO: 93.49.152.59:62379 - "POST /v1/engines/text-embedding-ada-002/embeddings HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
raw_response = await run_endpoint_function(
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastchat/serve/openai_api_server.py", line 659, in create_embeddings
for i, emb in enumerate(embedding["embedding"])
KeyError: 'embedding'
NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.
(expected scalar type Half but found Float)
The last three lines are a print of the error message that I added to openai_api_server.py, in the create_embeddings() method.
@hyunkelw Can you deploy the latest version? The new error message can help me to debug it.
Hi, I've deployed the latest version and rerun the code. This is the error:
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS
PAGE.**\\n\\n(CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 22.04 GiB total capacity; 20.97 GiB already allocated; 19.12 MiB free; 21.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)
(As per your previous suggestions, I've set the chunk_size to 400)
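The OOM message above also suggests tuning PYTORCH_CUDA_ALLOC_CONF. One way to experiment (the value is a guess, and both variables must be in the worker's environment before it starts; in practice you would export them in the launching shell):

```python
import os

# Must be set before torch initializes CUDA in the worker process;
# shown here as Python for illustration, normally done via `export`.
os.environ["WORKER_API_EMBEDDING_BATCH_SIZE"] = "1"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # guessed value
```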
I see, it's caused by CUDA OOM; your GPU memory is limited 😿
Try
export WORKER_API_EMBEDDING_BATCH_SIZE=1
and restart the API, controller, and model worker. If it still doesn't work, I'll have to figure out a better solution.
Unfortunately, that parameter was already set at the time of the latest test.