
Error with nomic-ai/gpt4all-13b-snoozy model

Open hyunkelw opened this issue 1 year ago • 14 comments

Hello, I'm trying to deploy a server on an AWS machine and test the performance of the model mentioned in the title. I've launched the model worker with the following command:

python3 -m fastchat.serve.model_worker --model-name "text-embedding-ada-002" --model-path nomic-ai/gpt4all-13b-snoozy --cpu-offloading --load-8bit

as mentioned in the documentation, providing a faux model name so I can use the OpenAI SDK in the client. However, whenever I call the API I get an Internal Server Error that doesn't provide much info on the reason behind the crash:

INFO:     93.43.223.59:55983 - "POST /v1/engines/text-embedding-ada-002/embeddings HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastchat/serve/openai_api_server.py", line 658, in create_embeddings
    for i, emb in enumerate(embedding["embedding"])
KeyError: 'embedding'

Any idea why this happens? I've tested the identical client with lmsys/fastchat-t5-3b-v1.0 (mapped to the same faux name) and that one works.
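For reference, the client follows the OpenAI-compatible setup from the docs, roughly like this (<aws-host> is a placeholder for the AWS machine; the engine-style call matches the /v1/engines/... route in the log above):

import openai
from langchain.embeddings import OpenAIEmbeddings

# Point the OpenAI SDK (0.x style) at the FastChat API server instead of api.openai.com
openai.api_key = "EMPTY"  # FastChat does not validate the key
openai.api_base = "http://<aws-host>:8000/v1"

# Quick sanity check against the faux model name the worker was registered with
resp = openai.Embedding.create(engine="text-embedding-ada-002", input="Hello world")
print(len(resp["data"][0]["embedding"]))

# The embedding object later passed to LangChain
embedding = OpenAIEmbeddings(openai_api_key="EMPTY", openai_api_base="http://<aws-host>:8000/v1")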

hyunkelw avatar Jun 12 '23 15:06 hyunkelw

@andy-yang-1 Could you take a look?

merrymercy avatar Jun 12 '23 16:06 merrymercy

It might be caused by CUDA OOM, try with:

export WORKER_API_EMBEDDING_BATCH_SIZE=1

and restart the server & API?

andy-yang-1 avatar Jun 12 '23 16:06 andy-yang-1

It might be caused by CUDA OOM, try with:

export WORKER_API_EMBEDDING_BATCH_SIZE=1

and restart the server & API?

It doesn't work for me.

I temporarily bypassed this error by modifying the CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) to chunk_size=400.
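Concretely, the change (a minimal sketch; import shown for completeness, assuming LangChain's splitter):

from langchain.text_splitter import CharacterTextSplitter

# Smaller chunks keep each embedding request within the deployed model's real context window
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)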

daxiaraoming avatar Jun 13 '23 07:06 daxiaraoming

It might be caused by CUDA OOM, try with:

export WORKER_API_EMBEDDING_BATCH_SIZE=1

and restart the server & API?

Didn't work for me either. The error stays the same and doesn't really tell much, so I can't give any more details at the moment.

hyunkelw avatar Jun 13 '23 08:06 hyunkelw

I've managed to add a print of the error, and I can confirm that my specific issue is that CUDA goes out of memory. I'll dig further and let you know how I fix this, but in the meantime I'd suggest a small improvement to the logging to help investigate issues like this.
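For reference, the print is just a guard before the failing loop, roughly like this (a sketch, not a verbatim diff; names are approximate):

# Inside create_embeddings() in fastchat/serve/openai_api_server.py
embedding = await get_embedding(payload)
if "embedding" not in embedding:
    # On failure the worker returns an error payload instead of vectors,
    # which is what surfaced as the bare KeyError above
    print(f"worker returned an error payload: {embedding}")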

hyunkelw avatar Jun 13 '23 08:06 hyunkelw

I have found where the problem lies. The max_seq_length of the fake model we specified differs from that of the actually deployed model. Therefore, LangChain did not call 'get safe len' when splitting chunks. I will figure out a solution to this problem. For now, you can set chunk_size=400 to avoid it. @hyunkelw

andy-yang-1 avatar Jun 14 '23 12:06 andy-yang-1

I have found where the problem lies. The max_seq_length of the fake model we specified differs from that of the actually deployed model. Therefore, LangChain did not call 'get safe len' when splitting chunks. I will figure out a solution to this problem. For now, you can set chunk_size=400 to avoid it. @hyunkelw

I suppose you're talking about the chunk_size variable in get_model_answer.py? If not, please provide further details on where to set this value.

Thanks!

hyunkelw avatar Jun 15 '23 09:06 hyunkelw

@hyunkelw You are right. You can change the CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) to CharacterTextSplitter(chunk_size=400, chunk_overlap=0)

andy-yang-1 avatar Jun 15 '23 09:06 andy-yang-1

@hyunkelw You are right. You can change the CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) to CharacterTextSplitter(chunk_size=400, chunk_overlap=0)

I appreciate your very fast response, but as far as I can see the only place where that class is mentioned is the Jupyter notebook "twitter_algo_analysis.ipynb". What I'm trying to achieve is running openai_api_server.py on an AWS machine, as mentioned here, and then testing it from my local machine by replicating the code from that documentation and changing the API base to the IP of the AWS machine. So you probably want me to pass a custom TextSplitter to the VectorstoreIndexCreator constructor mentioned there. In other words, instead of

...
loader = TextLoader('state_of_the_union.txt')
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])

I should change it to

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Cap chunks at 400 so each embedding request stays within the deployed model's limits
myTextSplitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)
loader = TextLoader('Path/to/my/document.txt')
index = VectorstoreIndexCreator(embedding=embedding, text_splitter=myTextSplitter).from_loaders([loader])
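and then test it end-to-end with the sample query from the docs (question text taken from the LangChain example):

print(index.query("What did the president say about Ketanji Brown Jackson?"))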

I'll let you know if this works

hyunkelw avatar Jun 15 '23 10:06 hyunkelw

Nope, didn't work.

INFO:     93.49.152.59:62379 - "POST /v1/engines/text-embedding-ada-002/embeddings HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/ubuntu/fast_chat_env/lib/python3.10/site-packages/fastchat/serve/openai_api_server.py", line 659, in create_embeddings
    for i, emb in enumerate(embedding["embedding"])
KeyError: 'embedding'

NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.

(expected scalar type Half but found Float)

The last three lines are the error message printed by the code I added to openai_api_server.py, in the create_embeddings() method.

hyunkelw avatar Jun 15 '23 10:06 hyunkelw

@hyunkelw Can you deploy the latest version? The new error message will help me debug it.

andy-yang-1 avatar Jun 15 '23 11:06 andy-yang-1

Hi, I've deployed the latest version and rerun the code. This is the error:

openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 22.04 GiB total capacity; 20.97 GiB already allocated; 19.12 MiB free; 21.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)

(As per your previous suggestions, I've set the chunk_size to 400)

hyunkelw avatar Jun 15 '23 12:06 hyunkelw

IC, it is caused by CUDA OOM, your GPU memory is limited 😿

Try

export WORKER_API_EMBEDDING_BATCH_SIZE=1

and restart the API & controller & model worker. If it still doesn't work, I'll have to figure out a better solution.
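If fragmentation is part of it, the allocator hint from your OOM message may also be worth a try; a sketch (the 128 MiB value is just a guess, and it must take effect before the worker initializes CUDA):

import os

# Equivalent to exporting PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 in the shell
# before launching the model worker; must be set before torch initializes CUDA
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"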

andy-yang-1 avatar Jun 15 '23 12:06 andy-yang-1

IC, it is caused by CUDA OOM, your GPU memory is limited 😿

Try

export WORKER_API_EMBEDDING_BATCH_SIZE=1

and restart the API & controller & model worker. If it still doesn't work, I'll have to figure out a better solution.

Unfortunately, that environment variable was already set at the time of the latest test.

hyunkelw avatar Jun 15 '23 13:06 hyunkelw