Open-Assistant
inference-worker exits when trying models other than distilgpt2 (on a non-GPU system)
Is it possible to run the worker with models other than distilgpt2 on a non-GPU system?
After successfully launching the services (profiles ci + inference) with the distilgpt2 model, I tried to start it with other models (e.g. OA_SFT_Pythia_12B_4), but the inference-worker container fails after waiting for the inference server to be ready.
The inference-server reports that it has started:
2023-04-30 15:19:04.225 | WARNING | oasst_inference_server.routes.workers:clear_worker_sessions:288 - Clearing worker sessions
2023-04-30 15:19:04.227 | WARNING | oasst_inference_server.routes.workers:clear_worker_sessions:291 - Successfully cleared worker sessions
2023-04-30 15:19:04.227 | WARNING | main:welcome_message:119 - Inference server started
2023-04-30 15:19:04.227 | WARNING | main:welcome_message:120 - To stop the server, press Ctrl+C
but the inference-worker stops after a minute of waiting:
2023-04-30T15:22:39.170299Z INFO text_generation_launcher: Starting shard 0
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-04-30 15:22:40.215 | INFO | __main__:main:25 - Inference protocol version: 1
2023-04-30 15:22:40.215 | WARNING | __main__:main:28 - Model config: model_id='OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5' max_input_length=1024 max_total_length=2048 quantized=False
2023-04-30 15:22:40.756 | WARNING | __main__:main:37 - Tokenizer OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 vocab size: 50254
2023-04-30 15:22:40.759 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 6.22 seconds
2023-04-30 15:22:46.991 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 1.95 seconds
2023-04-30 15:22:48.947 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 5.09 seconds
2023-04-30T15:22:49.194599Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:22:54.040 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 7.65 seconds
2023-04-30T15:22:59.210490Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:23:01.699 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 2.74 seconds
2023-04-30 15:23:04.442 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 7.90 seconds
2023-04-30T15:23:09.226492Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:23:12.356 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 4.10 seconds
2023-04-30 15:23:16.460 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 3.25 seconds
2023-04-30T15:23:19.238026Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:23:19.718 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 3.76 seconds
2023-04-30 15:23:23.479 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 0.70 seconds
2023-04-30 15:23:24.182 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 3.74 seconds
2023-04-30 15:23:27.929 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 3.04 seconds
2023-04-30T15:23:29.248026Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:23:30.976 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 6.88 seconds
2023-04-30 15:23:37.864 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 2.89 seconds
2023-04-30T15:23:39.259110Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30 15:23:40.757 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 5.52 seconds
2023-04-30 15:23:46.287 | WARNING | utils:wait_for_inference_server:71 - Inference server not ready. Retrying in 7.90 seconds
2023-04-30T15:23:49.288384Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-04-30T15:23:51.887480Z ERROR text_generation_launcher: Shard 0 failed to start:
/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/bitsandbytes/cextension.py:127: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
We're not using custom kernels.
2023-04-30T15:23:51.887567Z INFO text_generation_launcher: Shutting down shards
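For reference, the retry messages above look like a jittered readiness poll. Here is a minimal sketch of what wait_for_inference_server seems to be doing, judging only from the log output - the health endpoint and the backoff bounds are my guesses, not taken from the actual source:

```python
# Minimal sketch of a jittered readiness poll, inferred from the log above.
# The "/health" endpoint and the 0.5-8 s backoff window are assumptions.
import random
import time

import requests


def wait_for_inference_server(url: str, timeout: float = 180.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{url}/health", timeout=5).ok:
                return  # server answered: ready
        except requests.RequestException:
            pass  # not reachable yet
        delay = random.uniform(0.5, 8.0)  # log shows delays roughly in this range
        print(f"Inference server not ready. Retrying in {delay:.2f} seconds")
        time.sleep(delay)
    raise TimeoutError(f"Inference server at {url} never became ready")
```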
The system running the containers is an OpenStack instance with 8 vCPUs and 32 GB of RAM, running Ubuntu 22.04. I have a pile of vCPUs and RAM, but sadly no GPU yet to run tests.
Before running "docker compose up", I just set MODEL_CONFIG_NAME=OA_SFT_Pythia_12B_4 as an environment variable.
This message baffles me: "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used." - shouldn't at least one of them be installed? (I assume they are pulled in via the "huggingface/transformers" requirement.)
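To double-check where that warning comes from, this is roughly the kind of availability probe transformers runs at import time (a sketch of the idea, not the library's actual code); running it inside the worker container should show which backends are visible:

```python
# Roughly the availability check behind the transformers warning above
# (a sketch of the idea, not the library's actual code).
import importlib.util

for pkg in ("torch", "tensorflow", "flax"):
    status = "found" if importlib.util.find_spec(pkg) else "not found"
    print(f"{pkg}: {status}")
```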
Thanks in advance!
Hi,
Did you manage to make it work? I have the same situation.
No. I was hoping for some feedback on whether it is possible, as in other solutions like FastChat (which does have an option for that). But after some testing with FastChat I learned that larger models (7B/13B) need a GPU to get a decent response time.
So I got my hands on an instance from a cloud GPU provider with reasonable prices (like Lambda Labs) and tested there (not Open-Assistant yet). Now that I know what the memory usage of some models is (7B ~8 GB, 13B ~28 GB), I'm thinking of an affordable desktop GPU with 10/12 GB to play with smaller models (while dreaming of an NVIDIA A100 ;-).
https://cloud-gpus.com/ - for an overview of providers
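For what it's worth, the memory numbers above roughly match a simple bytes-per-parameter estimate (a back-of-the-envelope sketch; real usage adds activations, KV cache and runtime overhead):

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Real usage is higher (activations, KV cache, runtime overhead).
def weight_memory_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

print(f"7B fp16:  {weight_memory_gib(7, 2):.1f} GiB")   # ~13 GiB
print(f"13B fp16: {weight_memory_gib(13, 2):.1f} GiB")  # ~24 GiB, close to the ~28 GB observed
print(f"7B int8:  {weight_memory_gib(7, 1):.1f} GiB")   # ~6.5 GiB, would fit a 10/12 GB desktop card
```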