liudong
Hi, I also encountered the same problem when deploying llama3.1-70B on two Mac Airs. Below is the log output from running `DEBUG=2 python main.py`; ########## marks the information I...
The log on node1 is as follows: Removing download task for Shard(model_id='mlx-community/Meta-Llama-3.1-70B-Instruct-4bit', start_layer=0, end_layer=39, n_layers=80): True "model-00001-of-00008.safetensors": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], "model-00002-of-00008.safetensors": [9, 10,...
> Hi,
>
> Yes, I've run my app with a SINGLE Gunicorn worker using the latest version (0.4.0) with `mount_http`. Adding the `stateless` flag and passing it down...
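
For context, this is a minimal sketch of what running the app under a single Gunicorn worker looks like on my side; the module path `main:app`, the bind address, and the Uvicorn worker class are assumptions for illustration, not details taken from the thread:

```bash
# Hypothetical invocation: one worker so every request shares a single process.
# Replace `main:app` with the actual module:attribute of the ASGI app.
gunicorn main:app \
    --workers 1 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000
```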