transformerlab-app
MLX Worker dies without freeing port after sitting idle for some time
Restarting the model fails because the port is still in use even though the model process has died.
2025-06-04 19:14:24 | INFO | stdout | INFO: ::1:58547 - "POST /worker_get_status HTTP/1.1" 200 OK
2025-06-04 19:14:40 | INFO | model_worker | Send heart beat. Models: ['Llama-3.3-70B-Instruct-4bit']. Semaphore: Semaphore(value=1024, locked=False). call_ct: 5. worker_id: 190268d8.
2025-06-04 21:29:33 | INFO | model_worker | Loading the model ['Llama-3.3-70B-Instruct-4bit'] on worker984dfc07, worker type: MLX worker...
2025-06-04 21:29:33 | ERROR | stderr |
Fetching 13 files: 0%| | 0/13 [00:00<?, ?it/s]
2025-06-04 21:29:33 | ERROR | stderr |
Fetching 13 files: 100%|██████████| 13/13 [00:00<00:00, 25989.49it/s]
2025-06-04 21:29:33 | ERROR | stderr |
2025-06-04 21:29:35 | INFO | stdout | Context length: 1048576
2025-06-04 21:29:35 | INFO | model_worker | Register to controller
2025-06-04 21:29:35 | ERROR | stderr | INFO: Started server process [28904]
2025-06-04 21:29:35 | ERROR | stderr | INFO: Waiting for application startup.
2025-06-04 21:29:35 | ERROR | stderr | INFO: Application startup complete.
2025-06-04 21:29:35 | ERROR | stderr | ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 21002): address already in use
2025-06-04 21:29:35 | ERROR | stderr | INFO: Waiting for application shutdown.
2025-06-04 21:29:35 | INFO | stdout | Cleaning up...
2025-06-04 21:29:35 | ERROR | stderr | INFO: Application shutdown complete.
2025-06-04 21:29:47 | ERROR | stderr | --- Logging error ---
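The `[Errno 48]` line in the log means something still holds `127.0.0.1:21002` when the new worker tries to bind. As a minimal sketch for confirming that before restarting, the helper below tries to bind the address itself. `port_is_free` is a hypothetical name and not part of transformerlab-app. Setting `SO_REUSEADDR` is an assumption about how the server opens its socket, made so that the check is not fooled by lingering `TIME_WAIT` connections.

```python
import errno
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if host:port can be bound right now, False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Assumption: the worker's server sets SO_REUSEADDR, as asyncio-based
        # servers do by default on POSIX, so we probe under the same conditions.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # any other error (e.g. permissions) is not "port in use"
        return True


if __name__ == "__main__":
    # 21002 is the port from the log above.
    print(port_is_free("127.0.0.1", 21002))
```

If this reports that the port is not free, `lsof -i :21002` should show which process is still holding it.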
Reported by Discord user florin
Tried this out. The only thing that happened was that the worker died automatically after an hour. We need to disable that timeout and also look at any other timeouts that could affect this.