500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application
500 Internal Server Error
After running CompreFace for several weeks, it simply stops connecting. The admin node starts, but core and api stay at "loading".
Desktop:
- OS: Debian 12
- Browser: Chrome
- Version: 1.2.0
Pastebin with logs: https://pastebin.com/H0FvXkeX
Run these commands and attach the results to the ticket:
docker ps
| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES |
|---|---|---|---|---|---|---|
| 52886e076898 | exadel/compreface-core:1.2.0-arcface-r100-gpu | "/opt/nvidia/nvidia_…" | 26 minutes ago | Up 59 seconds | 3000/tcp | compreface-core |
| 5029f78c50c5 | skrashevich/double-take:1.13.10 | "/bin/bash ./entrypo…" | 6 days ago | Up 5 hours | 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp | Frigate-Doubletake |
| f8fcbeccb0a8 | ghcr.io/blakeblackshear/frigate:stable | "/init" | 6 days ago | Up 5 hours | 0.0.0.0:1935->1935/tcp, :::1935->1935/tcp, 0.0.0.0:5000->5000/tcp, :::5000->5000/tcp, 0.0.0.0:8554-8555->8554-8555/tcp, :::8554-8555->8554-8555/tcp, 0.0.0.0:8555->8555/udp, :::8555->8555/udp | Frigate |
| fb3398b4b31b | exadel/compreface-fe:1.2.0 | "/docker-entrypoint.…" | 13 days ago | Up 53 seconds | 0.0.0.0:8001->80/tcp, :::8001->80/tcp | compreface-ui |
| 26cd65d24261 | exadel/compreface-admin:1.2.0 | "sh -c 'java $ADMIN_…" | 13 days ago | Up 56 seconds | | compreface-admin |
| 1f23d5bd9a3a | exadel/compreface-api:1.2.0 | "sh -c 'java $API_JA…" | 13 days ago | Up 54 seconds | | compreface-api |
| 9ebbe3d57068 | exadel/compreface-postgres-db:1.2.0 | "docker-entrypoint.s…" | 13 days ago | Up 52 seconds | 5432/tcp | compreface-postgres-db |
| 7a933549b781 | eclipse-mosquitto:latest | "/docker-entrypoint.…" | 13 days ago | Up 5 hours | 0.0.0.0:1883->1883/tcp, :::1883->1883/tcp, 0.0.0.0:9001->9001/tcp, :::9001->9001/tcp | Frigate-Mqtt |
| e7e908560c54 | portainer/portainer-ce:latest | "/portainer" | 13 days ago | Up 5 hours | 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp, 0.0.0.0:9443->9443/tcp, :::9443->9443/tcp, 9000/tcp | portainer |
docker-compose logs
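When the `docker-compose logs` output is long, a small helper that pulls only the final error line out of compreface-core's structured JSON records can make the cause easier to spot. A minimal sketch, assuming each structured record is one valid JSON object on a single line (optionally after the `compreface-core |` prefix that docker-compose adds) with the formatted Python traceback in a `traceback` field; the function name `root_cause` is hypothetical and not part of CompreFace:

```python
import json
from typing import Optional


def root_cause(log_line: str) -> Optional[str]:
    """Return the final error line of one structured compreface-core log record.

    Hypothetical helper, not part of CompreFace. Assumes the record is a single
    JSON object whose "traceback" field holds the formatted Python traceback.
    Returns None for plain (non-JSON) log lines or unparsable records.
    """
    payload = log_line.strip()
    if not payload.startswith("{"):
        # Drop the "compreface-core | " prefix added by docker-compose logs;
        # when the prefix is present, the first "|" ends it.
        _, sep, rest = payload.partition("|")
        if not sep:
            return None
        payload = rest.strip()
    if not payload.startswith("{"):
        return None  # an ordinary traceback line, not a JSON record
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        return None
    traceback_text = record.get("traceback")
    if not isinstance(traceback_text, str):
        return None
    # The last non-empty line of the traceback text is the final error message.
    lines = [line.strip() for line in traceback_text.splitlines() if line.strip()]
    return lines[-1] if lines else None
```

Running it over every line of a saved log dump and printing the non-None results would list the final error of each structured record without scrolling through the full tracebacks.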
I'm seeing the exact same issue. From the compreface-core log:
compreface-core | Traceback (most recent call last):
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py", line 1903, in simple_bind
compreface-core | check_call(_LIB.MXExecutorSimpleBindEx(self.handle,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/base.py", line 246, in check_call
compreface-core | raise get_last_ffi_error()
compreface-core | mxnet.base.MXNetError: Traceback (most recent call last):
compreface-core | File "/work/mxnet/src/storage/storage.cc", line 97
compreface-core | CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected
compreface-core |
compreface-core | During handling of the above exception, another exception occurred:
compreface-core |
compreface-core | Traceback (most recent call last):
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
compreface-core | response = self.full_dispatch_request()
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1945, in full_dispatch_request
compreface-core | self.try_trigger_before_first_request_functions()
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1993, in try_trigger_before_first_request_functions
compreface-core | func()
compreface-core | File "/app/ml/./src/_endpoints.py", line 52, in init_model
compreface-core | detector(
compreface-core | File "/app/ml/./src/services/facescan/plugins/mixins.py", line 46, in __call__
compreface-core | faces = self._fetch_faces(img, det_prob_threshold)
compreface-core | File "/app/ml/./src/services/facescan/plugins/mixins.py", line 53, in _fetch_faces
compreface-core | boxes = self.find_faces(img, det_prob_threshold)
compreface-core | File "/app/ml/./src/services/facescan/plugins/insightface/insightface.py", line 103, in find_faces
compreface-core | model = self._detection_model
compreface-core | File "/usr/local/lib/python3.8/dist-packages/cached_property.py", line 36, in __get__
compreface-core | value = obj.__dict__[self.func.__name__] = self.func(obj)
compreface-core | File "/app/ml/./src/services/facescan/plugins/insightface/insightface.py", line 80, in _detection_model
compreface-core | model.prepare(ctx_id=self._CTX_ID, nms=self._NMS)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/insightface/app/face_analysis.py", line 32, in prepare
compreface-core | self.det_model.prepare(ctx_id, nms)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/insightface/model_zoo/face_detection.py", line 217, in prepare
compreface-core | model.bind(data_shapes=[('data', data_shape)])
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/module.py", line 422, in bind
compreface-core | self._exec_group = DataParallelExecutorGroup(self._symbol, self._context,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 280, in __init__
compreface-core | self.bind_exec(data_shapes, label_shapes, shared_group)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 383, in bind_exec
compreface-core | self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 675, in _bind_ith_exec
compreface-core | executor = self.symbol.simple_bind(ctx=context, grad_req=self.grad_req,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py", line 1944, in simple_bind
compreface-core | raise RuntimeError(error_msg)
compreface-core | RuntimeError: simple_bind error. Arguments:
compreface-core | data: (1, 3, 480, 640)
compreface-core | Traceback (most recent call last):
compreface-core | File "/work/mxnet/src/storage/storage.cc", line 97
compreface-core | CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected
compreface-core | {"severity": "WARNING", "message": "500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.", "request": {"method": "GET", "path": "/status", "filename": "", "api_key": "", "remote_addr": "172.18.0.4"}, "logger": "root", "module": "error_handling", "traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py\", line 1903, in simple_bind\n check_call(_LIB.MXExecutorSimpleBindEx(self.handle,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/base.py\", line 246, in check_call\n raise get_last_ffi_error()\nmxnet.base.MXNetError: Traceback (most recent call last):\n File \"/work/mxnet/src/storage/storage.cc\", line 97\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 2447, in wsgi_app\n response = self.full_dispatch_request()\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 1945, in full_dispatch_request\n self.try_trigger_before_first_request_functions()\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 1993, in try_trigger_before_first_request_functions\n func()\n File \"/app/ml/./src/_endpoints.py\", line 52, in init_model\n detector(\n File \"/app/ml/./src/services/facescan/plugins/mixins.py\", line 46, in __call__\n faces = self._fetch_faces(img, det_prob_threshold)\n File \"/app/ml/./src/services/facescan/plugins/mixins.py\", line 53, in _fetch_faces\n boxes = self.find_faces(img, det_prob_threshold)\n File \"/app/ml/./src/services/facescan/plugins/insightface/insightface.py\", line 103, in find_faces\n model = self._detection_model\n File \"/usr/local/lib/python3.8/dist-packages/cached_property.py\", line 36, in __get__\n value = obj.__dict__[self.func.__name__] = self.func(obj)\n File \"/app/ml/./src/services/facescan/plugins/insightface/insightface.py\", line 80, in _detection_model\n model.prepare(ctx_id=self._CTX_ID, nms=self._NMS)\n File \"/usr/local/lib/python3.8/dist-packages/insightface/app/face_analysis.py\", line 32, in prepare\n self.det_model.prepare(ctx_id, nms)\n File \"/usr/local/lib/python3.8/dist-packages/insightface/model_zoo/face_detection.py\", line 217, in prepare\n model.bind(data_shapes=[('data', data_shape)])\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/module.py\", line 422, in bind\n self._exec_group = DataParallelExecutorGroup(self._symbol, self._context,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 280, in __init__\n self.bind_exec(data_shapes, label_shapes, shared_group)\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 383, in bind_exec\n self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 675, in _bind_ith_exec\n executor = self.symbol.simple_bind(ctx=context, grad_req=self.grad_req,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py\", line 1944, in simple_bind\n raise RuntimeError(error_msg)\nRuntimeError: simple_bind error. Arguments:\ndata: (1, 3, 480, 640)\nTraceback (most recent call last):\n File \"/work/mxnet/src/storage/storage.cc\", line 97\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\n", "build_version": "dev"}
The compreface-api log shows this exception:
com.exadel.frs.commonservice.sdk.faces.exception.FacesServiceException: Error during synchronization between servers: [500 INTERNAL SERVER ERROR] during [GET] to [http://compreface-core:3000/status] [FacesFeignClient#getStatus()]: [{"message":"500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."}
compreface-api | ]
compreface-api | at com.exadel.frs.commonservice.sdk.faces.service.FacesRestApiClient.getStatus(FacesRestApiClient.java:101)
compreface-api | at com.exadel.frs.commonservice.sdk.faces.service.FacesRestApiClient$$FastClassBySpringCGLIB$$517e8caf.invoke(