jsoto-gladia
Issue Description: NVIDIA Triton Server currently implements a graceful shutdown mechanism that is triggered only when there are no inflight inferences. However, it does not consider ongoing HTTP connections, which...
**Issue Description:** During a graceful shutdown of Triton Server, we've observed the following behavior:
- Triton Server is hosting both Model A and Model B.
- Model B can make...
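The gap described above — shutdown waits for in-flight inferences but not for open HTTP connections — can be illustrated with a minimal connection-drain sketch. This is an illustrative Python sketch only (Triton's actual shutdown logic is C++ and `ConnectionTracker`, `connection_opened`, and `connection_closed` are hypothetical names, not Triton APIs): the server counts open connections and shutdown waits, up to a timeout, until the count reaches zero.

```python
import asyncio

class ConnectionTracker:
    """Hypothetical sketch: count open HTTP connections so a graceful
    shutdown can wait for them to drain, not just for inflight inferences."""

    def __init__(self):
        self._open = 0
        self._drained = asyncio.Event()
        self._drained.set()  # no connections yet, so already "drained"

    def connection_opened(self):
        self._open += 1
        self._drained.clear()

    def connection_closed(self):
        self._open -= 1
        if self._open == 0:
            self._drained.set()

    async def wait_for_drain(self, timeout):
        """Return True if every connection closed within `timeout` seconds."""
        try:
            await asyncio.wait_for(self._drained.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False

async def shutdown_demo():
    tracker = ConnectionTracker()
    tracker.connection_opened()
    # Simulate a client finishing its request shortly after shutdown begins.
    asyncio.get_running_loop().call_later(0.05, tracker.connection_closed)
    return await tracker.wait_for_drain(timeout=1.0)
```

In this sketch, shutdown succeeds once the last connection closes; a server that only tracked inferences would have torn the connection down mid-response, which is the behavior the issue reports.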
add gladia live transcription service
When running `python inference.py -C fullsubnet/inference.toml -M /home/jsoto/code/FullSubNet/recipes/dns_interspeech_2020/fullsubnet/cum_fullsubnet_best_model_218epochs.tar -O out` from `~/code/FullSubNet/recipes/dns_interspeech_2020`, I get: `File "/home/jsoto/anaconda3/envs/FullSubNet/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 878, in forward result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,` — `RuntimeError: [enforce fail at...`
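An `[enforce fail ...]` raised from `_VF.lstm` during inference often means the allocator could not satisfy a request for a very long input sequence. As an assumption only (this is not FullSubNet's documented fix), one common workaround is to enhance the signal in overlapping chunks and stitch the outputs, which bounds the LSTM's per-call memory. `chunk_indices` below is a hypothetical helper sketching the windowing half of that approach:

```python
def chunk_indices(n_samples, chunk_len, hop_len):
    """Yield (start, end) windows covering a signal of n_samples.

    Hypothetical helper: overlapping chunks (hop_len < chunk_len) let
    per-chunk enhancement outputs be cross-faded back together, so the
    model never sees the full-length sequence at once.
    """
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)
        yield start, end
        if end == n_samples:
            break
        start += hop_len
```

For example, `list(chunk_indices(10, 4, 2))` produces `[(0, 4), (2, 6), (4, 8), (6, 10)]`; each window would be passed to the model separately instead of the whole recording.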
### System Info
- `transformers` version: 4.44.1
- Platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4.4
- Accelerate version: not installed
- Accelerate config:...
**Bug Description** The ONNX CUDA session is not working in the Python backend. When attempting to run inference using the ONNX model with CUDAExecutionProvider, the session fails to initialize or...
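A first debugging step for a CUDA session that won't come up inside the Python backend is to check which providers onnxruntime actually reports as available, and to request CUDA with an explicit CPU fallback rather than assuming it is present. `choose_providers` is a hypothetical helper (not part of onnxruntime) sketching that fallback logic:

```python
def choose_providers(available_providers):
    """Hypothetical helper: build a provider list for an ONNX session.

    Prefers CUDA but always keeps CPU as a fallback, so session creation
    can still succeed on machines where the CUDA provider is missing or
    failed to load.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available_providers]
    return chosen or ["CPUExecutionProvider"]
```

In a real Python-backend model one would then pass the result to the session, e.g. `onnxruntime.InferenceSession(model_path, providers=choose_providers(onnxruntime.get_available_providers()))`; if `CUDAExecutionProvider` never appears in `get_available_providers()`, the problem is the `onnxruntime-gpu` install or the CUDA/cuDNN libraries visible to the backend, not the model itself.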