jsoto-gladia
Issue Description: NVIDIA Triton Server currently implements a graceful shutdown mechanism that is triggered only when there are no inflight inferences. However, it does not consider ongoing HTTP connections, which...
**Issue Description:** During a graceful shutdown of Triton Server, we've observed the following behavior:
- Triton Server is hosting both Model A and Model B.
- Model B can make...
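The gap described above — shutdown waits for in-flight inferences but not for open HTTP connections — can be illustrated with a minimal connection-drain sketch. This is an illustrative Python sketch only (Triton's actual shutdown logic is C++ and `ConnectionTracker`, `connection_opened`, and `connection_closed` are hypothetical names, not Triton APIs): the server counts open connections and shutdown waits, up to a timeout, until the count reaches zero.

```python
import asyncio

class ConnectionTracker:
    """Hypothetical sketch: count open HTTP connections so a graceful
    shutdown can wait for them to drain, not just for inflight inferences."""

    def __init__(self):
        self._open = 0
        self._drained = asyncio.Event()
        self._drained.set()  # no connections yet, so already "drained"

    def connection_opened(self):
        self._open += 1
        self._drained.clear()

    def connection_closed(self):
        self._open -= 1
        if self._open == 0:
            self._drained.set()

    async def wait_for_drain(self, timeout):
        """Return True if every connection closed within `timeout` seconds."""
        try:
            await asyncio.wait_for(self._drained.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False

async def shutdown_demo():
    tracker = ConnectionTracker()
    tracker.connection_opened()
    # Simulate a client finishing its request shortly after shutdown begins.
    asyncio.get_running_loop().call_later(0.05, tracker.connection_closed)
    return await tracker.wait_for_drain(timeout=1.0)
```

In this sketch, shutdown succeeds once the last connection closes; a server that only tracked inferences would have torn the connection down mid-response, which is the behavior the issue reports.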
add gladia live transcription service
When running `python inference.py -C fullsubnet/inference.toml -M /home/jsoto/code/FullSubNet/recipes/dns_interspeech_2020/fullsubnet/cum_fullsubnet_best_model_218epochs.tar -O out` from `~/code/FullSubNet/recipes/dns_interspeech_2020`, I get: `File "/home/jsoto/anaconda3/envs/FullSubNet/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 878, in forward result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,` — `RuntimeError: [enforce fail at...`
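An `[enforce fail ...]` raised from `_VF.lstm` during inference often means the allocator could not satisfy a request for a very long input sequence. As an assumption only (this is not FullSubNet's documented fix), one common workaround is to enhance the signal in overlapping chunks and stitch the outputs, which bounds the LSTM's per-call memory. `chunk_indices` below is a hypothetical helper sketching the windowing half of that approach:

```python
def chunk_indices(n_samples, chunk_len, hop_len):
    """Yield (start, end) windows covering a signal of n_samples.

    Hypothetical helper: overlapping chunks (hop_len < chunk_len) let
    per-chunk enhancement outputs be cross-faded back together, so the
    model never sees the full-length sequence at once.
    """
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)
        yield start, end
        if end == n_samples:
            break
        start += hop_len
```

For example, `list(chunk_indices(10, 4, 2))` produces `[(0, 4), (2, 6), (4, 8), (6, 10)]`; each window would be passed to the model separately instead of the whole recording.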
### System Info
- `transformers` version: 4.44.1
- Platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4.4
- Accelerate version: not installed
- Accelerate config:...
**Bug Description** The ONNX CUDA session is not working in the Python backend. When attempting to run inference using the ONNX model with CUDAExecutionProvider, the session fails to initialize or...
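A first debugging step for a CUDA session that won't come up inside the Python backend is to check which providers onnxruntime actually reports as available, and to request CUDA with an explicit CPU fallback rather than assuming it is present. `choose_providers` is a hypothetical helper (not part of onnxruntime) sketching that fallback logic:

```python
def choose_providers(available_providers):
    """Hypothetical helper: build a provider list for an ONNX session.

    Prefers CUDA but always keeps CPU as a fallback, so session creation
    can still succeed on machines where the CUDA provider is missing or
    failed to load.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available_providers]
    return chosen or ["CPUExecutionProvider"]
```

In a real Python-backend model one would then pass the result to the session, e.g. `onnxruntime.InferenceSession(model_path, providers=choose_providers(onnxruntime.get_available_providers()))`; if `CUDAExecutionProvider` never appears in `get_available_providers()`, the problem is the `onnxruntime-gpu` install or the CUDA/cuDNN libraries visible to the backend, not the model itself.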