delphiRo
## 🐛 Bug

Segfault after the last update. Before the update, the gemma3 model worked fine.

## To Reproduce

```shell
ROCR_VISIBLE_DEVICES=2 python -m mlc_llm serve HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC --port 8081 --overrides "tensor_parallel_shards=1;max_total_seq_length=2768;gpu_memory_utilization=0.92;" --mode server...
```
## 🚀 Feature

We need recent, up-to-date support for AMD Instinct cards to increase total serving capacity. Specifically, I need to use the latest ROCm release with gfx906.
## 🐛 Bug

## To Reproduce

I have a very strange situation with MLC LLM, tested on qwen3-14b_q4f16. When I increase the request's input length to 1500–2500 tokens, then...