James
Screenshots attached.  
I have tried both latest and dev, and both have the same issue. All the other endpoints, including the ones that were not working before, are functioning normally, e.g. list_chunks....
I have worked around this issue, but the bug remains. The problem is the `stream` option: setting it to True causes the `retval 500`. For now I...
I am also facing issues with both vLLM and Llama-stack. vLLM seems to allocate too much KV cache memory and then hits CUDA OOM errors. Llama-stack seems to rely on `models/sku_list.py`...