Xinference model
Describe your problem
I installed Xinference using Docker and launched the model llama-3 from http://localhost:9997/ui/#/launch_model/llm with these settings: Model Format: ggufv2, Model Size: 8, Quantization: Q4_K_M, N GPU Layer: -1, Replica: 1.
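As a sanity check, the launched model and its UID can be confirmed outside the web UI. Xinference exposes an OpenAI-compatible endpoint, so a minimal Python sketch (assuming the default port 9997 on localhost) can list what is actually running:

```python
# List the models Xinference has launched.
# Assumes Xinference's OpenAI-compatible API on the default port 9997.
import requests

resp = requests.get("http://localhost:9997/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    # The "id" field is the model UID that RAGFlow asks for.
    print(model["id"])
```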
I want to add this Xinference model to RAGFlow. I used the settings below and got an error. What are the correct settings for the model installed above?
* Model type: chat
* Model UID: llama-3
* Base url: http://host.docker.internal:9997/v1
Error:
Fail to access model(llama-3).ERROR: Error code: 500 - {'detail': "[address=0.0.0.0:34191, pid=166] Model model_format='ggufv2' model_size_in_billions=8 quantizations=['Q2_K', 'Q3_K_L', 'Q3_K_M', 'Q3_K_S', 'Q4_0', 'Q4_1', 'Q4_K_M', 'Q4_K_S', 'Q5_0', 'Q5_1', 'Q5_K_M', 'Q5_K_S', 'Q6_K', 'Q8_0'] model_id='QuantFactory/Meta-Llama-3-8B-GGUF' model_file_name_template='Meta-Llama-3-8B.{quantization}.gguf' model_file_name_split_template=None quantization_parts=None model_hub='huggingface' model_uri=None model_revision=None is not for chat."}
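The same failure can be reproduced without RAGFlow by calling Xinference's OpenAI-compatible chat endpoint directly. A minimal sketch, assuming the UID llama-3 and that port 9997 is reachable from the host; if this also returns a 500, the RAGFlow settings above are not the problem:

```python
# Reproduce RAGFlow's chat call directly against Xinference.
# Assumes model UID "llama-3" and the default port 9997.
import requests

resp = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "llama-3",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.status_code)  # a 500 here means the model itself rejects chat,
print(resp.text)         # so the error is not caused by the RAGFlow settings
```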
Sorry, we have no clue yet. You could submit an issue to Xinference; they might have seen this problem before.
I think that if I can specify the Model UID correctly, RAGFlow will be able to recognize and register the model.
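For reference, the error text shows the launched model_id is QuantFactory/Meta-Llama-3-8B-GGUF, i.e. the base (non-chat) Llama 3, which is why Xinference reports "is not for chat". A chat-capable variant would need to be launched instead. A sketch using the Xinference Python client, assuming your Xinference version registers the instruct variant under the builtin name llama-3-instruct (check the Launch Model page in the web UI if the name differs):

```python
# Relaunch the chat-capable variant instead of the base model.
# Assumes the builtin model name "llama-3-instruct"; newer Xinference
# versions may additionally require model_engine="llama.cpp".
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-3-instruct",
    model_format="ggufv2",
    model_size_in_billions=8,
    quantization="Q4_K_M",
)
print(model_uid)  # use this value as "Model UID" in RAGFlow
```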