Xinference model
Describe your problem
I installed Xinference using Docker and launched the model llama-3 from http://localhost:9997/ui/#/launch_model/llm with these settings: Model Format: ggufv2, Model Size: 8, Quantization: Q4_K_M, N GPU Layer: -1, Replica: 1.
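As a sanity check, the launched model and its UID can be confirmed outside the web UI. Xinference exposes an OpenAI-compatible endpoint, so a minimal Python sketch (assuming the default port 9997 on localhost) can list what is actually running:

```python
# List the models Xinference has launched.
# Assumes Xinference's OpenAI-compatible API on the default port 9997.
import requests

resp = requests.get("http://localhost:9997/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    # The "id" field is the model UID that RAGFlow asks for.
    print(model["id"])
```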
I want to add this Xinference model to RAGFlow. I used the settings below and got an error. What are the correct settings for the model installed above?
* Model type: chat
* Model UID: llama-3
* Base url: http://host.docker.internal:9997/v1
Error:
Fail to access model(llama-3).ERROR: Error code: 500 - {'detail': "[address=0.0.0.0:34191, pid=166] Model model_format='ggufv2' model_size_in_billions=8 quantizations=['Q2_K', 'Q3_K_L', 'Q3_K_M', 'Q3_K_S', 'Q4_0', 'Q4_1', 'Q4_K_M', 'Q4_K_S', 'Q5_0', 'Q5_1', 'Q5_K_M', 'Q5_K_S', 'Q6_K', 'Q8_0'] model_id='QuantFactory/Meta-Llama-3-8B-GGUF' model_file_name_template='Meta-Llama-3-8B.{quantization}.gguf' model_file_name_split_template=None quantization_parts=None model_hub='huggingface' model_uri=None model_revision=None is not for chat."}
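The same failure can be reproduced without RAGFlow by calling Xinference's OpenAI-compatible chat endpoint directly. A minimal sketch, assuming the UID llama-3 and that port 9997 is reachable from the host; if this also returns a 500, the RAGFlow settings above are not the problem:

```python
# Reproduce RAGFlow's chat call directly against Xinference.
# Assumes model UID "llama-3" and the default port 9997.
import requests

resp = requests.post(
    "http://localhost:9997/v1/chat/completions",
    json={
        "model": "llama-3",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.status_code)  # a 500 here means the model itself rejects chat,
print(resp.text)         # so the error is not caused by the RAGFlow settings
```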
Sorry, we have no clue yet. You could submit an issue to Xinference; they might have seen this problem before.
I think that if I can specify the Model UID correctly, RAGFlow will be able to recognize and register the model.
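For reference, the error text shows the launched model_id is QuantFactory/Meta-Llama-3-8B-GGUF, i.e. the base (non-chat) Llama 3, which is why Xinference reports "is not for chat". A chat-capable variant would need to be launched instead. A sketch using the Xinference Python client, assuming your Xinference version registers the instruct variant under the builtin name llama-3-instruct (check the Launch Model page in the web UI if the name differs):

```python
# Relaunch the chat-capable variant instead of the base model.
# Assumes the builtin model name "llama-3-instruct"; newer Xinference
# versions may additionally require model_engine="llama.cpp".
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-3-instruct",
    model_format="ggufv2",
    model_size_in_billions=8,
    quantization="Q4_K_M",
)
print(model_uid)  # use this value as "Model UID" in RAGFlow
```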