Neo Zhang Jianyu
# Existing Sample Changes
## Description
1. Update the README.md of two INC quick start examples to the new template.
2. Fix the code errors so the examples work with the new API.
## External Dependencies...
# Existing Sample Changes
## Description
Update the README.md of two INC quick start examples to the new template. Fix the code errors so the examples work with the new API.
Fixes Issue#
## External Dependencies...
Support getting the total running time in train mode when Caffe is not built with CAFFE_PER_LAYER_TIMINGS := 1. Because the issue https://github.com/intel/caffe/issues/217 has been fixed, CAFFE_PER_LAYER_TIMINGS := 1 can't be used...
I set up the demo based on ChatQnA (TGI) on Xeon (GNR) and tried RAG through the UI. After uploading a PDF file (2-5 MB), I asked a question. It takes 10-15s....
1. When I test the LLM backend service:
```
curl http://${host_ip}:9009/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```
On the first startup,...
Optimize MUL_MAT Q4_0 on Intel GPU.
- Change the number of threads used by the kernel function.
- Reorder the Q4 block to separate the quantized weights from the dequantization scales. execute to reorder...
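
The reorder idea can be illustrated with a minimal sketch. It assumes the usual ggml Q4_0 layout (32 weights per block, stored as one fp16 scale followed by 16 bytes of packed 4-bit values) and assumes the reordered buffer holds all quantized bytes first, then all scales. The function name `reorder_q4_0` is hypothetical, and the layout actually used by the PR's kernel may differ.

```python
import struct

QK4_0 = 32                    # weights per Q4_0 block (ggml convention)
QS_BYTES = QK4_0 // 2         # 16 bytes holding 32 packed 4-bit values
D_BYTES = 2                   # one fp16 scale per block
BLOCK_BYTES = D_BYTES + QS_BYTES


def reorder_q4_0(buf):
    """Turn interleaved [d0|qs0][d1|qs1]... into [qs0 qs1 ...][d0 d1 ...].

    Hypothetical helper: the target layout is an assumption, not the PR's code.
    """
    if len(buf) % BLOCK_BYTES != 0:
        raise ValueError("buffer is not a whole number of Q4_0 blocks")
    n_blocks = len(buf) // BLOCK_BYTES
    qs_all = bytearray()
    d_all = bytearray()
    for i in range(n_blocks):
        block = buf[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES]
        d_all += block[:D_BYTES]      # fp16 scale comes first in each block
        qs_all += block[D_BYTES:]     # packed 4-bit weights follow it
    return bytes(qs_all + d_all)


# Two example blocks: scale 0.5 and scale 2.0, with arbitrary packed bytes.
block0 = struct.pack("<e", 0.5) + bytes(range(16))
block1 = struct.pack("<e", 2.0) + bytes(range(16, 32))
reordered = reorder_q4_0(block0 + block1)
```

With the quantized bytes contiguous, neighbouring GPU work-items can read neighbouring weights without striding over the scales, which is the usual motivation for this kind of split.
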
Ollama gained Intel GPU support when the PR https://github.com/ollama/ollama/pull/2458 was merged, but that functionality is gone now. I see there are several issues and open PRs for Intel GPU. But...