Neo Zhang Jianyu issues

Results: 7 issues of Neo Zhang Jianyu

Update INC quick start examples

# Existing Sample Changes ## Description 1. Update the README.md of two INC quick start examples to the new template. 2. Fix the code error to adapt to the new API. ## External Dependencies...

Update INC quick start examples

# Existing Sample Changes ## Description Update the README.md of two INC quick start examples to the new template. Fix the code error to adapt to the new API. Fixes Issue# ## External Dependencies...

Update run_benchmark.py to get total running time

Support getting the total running time in train mode when Caffe is not built with CAFFE_PER_LAYER_TIMINGS := 1, since the issue https://github.com/intel/caffe/issues/217 is fixed. CAFFE_PER_LAYER_TIMINGS := 1 can't be used...

RAG is slow in ChatQnA demo on Xeon

I set up the demo based on ChatQnA (TGI) on Xeon (GNR) and tried RAG through the UI. After uploading a PDF file (2-5M), I asked a question. It takes 10-15s....

aitce

Optimize the LLM backend service during download LLM

1. When I test the LLM backend service:

```
curl http://${host_ip}:9009/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

In the first startup,...
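For reference, the same request can be prepared from Python with only the standard library. This is a minimal sketch, not part of the original issue: the `host_ip` value of `localhost` is a placeholder assumption, and the request is only constructed here, not sent, so it can be inspected without a running service.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the address of the machine
# running the LLM backend service.
host_ip = "localhost"

# Same JSON body as the curl command in the issue.
payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 17, "do_sample": True},
}

# Build (but do not send) the POST request.
req = urllib.request.Request(
    url=f"http://{host_ip}:9009/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send it (requires the service to be up):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```

Leaving the `urlopen` call commented out keeps the sketch side-effect free; an explicit timeout is worth setting when it is enabled, since the issue concerns slow first-time startup.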

[SYCL] Optimize mul_mat for Q4_0 on Intel GPU

Optimize MUL_MAT Q4_0 on Intel GPU.
- Change the number of threads of the kernel function.
- Reorder the Q4 block to separate the quantized weights from the dequantization scales. execute to reorder...
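The reordering idea can be illustrated on the CPU with a small sketch. It assumes the standard ggml Q4_0 block (a 2-byte fp16 scale followed by 16 bytes holding 32 packed 4-bit weights) and assumes a target layout of all quantized bytes first, then all scales. The function names are hypothetical, and the actual SYCL kernel and layout in the PR may differ.

```python
import struct

QK4_0 = 32                     # weights per Q4_0 block (ggml convention)
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale + 16 bytes of packed 4-bit weights


def reorder_q4_0(blocks: bytes) -> bytes:
    """Turn interleaved [scale | quants] blocks into [all quants][all scales].

    Assumed layout for illustration only.
    """
    if len(blocks) % BLOCK_BYTES != 0:
        raise ValueError("buffer is not a whole number of Q4_0 blocks")
    quants = bytearray()
    scales = bytearray()
    for start in range(0, len(blocks), BLOCK_BYTES):
        block = blocks[start:start + BLOCK_BYTES]
        scales += block[:2]    # fp16 scale of this block
        quants += block[2:]    # packed 4-bit weights of this block
    return bytes(quants + scales)


def make_block(scale: float, fill: int) -> bytes:
    """Hypothetical helper: build one Q4_0 block with a constant quant byte."""
    return struct.pack("<e", scale) + bytes([fill]) * (QK4_0 // 2)


# Two interleaved blocks, then the reordered buffer.
interleaved = make_block(0.5, 0x11) + make_block(2.0, 0x22)
reordered = reorder_q4_0(interleaved)
```

Keeping the quantized bytes contiguous lets neighbouring threads read adjacent memory, which is the usual motivation for this kind of split; the scales are then fetched from their own contiguous region.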

documentation

examples

ggml

SYCL

[Feature] Support Intel GPUs

Ollama previously supported Intel GPUs through the merged PR https://github.com/ollama/ollama/pull/2458, but that functionality has since disappeared. I see there are several issues and open PRs for Intel GPU. But...

feature request