kaixuanliu
@OlivierDehaene Hi, can you help review? Thanks!
@Narsil @OlivierDehaene Hi, I have updated this PR with the latest code base. Please help review.
@regisss @Narsil Please help review.
Steps to reproduce on CPU:
1. Build the docker container: `docker build --build-arg PLATFORM="cpu" -f Dockerfile-intel -t tei_cpu .`
2. Start the backend:
```
model='jinaai/jina-embeddings-v2-base-code'
volume=$PWD/data
docker run -p 8080:80 -v...
```
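For reference, a full launch command in the style of the TEI README might look like the sketch below. The truncated snippet above is left as-is; the image name `tei_cpu` comes from the build step, while the volume mount and `--model-id` flag are assumptions based on the standard TEI launch pattern:

```shell
# Hypothetical completion of the truncated command above, following the
# usual TEI README launch pattern; adjust flags to your environment.
model='jinaai/jina-embeddings-v2-base-code'
volume=$PWD/data
docker run -p 8080:80 -v $volume:/data tei_cpu --model-id $model
```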
I have not encountered this unexpected behavior with other models. But the three I listed above are in the TEI README, and our customers are asking for support for these 3 models:...
I will switch to a safer approach to support these 3 models.
@Narsil Hi, I validated this PR; it works well on CPU and XPU. But on HPU I cannot start the backend, and I did a check with `pip list`, it...
Change default `use_cache` param to `True`, to align with the former implementation and make CI pass
In [#1292](https://github.com/huggingface/optimum-habana/pull/1292), `args.use_kv_cache` was set to `False` by default, which greatly slows down performance and causes CI failures.
This PR changes the default `use_cache` param back to `True`, to align with the former implementation and make CI pass.
@regisss @libinta Please help review.
Command line: `python3 run_pipeline.py --model_name_or_path google/paligemma-3b-mix-224 --use_hpu_graphs --bf16`. This PR should work together with [#1279](https://github.com/huggingface/optimum-habana/pull/1279).