Dongjie Shi

57 comments of Dongjie Shi

@dimakuv @boryspoplawski any insights?

> @glorysdj Could you check if [gramineproject/gramine#109](https://github.com/gramineproject/gramine/pull/109) fixes this issue?

Yes, we will try this.

> @glorysdj Could you check if [gramineproject/gramine#109](https://github.com/gramineproject/gramine/pull/109) fixes this issue?

We have tried with the latest Gramine, but encountered another issue when running a very simple Java program. Will try to summarize...

We requested device `/job:localhost/replica:0/task:0/device:CPU:0`, but all available devices are `[/job:worker/replica:0/task:0/device:CPU:0, /job:worker/replica:0/task:0/device:XLA_CPU:0]`.
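For reference, a minimal Python sketch of how this kind of device-placement mismatch can be avoided in TensorFlow 2.x; the host address and port below are placeholders, not our actual cluster setup:

```python
import tensorflow as tf

# By default, eager ops resolve devices under /job:localhost. If the
# cluster only exposes devices under /job:worker (as in the error above),
# the runtime must first be connected to that remote job.
# "worker-host:2222" is a placeholder address, not a real endpoint.
tf.config.experimental_connect_to_host("worker-host:2222", job_name="worker")

# Explicitly place the op on the worker CPU device that the error
# message lists as available.
with tf.device("/job:worker/replica:0/task:0/device:CPU:0"):
    x = tf.constant([1.0, 2.0])
    print(x * 2.0)
```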

> Hi @plusbang, we can successfully run inference with DeepSpeed for Neural Chat on Flex 140. Thanks for your support. However, the customer is also interested in...
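For context, a rough sketch of DeepSpeed tensor-parallel (AutoTP-style) inference; the model id and `tp_size` are assumptions for illustration, the sketch assumes a stock CUDA setup, and Flex 140 would additionally need Intel's GPU plugins (e.g. intel-extension-for-deepspeed), which are omitted here:

```python
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-1"  # placeholder model id
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Shard the model across two devices with tensor parallelism.
engine = deepspeed.init_inference(model,
                                  tensor_parallel={"tp_size": 2},
                                  dtype=torch.float16)
model = engine.module

inputs = tokenizer("What is deep learning?", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Launch with: deepspeed --num_gpus 2 this_script.py
```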

> I tested Baichuan 13B Chat on an SPR machine, and it turned out CPU utilization per core could reach almost 80%. The test script is sourced from [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan). Please...
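The linked BigDL example boils down to roughly the following; this is a sketch from memory of that example, assuming bigdl-llm's 4-bit AutoModel API, with a placeholder prompt:

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "baichuan-inc/Baichuan-13B-Chat"

# Load the model with BigDL-LLM's INT4 optimization for CPU inference.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"  # placeholder prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```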

Could you please try [GPTCache](https://github.com/analytics-zoo/GPTCache) to enable LLM Cache?
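A minimal sketch of wiring GPTCache in front of an OpenAI-style chat call, following the upstream GPTCache quick start (the exact API in the analytics-zoo fork may differ):

```python
from gptcache import cache
from gptcache.adapter import openai

# Initialize an exact-match cache; repeated identical questions are then
# answered from the cache instead of calling the LLM again.
cache.init()
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is GitHub?"}],
)
print(response["choices"][0]["message"]["content"])
```

For semantic (similarity-based) caching, `cache.init` also accepts an embedding function and a similarity evaluation instead of exact matching.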

Do we need to release bigdl-core 2.0.1, which will be used by BigDL 2.0.1?

> For serving, I think we only need FastAPI/FastChat/vLLM, maybe also AutoTP support; I don't think we need langchain-chatchat and text-gen-webui

OK, got it.
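As a reference for the vLLM piece, a minimal offline-inference sketch with vLLM's Python API (the model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# Placeholder model; any HF causal LM supported by vLLM works here.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["What is machine learning?"], sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```

For an HTTP front end, vLLM also ships an OpenAI-compatible server (`python -m vllm.entrypoints.openai.api_server --model ...`).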

> Hello, I am asking here just to be sure because I am about to buy 2 or 3 Intel Arc A770s. Can I run phi3-medium-128k and llama3-70b with the ollama docker image...