Jason Dai
> I just downloaded the baichuan2-13b model from HF and ran `model.chat`. This is what I mean by "default".

Does `model.chat` use BigDL?
Looks like a `transformers` version mismatch.
Can we add the model and additional layers to `Sequential`? Or something like:

```python
inp = Input(...)
out = model(inp)
...
```
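To make the idea above concrete, here is a minimal, framework-free sketch of functional-style composition — stacking extra layers on an existing model's output. The `Dense` class and `pretrained_model` function are hypothetical stand-ins for illustration, not BigDL's actual API:

```python
# Sketch of the functional style: each "layer" is a callable, and a new
# model is built by chaining calls on an input. Illustrative only.

class Dense:
    """Toy dense layer: scales every input element by a fixed weight."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, xs):
        return [x * self.weight for x in xs]

def pretrained_model(xs):
    """Stand-in for an existing model we want to extend."""
    return [x + 1 for x in xs]

def extended_model(xs):
    h = pretrained_model(xs)   # out = model(inp)
    h = Dense(2)(h)            # additional layer stacked on the output
    return h

print(extended_model([1, 2, 3]))  # [4, 6, 8]
```

The same pattern applies with real layer objects: apply the pretrained model to an input node, then pass its output through further layers before building the final model.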
> I could add more examples using the functional API in the docs. I usually use Stack Overflow, because there is a large Spark community there and it will help for...
> Hi, guys. I notice that BigDL utilizes BigDL Nano and ggml to accelerate int8/int4 computations. I wonder how to invoke these APIs in LLMs like LLaMA. Specifically, I want...
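As background on what ggml-style low-bit acceleration means, here is a self-contained sketch of symmetric int8 weight quantization. This illustrates the general technique only — it is not BigDL's or ggml's actual implementation, which quantizes per block and also supports 4-bit formats:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into [-127, 127]
    using a single per-tensor scale. Illustrative sketch only."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w2 = dequantize_int8(q, s)
# Each recovered weight is within one quantization step of the original.
assert all(abs(a - b) <= s for a, b in zip(w, w2))
```

Inference kernels then do most of the arithmetic on the small integer values, dequantizing (or rescaling accumulators) only where needed, which is where the memory and speed savings come from.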
> @emartinezs44 Bad news: I tried to upgrade dllib to Scala 2.13, but one of dllib's dependencies, xgboost4j, doesn't support Scala 2.13. [dmlc/xgboost#6596](https://github.com/dmlc/xgboost/issues/6596) Maybe we can release an experimental...
See https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF and https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF
Currently GPU is not supported.
Add attention sink as an example, rather than as part of BigDL, at this moment.
@kupix - Many thanks for this great write-up; we'll definitely look into the issues you ran into and report back.