Jiao Wang
Add bf16 test to llm benchmark
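A minimal sketch of what a bf16 benchmark case might look like (the model choice and timing loop are illustrative, not the actual harness):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model; the benchmark covers larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights directly in bfloat16 for the bf16 test case.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Once upon a time", return_tensors="pt")
start = time.perf_counter()
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=32)
elapsed = time.perf_counter() - start
print(f"bf16 generation took {elapsed:.2f}s")
```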
Fix low-memory generation example issue in transformers 4.36. Related issue: https://github.com/analytics-zoo/nano/issues/1157. Supports all transformers versions from 4.31 onward; tested under transformers 4.31 and 4.36.
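A hedged sketch of how such a fix can gate on the installed transformers version (the branch bodies are placeholders, not the actual patch):

```python
import transformers
from packaging import version

TRANSFORMERS_VERSION = version.parse(transformers.__version__)

if TRANSFORMERS_VERSION >= version.parse("4.36.0"):
    # transformers 4.36 changed internals the low-memory example relied on;
    # take the updated code path here.
    pass
elif TRANSFORMERS_VERSION >= version.parse("4.31.0"):
    # older supported versions (4.31 through 4.35) keep the original path
    pass
else:
    raise RuntimeError("low-memory generation example requires transformers >= 4.31")
```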
Update tests for transformers 4.36.2. Related issue: https://github.com/analytics-zoo/nano/issues/1289. Move the Mistral-related tests into the main tests, as Mistral works under transformers 4.36. Remove the 4.34 tests.
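One way to express this kind of version gating in a test suite (a sketch; the marker and test names are hypothetical):

```python
import pytest
import transformers
from packaging import version

# Skip Mistral tests on transformers builds older than 4.36.
mistral_requires_4_36 = pytest.mark.skipif(
    version.parse(transformers.__version__) < version.parse("4.36.0"),
    reason="Mistral support in these tests requires transformers >= 4.36",
)

@mistral_requires_4_36
def test_mistral_generation():
    # placeholder body; the real tests exercise Mistral loading and generation
    ...
```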
Tests for transformers 4.36
Reorganize the Speculative-Decoding example directory. Create example/CPU/Speculative-Decoding with Self-Speculation and Eagle folders; put the ipex-llm self-speculative example into the Self-Speculation directory and the Eagle examples into the Eagle folder, as shown below.
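The resulting layout, as described:

```
example/CPU/Speculative-Decoding/
├── Self-Speculation/   # ipex-llm self-speculative decoding example
└── Eagle/              # EAGLE speculative decoding examples
```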
In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models do not have a 'device' attribute, accessing model.device.type can raise an error. Add a `hasattr(model, 'device')` check before accessing model.device.
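A minimal sketch of the guard described above; the surrounding load_low_bit() logic is elided:

```python
def load_low_bit(model, model_path):
    # Guard: some models expose no `device` attribute, so check with
    # hasattr before touching model.device.type.
    if hasattr(model, 'device') and model.device.type in ('cpu', 'meta'):
        # ... proceed with the low-bit checkpoint loading path ...
        pass
    return model
```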
Update the EAGLE README on CPU to downgrade the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.
Update the Eagle example to the EAGLE-2 + IPEX-LLM integration. With EAGLE-2 + IPEX-LLM, fp16 inference speed on Arc improves.
I'm trying to use torch.distributed.launch to launch multi-node training with oneCCL. On each node, I install oneCCL and source $oneccl_bindings_for_pytorch_path/env/setvars.sh. The command on the 1st node is: CCL_WORKER_COUNT=1 python -m...
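For reference, the minimal per-rank setup for the CCL backend looks roughly like this (everything beyond the backend registration is an assumption about the training script):

```python
import os
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  (importing registers the 'ccl' backend)

# torch.distributed.launch / torchrun set RANK, WORLD_SIZE, MASTER_ADDR,
# and MASTER_PORT in each process's environment.
dist.init_process_group(
    backend="ccl",
    rank=int(os.environ.get("RANK", 0)),
    world_size=int(os.environ.get("WORLD_SIZE", 1)),
)
print(f"rank {dist.get_rank()} / {dist.get_world_size()} initialized")
```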
Add stablelm-zephyr-3b example on NPU.
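A hedged sketch of what loading this model on an Intel NPU with ipex-llm might look like; the npu_model module and its arguments are assumptions based on other ipex-llm NPU examples, so consult the example README for exact usage:

```python
# Assumed ipex-llm NPU interface; parameter names may differ in the actual example.
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_id = "stabilityai/stablelm-zephyr-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_low_bit="sym_int4",  # low-bit weights for NPU deployment (assumed)
    trust_remote_code=True,
)

prompt = "What is speculative decoding?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```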