Jiao Wang

Results: 20 issues by Jiao Wang

Add a bf16 test to the LLM benchmark

Fix the low-memory generation example issue in transformers 4.36. Related issue: https://github.com/analytics-zoo/nano/issues/1157. The fix supports all transformers versions 4.31+; tested under transformers 4.31 and 4.36.

Update tests for transformers 4.36.2. Related issue: https://github.com/analytics-zoo/nano/issues/1289. Move Mistral-related tests into the main tests, since Mistral works under transformers 4.36. Remove the 4.34 tests.

Tests for transformers 4.36

Reorganize the Speculative Decoding example directory. Create example/CPU/Speculative-Decoding with Self-Speculation and Eagle folders; put the ipex-llm self-speculative example into the Self-Speculation directory and the EAGLE examples into the Eagle folder.
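The reorganized layout described above would look roughly like this (folder names taken from the description; file names under each folder are not specified here):

```
example/CPU/Speculative-Decoding/
├── Self-Speculation/   # ipex-llm self-speculative decoding example
└── Eagle/              # EAGLE examples
```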

In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models do not have a 'device' attribute, accessing model.device.type raises an error for them. Add a `hasattr(model, 'device')` check before accessing model.device.
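A minimal sketch of the guarded check. The helper name `is_on_cpu_or_meta` and the `SimpleNamespace` stand-ins are illustrative; real transformers models expose `model.device` as a `torch.device` with a `.type` string.

```python
from types import SimpleNamespace

def is_on_cpu_or_meta(model):
    # Some models have no `device` attribute at all, so test with
    # hasattr() first; otherwise model.device.type raises AttributeError.
    return hasattr(model, "device") and model.device.type in ("cpu", "meta")

# Model-like objects exposing a device, as transformers models do.
cpu_model = SimpleNamespace(device=SimpleNamespace(type="cpu"))
gpu_model = SimpleNamespace(device=SimpleNamespace(type="cuda"))
bare_model = object()  # no `device` attribute

print(is_on_cpu_or_meta(cpu_model))   # True
print(is_on_cpu_or_meta(gpu_model))   # False
print(is_on_cpu_or_meta(bare_model))  # False, no AttributeError raised
```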

Update the EAGLE README on CPU to recommend downgrading the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.

Update the Eagle example to the EAGLE-2 + ipex-llm integration. With EAGLE-2 + IPEX-LLM, inference speed on Arc with fp16 can increase.

I'm trying to use torch.distributed.launch to launch multi-node training with oneCCL. On each node, I install oneCCL and source $oneccl_bindings_for_pytorch_path/env/setvars.sh. The command on the 1st node is: CCL_WORKER_COUNT=1 python -m...

stablelm-zephyr-3b example on NPU.