Jiao Wang
Add bf16 test to llm benchmark
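A minimal sketch of what a bf16 benchmark case might look like (the model choice and timing loop are illustrative, not the actual harness):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model; the benchmark covers larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights directly in bfloat16 for the bf16 test case.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Once upon a time", return_tensors="pt")
start = time.perf_counter()
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=32)
elapsed = time.perf_counter() - start
print(f"bf16 generation took {elapsed:.2f}s")
```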
Fix low-memory generation example issue in transformers 4.36. Related issue: https://github.com/analytics-zoo/nano/issues/1157. Supports all transformers versions from 4.31 onward; tested under transformers 4.31 and 4.36.
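A hedged sketch of how such a fix can gate on the installed transformers version (the branch bodies are placeholders, not the actual patch):

```python
import transformers
from packaging import version

TRANSFORMERS_VERSION = version.parse(transformers.__version__)

if TRANSFORMERS_VERSION >= version.parse("4.36.0"):
    # transformers 4.36 changed internals the low-memory example relied on;
    # take the updated code path here.
    pass
elif TRANSFORMERS_VERSION >= version.parse("4.31.0"):
    # older supported versions (4.31 through 4.35) keep the original path
    pass
else:
    raise RuntimeError("low-memory generation example requires transformers >= 4.31")
```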
Update tests for transformers 4.36.2. Related issue: https://github.com/analytics-zoo/nano/issues/1289. Move the Mistral-related tests into the main tests, as Mistral works under transformers 4.36. Remove the 4.34 tests.
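One way to express this kind of version gating in a test suite (a sketch; the marker and test names are hypothetical):

```python
import pytest
import transformers
from packaging import version

# Skip Mistral tests on transformers builds older than 4.36.
mistral_requires_4_36 = pytest.mark.skipif(
    version.parse(transformers.__version__) < version.parse("4.36.0"),
    reason="Mistral support in these tests requires transformers >= 4.36",
)

@mistral_requires_4_36
def test_mistral_generation():
    # placeholder body; the real tests exercise Mistral loading and generation
    ...
```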
Tests for transformers 4.36
Reorganize the Speculative-Decoding example directory. Create example/CPU/Speculative-Decoding with Self-Speculation and Eagle folders; put the ipex-llm self-speculative example into the Self-Speculation directory and the Eagle examples into the Eagle folder, as shown below.
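The resulting layout, as described:

```
example/CPU/Speculative-Decoding/
├── Self-Speculation/   # ipex-llm self-speculative decoding example
└── Eagle/              # EAGLE speculative decoding examples
```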
In load_low_bit(), we check whether model.device.type is in ('cpu', 'meta'). Since some models do not have a 'device' attribute, accessing model.device.type can raise an error. Add a `hasattr(model, 'device')` check before accessing model.device.
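A minimal sketch of the guard described above; the surrounding load_low_bit() logic is elided:

```python
def load_low_bit(model, model_path):
    # Guard: some models expose no `device` attribute, so check with
    # hasattr before touching model.device.type.
    if hasattr(model, 'device') and model.device.type in ('cpu', 'meta'):
        # ... proceed with the low-bit checkpoint loading path ...
        pass
    return model
```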
Update the EAGLE README on CPU to downgrade the setuptools version, since Intel PyTorch is not compatible with setuptools 70.0.0+.
Update the Eagle example to the EAGLE-2 + IPEX-LLM integration. With EAGLE-2 + IPEX-LLM, fp16 inference speed on Arc improves.
I'm trying to use torch.distributed.launch to launch multi-node training with oneCCL. On each node, I install oneCCL and source $oneccl_bindings_for_pytorch_path/env/setvars.sh. The command on the 1st node is: CCL_WORKER_COUNT=1 python -m...
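For reference, the minimal per-rank setup for the CCL backend looks roughly like this (everything beyond the backend registration is an assumption about the training script):

```python
import os
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  (importing registers the 'ccl' backend)

# torch.distributed.launch / torchrun set RANK, WORLD_SIZE, MASTER_ADDR,
# and MASTER_PORT in each process's environment.
dist.init_process_group(
    backend="ccl",
    rank=int(os.environ.get("RANK", 0)),
    world_size=int(os.environ.get("WORLD_SIZE", 1)),
)
print(f"rank {dist.get_rank()} / {dist.get_world_size()} initialized")
```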
Add stablelm-zephyr-3b example on NPU.
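A hedged sketch of what loading this model on an Intel NPU with ipex-llm might look like; the npu_model module and its arguments are assumptions based on other ipex-llm NPU examples, so consult the example README for exact usage:

```python
# Assumed ipex-llm NPU interface; parameter names may differ in the actual example.
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_id = "stabilityai/stablelm-zephyr-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_low_bit="sym_int4",  # low-bit weights for NPU deployment (assumed)
    trust_remote_code=True,
)

prompt = "What is speculative decoding?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```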