Yishuo Wang

27 comments of Yishuo Wang

> @MeouSker77 maybe we could add an issue to openvino team jira as well? (will send link offline)

There is already an issue about this in the OpenVINO team's Jira.

> Do you have a ticket link for this?

I'll send the link offline.

Nano's init script will try to find `libtcmalloc.so` in two locations: `${NANO_DIR}/libs/libtcmalloc.so` and `${LIB_DIR}/libtcmalloc.so`.
- `${NANO_DIR}` is the directory where you installed Nano; in your case, it's `/home/yuwen/BigDL/python/nano/src/bigdl/nano`
- `${LIB_DIR}`...

My understanding is that, in theory, a vector can compute the element count up front by subtracting the begin iterator from the end iterator, so each thread's share of the elements can be assigned before the computation even starts. The only communication then happens at the very end, when the per-thread results are summed; the rest of the time the threads need no communication or synchronization at all, so it is plausible for the parallel version to beat the serial one. With a non-random-access iterator, however, such as a linked list's, the total element count cannot be known in advance, so the elements have to be traversed one by one and handed out to threads dynamically. The threads must then synchronize with atomic operations to guarantee that no element is processed twice, and if every step forward costs one atomic operation, that cost far exceeds the cost of testing whether a number is divisible by 2. In that case, no matter how many elements there are, the parallel version is bound to be slower than the serial one.

> With an AVX512 machine, you may want to look into using `_mm256_dpbssd_epi32` in `mul_sum_i8_pairs_float`, that could give another speed boost. (Preprocessor condition: `#if __AVXVNNIINT8__`)

Thank you very much for...

I have tried the original fp16 model, without quantization or other optimizations, with this input; it also repeats outputs, so I think it's probably an issue with the model itself.

Tried the latest Qwen 1.5 7B:

```python
# -*- coding: utf-8 -*-
import torch
import intel_extension_for_pytorch as ipex
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import AutoTokenizer

model_path = ""
model =...
```

> I encountered the same Native API failed problem when input token size > 2000.

Native API failed -999 or -5 means out of memory in most cases; we are...

This issue is caused by the lack of Visual Studio, and it has been solved. As for the wrong output of whisper-base, could you share your code to run whisper-base...

**This bug is caused by using the XMX kernel in a new thread**; it won't happen if the model runs in the current thread. And I think its root cause is a bug...