leiwen83

Results 39 comments of leiwen83

> @leiwen83: I solved this error by adding the following line of code above line 115 in `huggingface_loader.py`: `preshard_funcs = {}`

Yep, with this, "argument of type 'NoneType' is not...
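The error being discussed is the standard Python failure when a membership test hits a `None` value. A minimal sketch of that failure mode (the variable name `preshard_funcs` comes from the comment above; the key string is an invented placeholder, not the actual `huggingface_loader.py` code):

```python
# A membership test on None raises TypeError; the quoted one-line fix
# initializes the variable to an empty dict so the test returns False instead.
preshard_funcs = None

try:
    _ = "some.weight.name" in preshard_funcs  # membership test on None
except TypeError as e:
    print(e)  # argument of type 'NoneType' is not iterable

preshard_funcs = {}  # the suggested fix
assert "some.weight.name" not in preshard_funcs  # no exception, just False
```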

> @leiwen83 Fast tokenizer is needed to get high performance.

There are many new models that don't support a fast tokenizer, so for those models, and for models fine-tuned without a tokenizer.json,...

> @leiwen83 What GPU card and setting did you use for testing? When using a slow tokenizer, LightLLM should not be slower than vLLM either.

I am testing llama7B with...

It could be fixed by the change below:

```
diff --git a/lightllm/common/basemodel/layer_weights/hf_load_utils.py b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
index 30be3a5..d9ef3ad 100644
--- a/lightllm/common/basemodel/layer_weights/hf_load_utils.py
+++ b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
@@ -15,7 +15,8 @@ def load_hf_weights(data_type, weight_dir, pre_post_layer=None, transformer_laye
     candidate_files =...
```

My main request here is to achieve quantization results consistent with the official ones. Which calibration dataset do the officially released GPTQ and AWQ versions use? Can users reproduce the quantization results locally?

Currently this implementation still has two kernels dealing with shrink and expand separately. I wonder whether we could merge them into one, so that Triton could do the pipeline autotune...
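At the math level, the shrink/expand pair being discussed is two chained projections (the usual LoRA pattern: down-project to a small rank, then up-project back). A minimal NumPy sketch of what a merged kernel would have to compute, where the shapes and names are illustrative assumptions rather than the actual Triton code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 8, 4                  # hidden size, low rank, batch size
x = rng.standard_normal((n, d))
A = rng.standard_normal((d, r))     # "shrink" weight: d -> r
B = rng.standard_normal((r, d))     # "expand" weight: r -> d

# Two separate kernels: the (n, r) intermediate is materialized in memory
# between the two launches.
h = x @ A            # shrink
y_two_pass = h @ B   # expand

# A fused kernel would produce the same result in one pass, keeping the
# intermediate on-chip; mathematically y = x @ A @ B either way.
y_fused = x @ (A @ B)

assert np.allclose(y_two_pass, y_fused)
```

The payoff of fusing is avoiding the round trip of the `(n, r)` intermediate through global memory, which is also what would give Triton's autotuner a single pipeline to schedule.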

Sounds very interesting! For the second usage, I have a question:

```
The user want to query a fixed set of long documents (examples: software manual, internal documents, etc). In...
```

Hi @nsurbay, thanks for your reply! Here are the scripts and functions I am trying to play with: there are two functions, say FunA and FunB, which are interpreted...

I notice there is a note in the project's README:

```
A current limitation is that QBDI doesn't handle signals, multithreading (it doesn't deal with new threads creation) and C++ exception...
```