elfisworking issues

Results: 7 issues of elfisworking

Add precision control for search function by adding round_decimal parameter for Java SDK

The Python SDK already supports precision control in search. That feature should be added to the Java SDK.

feature

fix print function error in python3

Fixed the problem that the print function could not be used under Python 3.

VLLM inference with AsyncLLMEngine raises an error

### Is there an existing issue / discussion for this? - [X] I have searched the existing issues / discussions ### Is this question answered in the FAQ? | Is there an...

I tried to use QAT to quantize the qwen2 1.5B model. The error is raised from the function `training.load_from_full_model_state_dict(model, model_state_dict, self._device, self._is_rank_zero, strict=True)` in recipes/qat_distributed. Then I found that the error is caused by...

bug

high-priority

torchtune generate function error when model used Int4WeightOnlyQATQuantizer

Today I tried to use Int4WeightOnlyQATQuantizer to quantize llama3-8b. When I use the model's generate function, I get the error below: ``` Running InferenceRecipe with resolved config: chat_format: null checkpointer: _component_: torchtune.training.FullModelTorchTuneCheckpointer...

bug

inference

torchtune quantization has different model output comparing with document

I'm using torchtune for model quantization with QAT. Currently, I am learning based on https://pytorch.org/torchtune/main/tutorials/qat_finetune.html, but the results of the prepared_model I printed are different from those in the link....

Why is the inference speed of the quantized model using QAT so slow?

I got a quantized model using the torchtune package. The test log shows: INFO:torchtune.utils._logging:Time for inference: 66.56 sec total, 4.51 tokens/sec. 4.51 tokens/sec is even lower than that of the...

qat

performance