kevinintel
Thanks for reporting it; we will check the issue.
It looks like a load/save mismatch. Can you try the latest commit instead of g494a5712fa2 and set use_neural_speed=False?
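For reference, a minimal sketch of that workaround, assuming the intel_extension_for_transformers AutoModelForCausalLM loader; the model id is a placeholder, and keyword availability may vary between releases:

```python
# Minimal sketch, not an exact repro: "your-model-id" is a placeholder,
# and the use_neural_speed keyword depends on the installed release.
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",         # placeholder for the model being loaded
    use_neural_speed=False,  # disable the Neural Speed backend
)
```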
Hi @kithogue, can you share your use case?
Closing this for now until the user provides details.
We already support ONNX; please refer to: https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/deprecated/docs/deploy_and_integration.md
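For quick orientation, here is a rough sketch of the compile-then-inference flow that doc describes; the import path, model path, and input tensors are assumptions and may differ across versions, so treat the linked doc as authoritative:

```python
import numpy as np
# Assumed import path for the deprecated runtime; check the linked doc
# for the exact location in your installed version.
from intel_extension_for_transformers.llm.runtime.deprecated.compile import compile

# Compile the exported ONNX model into a Neural Engine graph.
graph = compile("./model.onnx")  # placeholder path

# Dummy BERT-style inputs; shapes and input order depend on the model.
input_ids = np.zeros((1, 128), dtype=np.int32)
token_type_ids = np.zeros((1, 128), dtype=np.int32)
attention_mask = np.ones((1, 128), dtype=np.int32)

outputs = graph.inference([input_ids, token_type_ids, attention_mask])
```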
If you don't have other questions, I will close this issue.
Hi @bmtuan, can you try running the example?
I will close this issue if there are no further concerns.
Thanks for your feedback; we will support it.