fxmarty
Actually, I have always used `onnx-tensorrt` together with trtexec or ONNX Runtime, so I am not sure about the `TensorRTBackend` way. What are all the files alongside `__MODEL_PROTO.onnx`? You could try ```python...
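For reference, a minimal sketch of running an ONNX model through ONNX Runtime's TensorRT execution provider (the `model.onnx` path and the input name/shape are hypothetical, check `session.get_inputs()` for the real ones):
```python
import numpy as np
import onnxruntime as ort

# Requires onnxruntime-gpu built with TensorRT support; falls back to CUDA.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical path
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

# Hypothetical input name and shape.
inputs = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
outputs = session.run(None, inputs)
```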
@costigt-dev @Giuseppe5 Brevitas seems to be using `Constant` nodes for the int8 weights in ONNX, while the PyTorch ONNX export / ORT quantizer use `Initializer`. I'm not sure if this difference has...
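A quick way to check how a given export stores its weights is to inspect the graph with the `onnx` package; a minimal sketch (the `model.onnx` path is hypothetical):
```python
import onnx

model = onnx.load("model.onnx")  # hypothetical path

# Weights stored as graph initializers (PyTorch ONNX export / ORT quantizer style).
initializer_names = [init.name for init in model.graph.initializer]
print(f"{len(initializer_names)} initializers")

# Weights emitted as Constant nodes (what Brevitas appears to do).
constant_nodes = [node for node in model.graph.node if node.op_type == "Constant"]
print(f"{len(constant_nodes)} Constant nodes")
```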
Note: doing the export with
```python
export_manager = StdQCDQONNXManager
export_manager.change_weight_export(export_weight_q_node=True)

with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=export_manager):
```
instead of simply
```python
with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=StdQCDQONNXManager):
```
fixes the issue. But this is...
@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!
Thank you @chensuyue, will have a look!
@chensuyue Sorry, I did not get time to fix it, and I won't be able to before the release, unfortunately.
Let me know if you think the issue is related to the export.
Hi, there is some progress in https://github.com/huggingface/text-embeddings-inference/pull/293. Would you mind sharing which AMD GPUs you are using? Thank you!
@jeffdaily I have what appears to be a memory leak when using TunableOp. Notably, I am not using `PYTORCH_TUNABLEOP_NUMERICAL_CHECK=1` https://github.com/pytorch/pytorch/pull/129281. Essentially, after running a few forward passes, the available memory with...
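For anyone trying to reproduce, a minimal sketch of how such a leak can be observed, assuming a ROCm PyTorch build with TunableOp enabled via `PYTORCH_TUNABLEOP_ENABLED=1` (the model and shapes here are hypothetical):
```python
import torch

# Hypothetical GEMM-heavy workload; any matmul triggers TunableOp tuning.
model = torch.nn.Linear(4096, 4096).half().to("cuda")
x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")

for step in range(10):
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()
    free, total = torch.cuda.mem_get_info()
    # With a leak, `free` keeps shrinking across iterations
    # even though no new tensors are kept alive.
    print(f"step {step}: {free / 1024**2:.0f} MiB free of {total / 1024**2:.0f} MiB")
```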
Thanks a lot @jeffdaily