fxmarty
Actually, I have always used `onnx-tensorrt` together with trtexec or ONNX Runtime, so I am not sure about the `TensorRTBackend` way. What are all the files alongside `__MODEL_PROTO.onnx`? You could try ```python...
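For reference, a minimal sketch of running an ONNX model through ONNX Runtime's TensorRT execution provider (the `model.onnx` path and the input name/shape are hypothetical, check `session.get_inputs()` for the real ones):
```python
import numpy as np
import onnxruntime as ort

# Requires onnxruntime-gpu built with TensorRT support; falls back to CUDA.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical path
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

# Hypothetical input name and shape.
inputs = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
outputs = session.run(None, inputs)
```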
@costigt-dev @Giuseppe5 Brevitas seems to be using `Constant` nodes for the int8 weights in ONNX, while the PyTorch ONNX export / ORT quantizer use `Initializer`. I'm not sure if this difference has...
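A quick way to check how a given export stores its weights is to inspect the graph with the `onnx` package; a minimal sketch (the `model.onnx` path is hypothetical):
```python
import onnx

model = onnx.load("model.onnx")  # hypothetical path

# Weights stored as graph initializers (PyTorch ONNX export / ORT quantizer style).
initializer_names = [init.name for init in model.graph.initializer]
print(f"{len(initializer_names)} initializers")

# Weights emitted as Constant nodes (what Brevitas appears to do).
constant_nodes = [node for node in model.graph.node if node.op_type == "Constant"]
print(f"{len(constant_nodes)} Constant nodes")
```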
Note: doing the export with
```python
export_manager = StdQCDQONNXManager
export_manager.change_weight_export(export_weight_q_node=True)

with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=export_manager):
```
instead of simply
```python
with torch.no_grad(), brevitas_proxy_export_mode(quantized_model, export_manager=StdQCDQONNXManager):
```
fixes the issue. But this is...
@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!
Thank you @chensuyue, will have a look!
@chensuyue Sorry, I did not get time to fix it, and I won't be able to before the release, unfortunately.
Let me know if you think the issue is related to the export.
Hi, there is some progress in https://github.com/huggingface/text-embeddings-inference/pull/293. Would you mind sharing which AMD GPUs you are using? Thank you!
@jeffdaily I have what appears to be a memory leak when using TunableOp. Notably, I am not using `PYTORCH_TUNABLEOP_NUMERICAL_CHECK=1` https://github.com/pytorch/pytorch/pull/129281. Essentially, after running a few forward passes, the available memory with...
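For anyone trying to reproduce, a minimal sketch of how such a leak can be observed, assuming a ROCm PyTorch build with TunableOp enabled via `PYTORCH_TUNABLEOP_ENABLED=1` (the model and shapes here are hypothetical):
```python
import torch

# Hypothetical GEMM-heavy workload; any matmul triggers TunableOp tuning.
model = torch.nn.Linear(4096, 4096).half().to("cuda")
x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")

for step in range(10):
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize()
    free, total = torch.cuda.mem_get_info()
    # With a leak, `free` keeps shrinking across iterations
    # even though no new tensors are kept alive.
    print(f"step {step}: {free / 1024**2:.0f} MiB free of {total / 1024**2:.0f} MiB")
```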
Thanks a lot @jeffdaily