Baiju Meswani
Hi @mlupetti. Sorry for the late reply. The trainer requires 4 files to perform training: 1. The checkpoint file. 2. The training onnx model. 3. The eval onnx model (optional...
The goal for this two-step process is to make deploying the training solution easy for on-device training scenarios where the expectation is that the users can generate the files in...
> FYI: #13215 perhaps, you could hold off on this?

OK, will wait for #13215 to merge and update this PR accordingly.
The speedup looks good. I didn't expect this much of a speedup for elementwise ops. Would something like auto-fusing elementwise ops handle this scenario as well, @pengwa? Also, does `QuickGelu` consistently...
/azp run On-Device Training Tests
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline,...
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Yes, as of now, `onnxruntime-training` (`ORTModule`) is not available on Windows. We do plan to change this in the near future. But I don't have a timeline to share with...
Hi there. I provided some suggestions here: https://github.com/microsoft/onnxruntime/issues/19464. The idea being, if the loss is difficult to express in onnxblock, you could try to create an onnx model from pytorch...
Looking at your model after running shape inference on it, I see the concat node like so: The concat node is trying to concat tensors of different types (int64 ...