sparseml
QAT only training
Describe the bug
I prepared a QAT-only training run for a roberta-large model. The saved PyTorch model does not return correct values. When I convert the model to ONNX it works fine, but I also need the PyTorch model to work properly. How can I do this?
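One possible cause (an assumption on my part, not confirmed from the report): a QAT-prepared checkpoint can only be loaded into a model that has been QAT-prepared again, since the state dict contains fake-quantization scale/zero-point entries that a plain model has no slots for. The sketch below uses the standard `torch.quantization` API rather than SparseML's recipe manager, purely to illustrate the save/reload cycle; the module names and sizes are made up.

```python
import io
import torch
import torch.nn as nn

def build_prepared():
    # Hypothetical toy model standing in for roberta-large.
    m = nn.Sequential(nn.Linear(4, 4))
    m.train()  # prepare_qat requires training mode
    m.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
    torch.quantization.prepare_qat(m, inplace=True)
    return m

model = build_prepared()
buf = io.BytesIO()
torch.save(model.state_dict(), buf)

# Reloading: the architecture must be QAT-prepared again first, otherwise the
# fake-quant scale/zero_point entries in the checkpoint have nowhere to go
# and the restored model will not reproduce the trained outputs.
restored = build_prepared()
buf.seek(0)
restored.load_state_dict(torch.load(buf))
restored.eval()
out = restored(torch.randn(1, 4))
```

If the saved model was loaded into an unprepared architecture with `strict=False`, the quantization parameters would be silently dropped, which could explain incorrect outputs.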
Expected behavior
The QAT-only PyTorch model also works correctly.

Environment
Include all relevant environment information:
- OS: Ubuntu 18.04
- Python version: 3.8
- SparseML version or commit hash: 1.0.1
- ML framework version(s): torch 1.9.0
- Other Python package versions: ONNX Runtime (ORT) 1.12.0
- Other relevant environment information: CUDA 11.5
I also get this error at the end of my training:
```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument scale in method wrapper___fake_quantize_per_tensor_affine_cachemask_tensor_qparams)
```
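This error typically means the fake-quantization `scale`/`zero_point` buffers are on the CPU while the inputs are on `cuda:0`. A common remedy, sketched below under the assumption that the observers are registered as buffers (as in torch 1.9), is to move the model to the device *after* QAT preparation so the buffers travel with it. This uses plain `torch.quantization`, not SparseML's recipe API.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; in the real setup this would be roberta-large.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# Move the *prepared* model so the FakeQuantize scale/zero_point buffers
# end up on the same device as the inputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

x = torch.randn(2, 8, device=device)
out = model(x)  # no cross-device error: buffers moved with the model
```

If the model was moved to the GPU before the quantization modifiers were applied, re-applying `model.to(device)` after preparation may resolve the mismatch.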
Is there any update for solving the issue?
Hi @farzanehnakhaee70, could you send more information on what is incorrect about the saved PyTorch model? Where is the unexpected behavior coming from?
Additionally, could you provide a stack trace for the second issue? Upgrading your SparseML version may help as well.
Hi @farzanehnakhaee70, since some time has passed, we'll go ahead and close this issue. But if you're able to supply the requested info, please do! Thank you. Jeannie / Neural Magic