
QAT only training

Open farzanehnakhaee70 opened this issue 2 years ago • 1 comments

Describe the bug: I set up QAT-only training for a roberta-large model. The saved PyTorch model does not return correct output values. When I convert the model to ONNX it works fine, but I also need the PyTorch model to work properly. How can I do this?

Expected behavior: The QAT-only PyTorch model also works correctly.

Environment (include all relevant environment information):

  1. OS : 18.04
  2. Python version : 3.8
  3. SparseML version or commit hash [e.g. 0.1.0, f7245c8]: 1.0.1
  4. ML framework version(s) : torch 1.9.0
  5. Other Python package versions: ORT 1.12.0
  6. Other relevant environment information [e.g. hardware, CUDA version]: CUDA 11.5

farzanehnakhaee70 avatar Aug 21 '22 06:08 farzanehnakhaee70

I also get this error at the end of my training:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument scale in method wrapper___fake_quantize_per_tensor_affine_cachemask_tensor_qparams)

Is there any update for solving the issue?

farzanehnakhaee70 avatar Aug 27 '22 06:08 farzanehnakhaee70

Hi @farzanehnakhaee70, could you send more information on what is incorrect about the saved PyTorch model? Where is the unexpected behavior coming from?

Additionally, could you provide a stack trace for the second issue? Upgrading your SparseML version may help as well.

bfineran avatar Feb 15 '23 20:02 bfineran

Hi @farzanehnakhaee70 Since some time has passed, we'll go ahead and close this issue. But if you're able to supply the requested info, please do! Thank you. Jeannie / Neural Magic

jeanniefinks avatar Mar 09 '23 20:03 jeanniefinks
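
For readers who hit the same "Expected all tensors to be on the same device" error quoted in this thread, here is a minimal diagnostic sketch in plain PyTorch. It is not SparseML's API and not a confirmed fix for this issue. It only lists which parameters and buffers of a model live on which device, so a stray CPU tensor (such as a fake-quantize `scale`) is easy to spot. The helper `tensors_by_device` and the toy module `ToyFakeQuant` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

import torch
from torch import nn


def tensors_by_device(model: nn.Module) -> dict:
    """Group parameter and buffer names by the device they live on.

    Hypothetical helper for illustration; not part of SparseML or PyTorch.
    """
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        groups[str(param.device)].append(name)
    for name, buf in model.named_buffers():
        groups[str(buf.device)].append(name)
    return dict(groups)


class ToyFakeQuant(nn.Module):
    """Toy stand-in (an assumption) for a fake-quantize layer holding a scale buffer."""

    def __init__(self):
        super().__init__()
        # Registered buffers follow the module when model.to(device) is called.
        self.register_buffer("scale", torch.tensor([1.0]))

    def forward(self, x):
        return x * self.scale


# Toy model standing in for a QAT-prepared network.
model = nn.Sequential(nn.Linear(4, 4), ToyFakeQuant())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# More than one key in this dict means tensors are split across devices.
for dev, names in tensors_by_device(model).items():
    print(dev, names)
```

If the output shows more than one device, calling `model.to(device)` before evaluation or saving moves all parameters and registered buffers together. Tensors stored as plain Python attributes are not moved by `.to()`, so they are worth checking separately. Whether either case is the cause of the error reported here is not established in the thread.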