
ONNX export

rajeevsrao opened this issue (1 comment)

Do you have an implementation that can be exported to ONNX via torch.onnx.export? I'm seeing NVRTC compilation errors from the torch JIT exporter when I run the following script:

import sys

import toml
import torch

from bonito.crf.model import Model

def main():
    model_path = sys.argv[1]
    model_name = model_path.split('/')[-1][:-5]  # strip the ".toml" extension
    print(model_path)
    model = Model(toml.load(model_path))
    model.cuda()
    # model.eval()
    dummy_input = torch.randn(4, 1, 2280, device='cuda')
    output = model(dummy_input)
    print("Output: {} {}".format(output.shape, output))
    export_path = sys.argv[2] + "/bonito_" + model_name + ".onnx"
    torch.onnx.export(model, dummy_input, export_path, verbose=True,
                      opset_version=int(sys.argv[3]))
    print("Total parameters in model", sum(p.numel() for p in model.parameters()))

if __name__ == "__main__":
    main()

Command: python export.py bonito/models/dna_r9.4.1/config.toml output 12

Platform details:

  • ONT-bonito v0.3.2
  • NVIDIA PyTorch 20.11-py3 container and a V100 GPU
  • Setup based on https://github.com/nanoporetech/bonito/tree/v0.3.2#developer-quickstart

Error log:

/opt/conda/lib/python3.6/site-packages/seqdist/sparse.py:118: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert idx.shape == (C, NZ)
/opt/conda/lib/python3.6/site-packages/torch/tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:90: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 516, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 108, in cupy_backends.cuda.libs.nvrtc.compileProgram
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 120, in cupy_backends.cuda.libs.nvrtc.compileProgram
  File "cupy_backends/cuda/libs/nvrtc.pyx", line 58, in cupy_backends.cuda.libs.nvrtc.check_status
cupy_backends.cuda.libs.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)
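For context on the TracerWarnings above ("Converting a tensor to a Python boolean..."): they mean the trace is specialized to the example input. A minimal illustration, unrelated to bonito itself (the function `f` below is hypothetical):

```python
import torch

def f(x):
    # Data-dependent Python branch: converting a tensor to a Python bool
    # triggers a TracerWarning, because the tracer can only record the
    # branch taken for the example input.
    if x.sum() > 0:
        return x + 1
    return x

# Traced with a positive example, so the `x + 1` branch is baked in...
traced = torch.jit.trace(f, torch.ones(3))

# ...and applied even to inputs that should take the other branch:
# f(-torch.ones(3)) returns -1s eagerly, but traced(-torch.ones(3)) returns 0s.
```

The same mechanism is why the warnings say the exported graph "might not generalize to other inputs".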

rajeevsrao avatar Dec 18 '20 21:12 rajeevsrao

Hey @rajeevsrao

The problem with the export is in the last layer, GlobalNorm, which is implemented with CuPy here.

$ bonito view bonito/models/dna_r9.4.1/config.toml
Model(
  (encoder): Sequential(
    (0): Conv1d(1, 4, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): Swish()
    (2): Conv1d(4, 16, kernel_size=(5,), stride=(1,), padding=(2,))
    (3): Swish()
    (4): Conv1d(16, 768, kernel_size=(19,), stride=(5,), padding=(9,))
    (5): Swish()
    (6): Permute()
    (7): RNNWrapper(
      (rnn): LSTM(768, 768)
    )
    (8): RNNWrapper(
      (rnn): LSTM(768, 768)
    )
    (9): RNNWrapper(
      (rnn): LSTM(768, 768)
    )
    (10): RNNWrapper(
      (rnn): LSTM(768, 768)
    )
    (11): RNNWrapper(
      (rnn): LSTM(768, 768)
    )
    (12): Linear(in_features=768, out_features=5120, bias=True)
    (13): Tanh()
    (14): Scale()
  )
  (global_norm): GlobalNorm()
)
Total parameters in model 27795560

The main issue seems to be `error: identifier "tensor" is undefined`, which CuPy handles but the torch exporter doesn't.

It's not immediately obvious what the best solution is, but a short-term fix to get a successful ONNX export would be to replace the GlobalNorm layer with a pure-PyTorch implementation.
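As a very rough sketch of that direction: GlobalNorm subtracts the CRF log partition function, which can be computed in pure PyTorch with a log-space forward recursion. The class below is hypothetical — it uses a dense (S, S) transition view rather than bonito's actual sparse k-mer indexing, and spreading logZ evenly over timesteps is an assumed convention — so it shows the shape of a traceable replacement, not a drop-in one:

```python
import torch
import torch.nn as nn

class DenseGlobalNorm(nn.Module):
    """Sketch only: a traceable stand-in for the CuPy-backed GlobalNorm.

    Assumes scores of shape (T, N, S*S), where entry (t, n, i*S + j) is the
    score of transitioning from state i to state j at step t. Bonito's real
    layer uses a sparse transition structure via seqdist instead.
    """
    def __init__(self, n_states):
        super().__init__()
        self.n_states = n_states

    def forward(self, scores):
        T, N, _ = scores.shape
        S = self.n_states
        s = scores.view(T, N, S, S)
        alpha = s.new_zeros(N, S)  # log-domain forward variable
        for t in range(T):
            # alpha_j(t) = logsumexp_i( alpha_i(t-1) + s(t, i, j) )
            alpha = torch.logsumexp(alpha.unsqueeze(-1) + s[t], dim=1)
        logZ = torch.logsumexp(alpha, dim=1)  # (N,) log partition function
        # Spread the normaliser evenly over timesteps — one convention;
        # check bonito/seqdist for the one the model actually uses.
        return scores - logZ[None, :, None] / T
```

Note the Python loop over `t` would unroll in the traced/exported graph for a fixed T, which is acceptable for fixed-chunk inference but makes the ONNX graph length-specific.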

iiSeymour avatar Dec 18 '20 22:12 iiSeymour