Model output shape does not make sense.

Open dberma15 opened this issue 2 years ago • 3 comments

I'm trying to make predictions on new data, but the output of my model does not make any sense:

If I have a dummy model and data:

from bonito.util import load_symbol
import toml
import numpy as np
import torch

# Load the model config and build the model it describes
config_file = 'config/[email protected]'
configs = toml.load(config_file)
model = load_symbol(configs, "Model")(configs)

# Dummy batch: 50 chunks, 1 channel, 5000 samples each
# inputs = np.load("inputs.npy")
inputs = torch.rand((50, 1, 5000))

output = model(inputs)

output.shape is torch.Size([1000, 50, 5120]). The third dimension should be 5, matching the label size for [email protected], and I'm not sure what is wrong.

dberma15 avatar Jul 06 '22 04:07 dberma15

Hi, as far as I understand it, the model doesn't return a sequence but scores that need to be decoded. This should work:

scores = model(inputs)
seqs = model.decode_batch(scores)

Note: you may need to put the inputs on the same device as the model, e.g. cuda:0.
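
For the device point, here is a minimal sketch of keeping the model and inputs on the same device. The Conv1d layer is only a hypothetical stand-in so the snippet runs without bonito; in the thread it would be the model built with load_symbol, and the stand-in's output shape is not the bonito one.

```python
import torch

# Hypothetical stand-in for the bonito model so the sketch runs on its own;
# in the thread this would be load_symbol(configs, "Model")(configs).
model = torch.nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5, stride=5)

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

# Dummy batch with the same layout as in the question: (batch, channel, samples).
inputs = torch.rand((50, 1, 5000))

# Move the inputs to the model's device and skip gradient tracking for inference.
with torch.no_grad():
    scores = model(inputs.to(device))

print(scores.shape, scores.device)
```

With the real model, scores would then be passed to model.decode_batch(scores) as above.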

lpryszcz avatar Jul 06 '22 08:07 lpryszcz

@lpryszcz is correct - ctc-crf models (v3+) don't output a probability distribution over the alphabet.
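
As a rough sanity check on the shape from the question: my understanding is that the CTC-CRF head emits (n_base + 1) transition scores for each of n_base ** state_len states. The values state_len = 5 and stride = 5 below are assumptions inferred from the observed (1000, 50, 5120) output, not read from the config, and the helper names are made up for illustration.

```python
def crf_output_size(n_base: int, state_len: int) -> int:
    """Scores per time step, assuming (n_base + 1) transitions per k-mer state."""
    return n_base ** state_len * (n_base + 1)


def n_timesteps(n_samples: int, stride: int) -> int:
    """Time steps left after the encoder downsamples the signal by `stride`."""
    return n_samples // stride


# Assumed values: 4 bases (ACGT), state_len = 5, stride = 5.
print(crf_output_size(n_base=4, state_len=5))  # 5120
print(n_timesteps(n_samples=5000, stride=5))   # 1000
```

Under those assumptions the output is laid out as (time, batch, scores) = (1000, 50, 5120), which is why the last dimension is not 5.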

iiSeymour avatar Jul 08 '22 12:07 iiSeymour

More details: https://github.com/nanoporetech/bonito/issues/101#issuecomment-754611097

chAwater avatar Jan 29 '23 07:01 chAwater