
Question about training dataset for pre-trained model

Open Howard-Liang opened this issue 2 years ago • 6 comments

Hi, I'm trying to reproduce the pre-trained bonito model. However, with the same model architecture and the dataset provided by bonito download --training, my models converge to an accuracy approximately 0.5~0.8% lower than that of the pre-trained model. Was the pre-trained model trained only on the provided dataset? Thank you.

Howard-Liang avatar Mar 09 '22 07:03 Howard-Liang

According to: https://github.com/nanoporetech/bonito/issues/69 (end of the issue)

The pre-trained model was trained on other data.

marcpaga avatar Mar 10 '22 09:03 marcpaga

Doesn't the dataset now consist of 1221470 reads? Thanks.

Howard-Liang avatar Mar 11 '22 13:03 Howard-Liang

Ah, then they updated the dataset. My answer is invalid then.

marcpaga avatar Mar 11 '22 14:03 marcpaga

Where is the read data stored after running bonito download --training? I was expecting a .fast5 file as a result, but I just got some .npy files.

danwarrior avatar May 11 '22 01:05 danwarrior

@danwarrior see https://github.com/nanoporetech/bonito/issues/4#issuecomment-562876731

iiSeymour avatar May 11 '22 12:05 iiSeymour
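
For anyone landing here with the same question: .npy is NumPy's array file format, so the download is a set of saved arrays rather than raw .fast5 reads. A minimal sketch for checking what each file holds (dtype and shape) without loading the arrays is below. It reads only the .npy header using the standard library. The directory path in the usage note is a placeholder, and the helper names npy_header and summarize are made up for illustration; they are not part of bonito.

```python
import ast
import struct
from pathlib import Path


def npy_header(path):
    """Return the header dict (descr, fortran_order, shape) of a .npy file."""
    with open(path, "rb") as fh:
        # Every .npy file starts with this 6-byte magic string.
        if fh.read(6) != b"\x93NUMPY":
            raise ValueError(f"{path} is not a .npy file")
        major, _minor = fh.read(2)
        # Format 1.x stores the header length in 2 bytes, later versions in 4.
        if major == 1:
            (header_len,) = struct.unpack("<H", fh.read(2))
        else:
            (header_len,) = struct.unpack("<I", fh.read(4))
        header = fh.read(header_len).decode("latin1")
    # The header is a Python dict literal padded with spaces and a newline.
    return ast.literal_eval(header.strip())


def summarize(directory):
    """Map each .npy file name in a directory to its header dict."""
    return {p.name: npy_header(p) for p in sorted(Path(directory).glob("*.npy"))}
```

Usage would look like summarize("path/to/training/data"), with the path replaced by wherever the download was written on your machine. If NumPy is installed, numpy.load(path, mmap_mode="r") gives access to the array contents themselves without reading the whole file into memory.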

@Howard-Liang no, the available training set does not fully represent that used for the pre-trained models.

iiSeymour avatar May 11 '22 12:05 iiSeymour