chunk_lengths is missing

Open DelphIONe opened this issue 3 years ago • 0 comments

I have ~33,000 mapped short reads (~200 bp) from a Nanopore direct RNA run, and I am trying to train a model to re-basecall them and improve the mapping (reduce the number of mismatches). The reference is 300 sequences of approximately 100 bp. I first followed the Taiyaki steps (the prepare_mapped_reads step is fine; I checked the structure of the output fast5 file). I then launched bonito convert with --chunsize 10000 and --seed 6 (minimap2 does not produce a good alignment, but bwa mem -k 6 -W 13 does), but I get only 3 .npy files instead of 4:

np.load('bonito_convert/references.npy').shape
(29386, 110)
np.load('bonito_convert/references.npy').dtype
dtype('int16')
np.load('bonito_convert/chunks.npy').dtype
dtype('float32')
np.load('bonito_convert/chunks.npy').shape
(29386, 10000)
np.load('bonito_convert/reference_lengths.npy').dtype
dtype('uint16')
np.load('bonito_convert/reference_lengths.npy').shape
(29386,)
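The same checks can be run in one pass. This is a minimal sketch, not part of bonito: the helper names `inventory` and `print_inventory` are hypothetical, and the list of four file names is taken from this post (whether bonito actually needs all four is an assumption).

```python
from pathlib import Path

import numpy as np

# File names as mentioned in this post; that all four are required is an assumption.
EXPECTED = (
    "chunks.npy",
    "chunk_lengths.npy",
    "references.npy",
    "reference_lengths.npy",
)


def inventory(directory, names=EXPECTED):
    """Return {name: (shape, dtype)} for files that exist, {name: None} otherwise."""
    report = {}
    for name in names:
        path = Path(directory) / name
        if not path.is_file():
            report[name] = None
            continue
        # mmap_mode="r" maps the file lazily, so large arrays are not read into memory.
        array = np.load(path, mmap_mode="r")
        report[name] = (array.shape, str(array.dtype))
    return report


def print_inventory(directory):
    """Print one line per expected file: its shape and dtype, or MISSING."""
    for name, info in inventory(directory).items():
        status = "MISSING" if info is None else f"shape={info[0]} dtype={info[1]}"
        print(f"{name:24s}{status}")
```

For example, `print_inventory('bonito_convert')` would list each of the four names with its shape and dtype, or mark it as MISSING.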

chunk_lengths.npy is missing. Is it required for the following steps? I ran bonito train anyway, but I am not sure my command is correct:

bonito train /home/xxx/bonito_convert/ /home/xxx/bonito_train/

bonito_train is my output directory and bonito_convert contains the .npy files (only 3). In the output config.toml I see:

config = "/home/xxx/bonito/bonito/models/configs/[email protected]"

and I work with RNA, so I suspect this gives bad results. Moreover, my training.csv is:

time,duration,epoch,train_loss,validation_loss,validation_mean,validation_median
2022-10-26 19:02:00.481032,545,1,inf,inf,0.0,0.0
2022-10-26 19:10:59.383600,526,2,inf,inf,0.0,0.0
2022-10-26 19:19:57.946125,525,3,inf,inf,0.0,0.0
2022-10-26 19:28:56.500202,525,4,inf,inf,0.0,0.0
2022-10-26 19:37:55.883142,526,5,inf,inf,0.0,0.0
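Every epoch in this log has an infinite training and validation loss, so training never produced a finite loss. A small sketch for flagging such epochs automatically is shown here; the function name `nonfinite_epochs` is hypothetical, and the column names are the ones in the training.csv header above.

```python
import csv
import io
import math


def nonfinite_epochs(csv_text, columns=("train_loss", "validation_loss")):
    """Return the epoch numbers whose loss columns contain inf or nan."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # float() parses "inf" and "nan"; math.isfinite() rejects both.
        if any(not math.isfinite(float(row[column])) for column in columns):
            bad.append(int(row["epoch"]))
    return bad


# The rows reported in this post.
log = """time,duration,epoch,train_loss,validation_loss,validation_mean,validation_median
2022-10-26 19:02:00.481032,545,1,inf,inf,0.0,0.0
2022-10-26 19:10:59.383600,526,2,inf,inf,0.0,0.0
2022-10-26 19:19:57.946125,525,3,inf,inf,0.0,0.0
2022-10-26 19:28:56.500202,525,4,inf,inf,0.0,0.0
2022-10-26 19:37:55.883142,526,5,inf,inf,0.0,0.0
"""

print(nonfinite_epochs(log))  # prints [1, 2, 3, 4, 5]
```

To check a file on disk, the same function can be called as `nonfinite_epochs(open('training.csv').read())`, assuming the file is in the current directory.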

If someone could help me, I would be really happy. Any tips?

Oct 27 '22 16:10 DelphIONe