
Trimming CTC data

Open mbhall88 opened this issue 3 years ago • 3 comments

Hi,

I am having some issues relating to demultiplexing. I have trained a custom model that performs extremely well on the DNA of my species of interest but falls over when it comes to demultiplexing. I am losing more than half of my data to the dreaded "none" bin.

In https://github.com/nanoporetech/bonito/issues/26#issuecomment-613401666 it was suggested that trimming the signal could improve this, which makes a lot of sense. However, the example there assumes the trimming happens while chunkifying an HDF5 file from taiyaki.

I have the chunk data already (from basecalling with --save-ctc) and would like to trim this to achieve the same result as trimming the signal at the starts and the ends by some offset. (I basically want to get rid of signal that relates to the barcode.)

What I am struggling with is how best to do this as I don't know what each of the chunkify files is.

For example, the reference_lengths.npy file has shape (35691,), references.npy has shape (35691, 482), and chunks.npy has shape (35691, 4000). How does each of these files relate to the others?
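A minimal sketch of how the three arrays appear to line up, going by their shapes: row i of each array seems to describe the same chunk, with chunks.npy holding the signal, references.npy holding an integer-encoded target sequence padded with zeros, and reference_lengths.npy giving the number of valid labels in that row. The label encoding used here (0 = padding, 1 to 4 = A, C, G, T) and the synthetic arrays are assumptions for illustration, not something confirmed in this thread; with real data the arrays would come from np.load on the three files.

```python
import numpy as np

# Synthetic stand-ins with the same layout as the files in the question
# (the real shapes there are (35691, 4000), (35691, 482) and (35691,)).
rng = np.random.default_rng(0)
n_chunks, chunk_len, max_ref_len = 5, 4000, 482
chunks = rng.standard_normal((n_chunks, chunk_len)).astype(np.float32)
reference_lengths = rng.integers(200, max_ref_len + 1, size=n_chunks)
references = np.zeros((n_chunks, max_ref_len), dtype=np.uint8)
for i, n in enumerate(reference_lengths):
    references[i, :n] = rng.integers(1, 5, size=n)

ALPHABET = "NACGT"  # assumed encoding: 0 = padding, 1..4 = A, C, G, T


def decode_reference(i):
    """Return the target sequence for chunk i, without the zero padding."""
    n = int(reference_lengths[i])
    return "".join(ALPHABET[b] for b in references[i, :n])


seq = decode_reference(0)
# One row of signal, and the length and start of its target sequence.
print(chunks[0].shape, len(seq), seq[:20])
```

Read this way, the 482 columns of references.npy would simply be the length of the longest target in the set, and everything past reference_lengths[i] in row i would be padding.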

Let's say I want to trim 100 signal samples from the start and end of each read. How would I do this? (I am open to suggestions for offset sizes; this was an arbitrary number.)
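One possible approach, offered only as a rough sketch: the three files do not appear to store where each base falls in the signal, so cutting samples from a chunk has no exact counterpart in its reference. The hypothetical helper below (trim_ctc is not part of bonito) slices the same number of samples off both ends of every chunk and drops a proportional number of labels from both ends of each reference, assuming the bases are spread evenly across the chunk. That assumption is an approximation. Note also that the uniform 4000-sample width suggests the chunks are fixed-length windows, so the start of a chunk is not necessarily the start of a read.

```python
import numpy as np


def trim_ctc(chunks, references, reference_lengths, offset=100):
    """Trim `offset` samples from both ends of every chunk and drop a
    proportional number of labels from both ends of each reference.

    Assumes labels are spread evenly over the chunk, which is only an
    approximation: the arrays carry no signal-to-base alignment.
    """
    n_chunks, chunk_len = chunks.shape
    if 2 * offset >= chunk_len:
        raise ValueError("offset removes the whole chunk")
    new_chunks = chunks[:, offset:chunk_len - offset]

    new_refs = np.zeros_like(references)
    new_lens = np.zeros_like(reference_lengths)
    for i in range(n_chunks):
        n = int(reference_lengths[i])
        # Labels to drop per end, rounded up (an arbitrary choice).
        k = int(np.ceil(n * offset / chunk_len))
        kept = references[i, k:n - k] if n > 2 * k else references[i, :0]
        new_refs[i, :len(kept)] = kept
        new_lens[i] = len(kept)

    # Drop padding columns that are no longer needed.
    width = max(int(new_lens.max()), 1)
    return new_chunks, new_refs[:, :width], new_lens


# Small synthetic example with the same column widths as in the question.
chunks = np.zeros((2, 4000), dtype=np.float32)
references = np.zeros((2, 482), dtype=np.uint8)
reference_lengths = np.array([400, 482])
references[0, :400] = 1
references[1, :] = 2

c, r, l = trim_ctc(chunks, references, reference_lengths, offset=100)
print(c.shape, r.shape, l)
```

The trimmed arrays could then be written to a new directory with np.save under the same three file names. Whether this kind of trimming actually removes barcode signal depends on where the chunks sit within each read, which these files alone do not reveal.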

mbhall88, May 07 '22 08:05

Any development on this issue? Thanks in advance.

touala, Nov 29 '23 11:11

Sadly, no, @touala. I never got a response, and I ended up having to abandon my project because of it. I tried many different ways to trim the data but unfortunately couldn't fix the demultiplexing problem.

mbhall88, Nov 29 '23 22:11

Thanks for the response @mbhall88. I'm currently doing the demultiplexing with the ONT model and then using my custom model to redo the basecalling... Not great, but it seems OK. I'll revisit this soon, as I need to update my whole workflow. Hopefully it has improved since the last time I tried.

touala, Jan 08 '24 06:01