Which model?

Open tnn111 opened this issue 1 year ago • 1 comments

I sequenced one sample using 2 R9.4.1 flow cells along with 2 R10.4 flow cells. Then I used all of the fastq files for an assembly using flye.

Is it possible to run medaka to error-correct the whole thing? If so, which model should I use? Attached are headers from the two types of fastq files:

@d7af3dcc-38f8-453e-8624-8f4f4a613314 runid=a6e19b866fc18cbf71e0125d1cefadc4576e48be sampleid=no_sample read=12671 ch=469 start_time=2022-09-05T02:31:28Z model_version_id=2021-05-17_dna_r9.4.1_minion_768_2f1c8637

@a0ec37d5-cf70-46c9-b259-c7a01c347e39 runid=307ec030150e0f1aad2d03701e893fe1faf0fe26 read=14 ch=2049 start_time=2022-09-09T18:56:26.495449+00:00 flow_cell_id=PAM35393 protocol_group_id=X0217 sample_id=Station32b parent_read_id=a0ec37d5-cf70-46c9-b259-c7a01c347e39 basecall_model_version_id=2021-09-03_dna_r10.4_minion_promethion_384_6b8e75c7
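For reference, the basecaller model recorded in headers like the two above can be read programmatically. The following is a minimal sketch, assuming the header is a read id followed by whitespace-separated `key=value` tags, as in these two examples; the helper names `header_fields` and `basecall_model` are made up for illustration and are not part of medaka or any ONT tool.

```python
def header_fields(header: str) -> dict:
    """Split a FASTQ header line into its read id and key=value tags."""
    tokens = header.strip().lstrip("@").split()
    fields = {"read_id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not a key=value tag
            fields[key] = value
    return fields


def basecall_model(header: str):
    """Return the basecaller model id, or None if the header has no such tag.

    Assumption: the tag is named either `basecall_model_version_id` or
    `model_version_id`, the two spellings seen in the headers in this thread.
    """
    fields = header_fields(header)
    return fields.get("basecall_model_version_id") or fields.get("model_version_id")


# The two headers quoted in this issue.
r9_header = (
    "@d7af3dcc-38f8-453e-8624-8f4f4a613314 "
    "runid=a6e19b866fc18cbf71e0125d1cefadc4576e48be sampleid=no_sample "
    "read=12671 ch=469 start_time=2022-09-05T02:31:28Z "
    "model_version_id=2021-05-17_dna_r9.4.1_minion_768_2f1c8637"
)
r10_header = (
    "@a0ec37d5-cf70-46c9-b259-c7a01c347e39 "
    "runid=307ec030150e0f1aad2d03701e893fe1faf0fe26 read=14 ch=2049 "
    "start_time=2022-09-09T18:56:26.495449+00:00 flow_cell_id=PAM35393 "
    "protocol_group_id=X0217 sample_id=Station32b "
    "parent_read_id=a0ec37d5-cf70-46c9-b259-c7a01c347e39 "
    "basecall_model_version_id=2021-09-03_dna_r10.4_minion_promethion_384_6b8e75c7"
)

for header in (r9_header, r10_header):
    print(basecall_model(header))
```

Running this over the first header of each fastq file is a quick way to confirm which pore type and basecaller model each batch came from before picking a medaka model.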

Thanks!

tnn111 avatar Sep 16 '22 23:09 tnn111

We used to experimentally support the use of multiple datatypes in medaka, but we no longer have models trained to do this.

What depth of sequencing do you have for the two flowcell types?

cjw85 avatar Sep 27 '22 12:09 cjw85

I have a similar question: I have two batches of sequences and I intend to co-assemble them. They used the same flow cell but different basecaller versions: one is 5.0.16+b9fcd7b using sup mode, the other is 5.0.11+2b6dbff using hac mode. Should I choose the model version with "sup" or "hac"? Should I choose g5015 or g507? There is a model version called "r104_e81_sup_g5015"; what does the e81 mean? If a model should not be used across different modes, can I assemble each batch individually first, then use medaka to correct the reads, then assemble them again? Thank you!

wn835166087 avatar Oct 30 '22 19:10 wn835166087