illumina data to polish

Open asdcid opened this issue 6 years ago • 10 comments

Hi, the manual says the latest version of Racon supports Illumina reads, but which short-read format does Racon support? Should I put R1/R2 into one file, or use a specific parameter? Thanks, Raymond

asdcid avatar May 23 '18 04:05 asdcid

Hello Raymond, racon expects one file with any kind of reads (3rd gen or 2nd gen paired/single ends). Therefore, you should join your paired ends into one file but be careful that reads from a pair do not have the same name up to the first whitespace. If you need a helper script, please take a look at https://github.com/isovic/racon/issues/68#issuecomment-386223150.

Best regards, Robert

P.s. If you are polishing a large genome, please use the latest commit.

rvaser avatar May 23 '18 15:05 rvaser
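For readers who want to see what "join your paired ends into one file" can look like, here is a minimal sketch. It is not the helper script linked above; the function names (`open_text`, `tagged_records`, `merge_pairs`) and the `/1` and `/2` suffixes are assumptions chosen for illustration. It assumes plain four-line FASTQ records with no blank lines, and it makes the names of the two mates differ before the first whitespace.

```python
import gzip
from itertools import islice


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def tagged_records(path, suffix):
    """Yield 4-line FASTQ records with `suffix` appended to each read name."""
    with open_text(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record:
                break
            if len(record) != 4 or not record[0].startswith("@"):
                raise ValueError(f"malformed FASTQ record in {path}")
            # Split the header at the first whitespace: name, then optional comment.
            parts = record[0].split(None, 1)
            name = parts[0]
            if not name.endswith(suffix):
                name += suffix
            record[0] = f"{name} {parts[1]}" if len(parts) > 1 else name
            yield record


def merge_pairs(r1_path, r2_path, out_path):
    """Write all R1 reads, then all R2 reads, to one file; return the record count."""
    written = 0
    with open(out_path, "w") as out:
        for path, suffix in ((r1_path, "/1"), (r2_path, "/2")):
            for record in tagged_records(path, suffix):
                out.write("\n".join(record) + "\n")
                written += 1
    return written
```

Calling `merge_pairs("reads_R1.fastq", "reads_R2.fastq", "reads_all.fastq")` (hypothetical file names) would produce one file in which a pair originally named `@read1` in both inputs appears as `@read1/1` and `@read1/2`.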

Hello guys, which mapper would you recommend for mapping Illumina reads to a raw miniasm assembly? Using bowtie2 with default parameters (I know, not the best idea, but just a first test) I am getting roughly 20% mapping efficiency and only 30% horizontal coverage, so not great.

Any recommendations would be greatly appreciated.

Kind regards,

jdmontenegro avatar Jun 25 '18 02:06 jdmontenegro

Hello, you can try minimap2 with the -x sr option. By raw miniasm assembly, you mean you used PacBio/ONT reads, right? If so, I would advise polishing with those reads first (if you have decent coverage).

Best regards, Robert

rvaser avatar Jun 26 '18 08:06 rvaser
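As a rough sketch of the suggestion above, the snippet below builds a minimap2 command with the `-x sr` preset and a matching racon command. The file names, the thread count, and the helper names (`build_commands`, `run_polish`) are assumptions for illustration. The sketch also assumes SAM output from minimap2 (`-a`) and the racon argument order `<sequences> <overlaps> <target sequences>`; please check both against the help output of your installed versions.

```python
import shlex
import subprocess


def build_commands(reads, assembly, overlaps="reads_vs_assembly.sam", threads=4):
    """Return (minimap2_cmd, racon_cmd) as argument lists; nothing is executed."""
    # Map short reads to the assembly; -a requests SAM output on stdout.
    minimap2_cmd = ["minimap2", "-a", "-x", "sr", "-t", str(threads), assembly, reads]
    # Assumed order: reads, overlaps, target sequences. Polished FASTA goes to stdout.
    racon_cmd = ["racon", "-t", str(threads), reads, overlaps, assembly]
    return minimap2_cmd, racon_cmd


def run_polish(reads, assembly, polished, overlaps="reads_vs_assembly.sam", threads=4):
    """Run both steps, redirecting each tool's stdout to a file."""
    minimap2_cmd, racon_cmd = build_commands(reads, assembly, overlaps, threads)
    with open(overlaps, "w") as sam:
        subprocess.run(minimap2_cmd, stdout=sam, check=True)
    with open(polished, "w") as fasta:
        subprocess.run(racon_cmd, stdout=fasta, check=True)


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for cmd in build_commands("reads_all.fastq", "assembly.fasta"):
        print(shlex.join(cmd))
```

Building the argument lists separately from running them makes it easy to inspect the exact commands before starting a long job.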

Thank you rvaser, that is correct: I have a raw assembly obtained from the Flye assembler using 30X coverage of PacBio reads. The raw assembly should have error rates similar to those of raw PacBio reads. I have done some initial polishing using the PacBio reads and now I think I can map the Illumina reads. I read on Heng Li's minimap2 page that he does not recommend mapping short reads to unpolished PacBio assemblies, but I guess this initially polished assembly should be OK?

Cheers,

jdmontenegro avatar Jun 26 '18 11:06 jdmontenegro

The initially polished assembly should be alright.

Best regards, Robert

rvaser avatar Jun 26 '18 12:06 rvaser

Hi guys, I ran the latest version of Racon with Illumina paired-end reads (my contigs had already been polished with Pilon). Surprisingly, 8 breaks were fixed after Racon. I don't understand: does racon also stitch contigs? Isn't it a consensus tool? Thanks! Xinwen

xinwenzhg avatar Jul 28 '18 00:07 xinwenzhg

Hi Xinwen, racon does not stitch contigs together, it polishes each of them separately. By breaks you mean what exactly?

Best regards, Robert

rvaser avatar Jul 28 '18 08:07 rvaser

Hi Robert, After racon, the number of fragments in my fasta file changed from 29 to 21, so I thought racon may fix some breaks by stitching fragments together. That's why I ask.

Then I checked the fragment lengths in the fasta files before and after racon, and found that eight of my fragments (1000-1500 bp) were simply removed by racon instead of being stitched to other fragments. Some of my other long contigs got 0-100 bp shorter. Is this the expected behavior of racon? Thank you!

Best regards, Xinwen

xinwenzhg avatar Jul 28 '18 19:07 xinwenzhg

Hi Xinwen, racon by default does not output unpolished sequences. You can change that with the following option:

-u, --include-unpolished
    output unpolished target sequences

You can determine which of the outputted sequences are unpolished by checking their headers, i.e. by checking tags RC (number of reads used for polishing) and XC (percentage of windows corrected).

It is quite normal that the length of polished sequences is different (shorter/longer) when compared to original length.

Best regards, Robert

rvaser avatar Jul 28 '18 19:07 rvaser
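To illustrate checking the headers, here is a small sketch that pulls the RC and XC tags out of FASTA header lines. It assumes headers of the form `>name LN:i:<length> RC:i:<reads> XC:f:<value>`; that exact layout, the function names, and the rule "XC equal to 0 means unpolished" are assumptions for illustration, so compare them with the headers in your own output first.

```python
import re

# Matches SAM-style tags such as "RC:i:12" (integer) or "XC:f:0.95" (float).
TAG_RE = re.compile(r"\b(LN|RC|XC):([if]):(\S+)")


def parse_racon_header(header):
    """Return (sequence_name, tags) for one FASTA header line."""
    name = header.lstrip(">").split()[0]
    tags = {}
    for key, kind, value in TAG_RE.findall(header):
        tags[key] = int(value) if kind == "i" else float(value)
    return name, tags


def unpolished_names(fasta_lines):
    """Yield names of sequences whose XC tag is missing or equal to 0."""
    for line in fasta_lines:
        if line.startswith(">"):
            name, tags = parse_racon_header(line.strip())
            if tags.get("XC", 0.0) == 0.0:
                yield name
```

With a run that used `-u`, passing an open file handle of the polished FASTA to `unpolished_names` would list the sequences that came through without any corrected windows, under the assumptions above.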

Hi Robert, Thank you so much. I checked the headers, they're very helpful. Best regards, Xinwen

xinwenzhg avatar Jul 30 '18 06:07 xinwenzhg