shasta: Is there a way I can know which raw reads go to a specific contig?


WenyuLiang opened this issue • 3 comments

Hi! Is there a way I can know which raw reads go to a specific contig?

Jul 29 '22, WenyuLiang

For haploid assembly, you can do that using the following command line option:

--Assembly.writeReadsByAssembledSegment

I have not tested this option in some time, so if you run into problems, please post here and I will look into it.

For diploid assembly, this functionality is not available.

Jul 29 '22, paoloczi

A bit more information on that option: if you turn it on, the assembly directory will contain a CSV file named ReadsBySegment.csv. The top of the file looks like this:

The meaning of the columns is as follows:

AssembledSegmentId identifies an assembled segment (same identifier used in other assembly output such as Assembly.fasta).
EdgeCount is the length of that assembled segment (number of edges) in the marker graph.
OrientedReadCount is the number of oriented reads that were used to assemble the segment. An oriented read is a read in either its original orientation or reverse complemented.
OrientedReadId is the Shasta internal id of an oriented read that was used to assemble the segment. It uses the format ReadId-Strand, where Strand is 0 (original orientation) or 1 (reverse complemented). For example, 66-1 means read 66, reverse complemented. To convert the Shasta internal ReadId to the read name in the input FASTA/FASTQ files, you can use the first two columns of ReadSummary.csv.
VertexCount and EdgeCount are the numbers of marker graph vertices and edges, respectively, that the given oriented read appears on, out of the vertices and edges that make up the assembled segment. This second EdgeCount is therefore per oriented read, unlike the segment-level EdgeCount described above.
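Here is a minimal Python sketch of the lookup described above, going from an assembled segment to the names of the raw reads used to build it. It rests on assumptions that the thread does not confirm. ReadsBySegment.csv is assumed to have a header row with columns in the order listed above and at most one oriented read per row. A row with a blank AssembledSegmentId is treated as continuing the previous segment. ReadSummary.csv is assumed to start with a header row. The helper names (`parse_oriented_read_id`, `load_read_names`, `read_names_by_segment`) are hypothetical and are not part of Shasta.

```python
import csv
from collections import defaultdict


def parse_oriented_read_id(text):
    """Split an OrientedReadId such as '66-1' into (read_id, strand).

    Strand 0 is the original orientation; 1 is reverse complemented.
    """
    read_id, strand = text.strip().split("-")
    read_id, strand = int(read_id), int(strand)
    if strand not in (0, 1):
        raise ValueError(f"unexpected strand in {text!r}")
    return read_id, strand


def load_read_names(lines):
    """Map Shasta ReadId -> read name from the first two columns of ReadSummary.csv.

    `lines` is any iterable of CSV text lines (e.g. an open file).
    Assumption: the first line is a header row, so it is skipped.
    """
    rows = csv.reader(lines)
    next(rows, None)  # skip the header row
    names = {}
    for row in rows:
        if len(row) >= 2 and row[0].strip():
            names[int(row[0])] = row[1].strip()
    return names


def read_names_by_segment(lines, read_names):
    """Collect the names of the reads used to assemble each segment.

    Assumptions: AssembledSegmentId is column 0 and OrientedReadId is
    column 3, matching the column order described in the thread. Columns
    are read by position because the header contains EdgeCount twice.
    A row with a blank AssembledSegmentId is taken to belong to the
    previous segment.
    """
    rows = csv.reader(lines)
    next(rows, None)  # skip the header row
    result = defaultdict(set)
    segment_id = None
    for row in rows:
        if row and row[0].strip():
            segment_id = row[0].strip()
        if segment_id is None or len(row) < 4 or not row[3].strip():
            continue  # this row carries no oriented read
        read_id, _strand = parse_oriented_read_id(row[3])
        # Fall back to the numeric id if the read is missing from ReadSummary.csv.
        result[segment_id].add(read_names.get(read_id, str(read_id)))
    return dict(result)
```

For real files, open both with `open(path, newline="")` and pass the ReadsBySegment.csv file together with `load_read_names(...)` applied to the ReadSummary.csv file. The result is a dict keyed by segment id (kept as a string). Each value is the set of read names for that segment. The strand is dropped because both orientations come from the same raw read. The helpers accept line iterables instead of paths, which makes them easy to try on a few lines copied from the top of each file.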

Aug 02 '22, paoloczi

Thank you so much!!!

Aug 07 '22, WenyuLiang

I am closing this due to lack of additional discussion. If other questions emerge, feel free to open another issue.

Aug 26 '22, paoloczi
