shasta
Is there a way to know which raw reads go to a specific contig?
Hi! Is there a way I can know which raw reads go to a specific contig?
For haploid assembly, you can do that using the following command line option:
--Assembly.writeReadsByAssembledSegment
I have not tested this option in some time, so if you bump into problems please post here and I will look into it.
For diploid assembly, this functionality is not available.
A bit more information on that option. If you turn it on, the assembly directory will contain a csv file named ReadsBySegment.csv. The top of the file looks like this:
The meaning of the columns is as follows:
- AssembledSegmentId identifies an assembled segment (same identifier used in other assembly output such as Assembly.fasta).
- EdgeCount is the length of that assembled segment (number of edges) in the marker graph.
- OrientedReadCount is the number of oriented reads that were used to assemble the segment. An oriented read is a read in either the original orientation, or with reverse complement.
- OrientedReadId is the Shasta internal id of a read that was used to assemble the segment. It uses the format ReadId-Strand where Strand can be 0 (original orientation) or 1 (reverse complemented). So for example 66-1 means read 66, reverse complemented. To convert the Shasta internal ReadId to the read name in the input fasta/fastq files, you can use the first two columns of ReadSummary.csv.
- VertexCount and EdgeCount are the number of marker graph vertices and edges, respectively, that the given oriented reads appear on, out of the vertices and edges that make up the assembled segment.
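In case it helps, here is a rough Python sketch of how the two files could be combined to list the read names behind one assembled segment. It is not part of Shasta and rests on a few assumptions: that ReadsBySegment.csv has a header row followed by one row per oriented read, with the columns in the order described above (so OrientedReadId is the fourth column), and that the first two columns of ReadSummary.csv are the Shasta ReadId and the read name. The function names are made up for this example, and columns are read by position because the header lists EdgeCount twice.

```python
import csv
from collections import defaultdict


def parse_oriented_read_id(oriented_read_id):
    """Split 'ReadId-Strand' (for example '66-1') into (read_id, strand)."""
    read_id, strand = oriented_read_id.strip().rsplit("-", 1)
    return int(read_id), int(strand)


def load_read_names(read_summary_path):
    """Map Shasta ReadId -> read name.

    Assumes a header row, with the id in column 0 and the name in column 1.
    """
    names = {}
    with open(read_summary_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2:
                names[int(row[0])] = row[1]
    return names


def load_reads_by_segment(reads_by_segment_path, oriented_read_column=3):
    """Map AssembledSegmentId -> list of (read_id, strand).

    Assumes one row per oriented read, with the segment id in column 0
    and OrientedReadId in the column given by oriented_read_column.
    """
    segments = defaultdict(list)
    with open(reads_by_segment_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) <= oriented_read_column:
                continue  # ignore short or blank rows
            segments[row[0]].append(
                parse_oriented_read_id(row[oriented_read_column]))
    return segments


def read_names_for_segment(segment_id, segments, names):
    """Return (read name, strand) pairs for one assembled segment."""
    return [(names[read_id], strand)
            for read_id, strand in segments.get(str(segment_id), [])]
```

Typical use would be to call load_read_names("ReadSummary.csv") and load_reads_by_segment("ReadsBySegment.csv") once, then read_names_for_segment(...) for each contig of interest. If the real file layout differs from the assumptions above, adjust oriented_read_column accordingly.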
Thank you so much!!!
I am closing this due to lack of additional discussion. If other questions emerge, feel free to open another issue.