Documentation requested for all output files

Open lskatz opened this issue 4 years ago • 3 comments

Not all output files are described when I use --keepfiles. Could they be documented, even if not in the main documentation? The additional files below are the ones I had after a skesa run. I am adding onto your table.

| Filename | Description |
| --- | --- |
| contigs.fa | The final assembly you should use |
| shovill.log | Full log file for bug reporting |
| shovill.corrections | List of post-assembly corrections |
| contigs.gfa | Assembly graph (spades) |
| contigs.fastg | Assembly graph (megahit) |
| contigs.LastGraph | Assembly graph (velvet) |
| skesa.fasta | Raw assembly (skesa) |
| spades.fasta | Raw assembled contigs (spades) |
| megahit.fasta | Raw assembly (megahit) |
| velvet.fasta | Raw assembly (velvet) |
| flash.extendedFrags.fastq.gz | |
| flash.hist | |
| flash.histogram | |
| flash.notCombined_1.fastq.gz | |
| flash.notCombined_2.fastq.gz | |
| R1.cor.fq.gz | Corrected R1 reads from Lighter |
| R1.fq.gz | Original R1 |
| R1.sub.fq.gz | Subsampled R1 using --depth |
| R2.cor.fq.gz | Corrected R2 reads from Lighter |
| R2.fq.gz | Original R2 |
| R2.sub.fq.gz | Subsampled R2 using --depth |
| shovill.bam | Alignment of corrected reads from Trimmomatic against the vanilla assembly (skesa, spades, or megahit). Reads have been sorted with Samtools and filtered with samclip (i.e., nuanced alignments have been removed). |
| shovill.bam.bai | Index file for shovill.bam |
| skesa.fasta.amb | |
| skesa.fasta.ann | |
| skesa.fasta.bwt | |
| skesa.fasta.fai | |
| skesa.fasta.pac | |
| skesa.fasta.sa | |
| skesa.fasta.uncorrected | |

lskatz • Jun 11 '20 14:06

And my question leading into this was whether the bam file is all reads mapped against the assembly or only those trimmed/cleaned/flash'd.

lskatz • Jun 11 '20 14:06

@lskatz can you look at the perl code and tell me? :)

The original (or trimmomaticked originals if --trim) are used for alignment/correction.

I do NOT use the FLASH ones because one of the potential errors I want to correct is mis-FLASH-ed reads.

tseemann • Jun 11 '20 22:06

I took some language from

  • https://wiki.gacrc.uga.edu/wiki/FLASH-Sapelo
  • http://seqanswers.com/forums/showpost.php?p=90992&postcount=2

| Filename | Description |
| --- | --- |
| contigs.fa | The final assembly you should use |
| shovill.log | Full log file for bug reporting |
| shovill.corrections | List of post-assembly corrections |
| contigs.gfa | Assembly graph (spades) |
| contigs.fastg | Assembly graph (megahit) |
| contigs.LastGraph | Assembly graph (velvet) |
| skesa.fasta | Raw assembly (skesa) |
| spades.fasta | Raw assembled contigs (spades) |
| megahit.fasta | Raw assembly (megahit) |
| velvet.fasta | Raw assembly (velvet) |
| flash.extendedFrags.fastq.gz | Single-end reads from Flash, i.e., merged reads |
| flash.hist | Numeric histogram of merged read lengths |
| flash.histogram | Visual histogram of merged read lengths |
| flash.notCombined_1.fastq.gz | R1 reads not combined in Flash |
| flash.notCombined_2.fastq.gz | R2 reads not combined in Flash |
| R1.cor.fq.gz | Corrected R1 reads from Lighter |
| R1.fq.gz | Original R1 |
| R1.sub.fq.gz | Subsampled R1 using --depth |
| R2.cor.fq.gz | Corrected R2 reads from Lighter |
| R2.fq.gz | Original R2 |
| R2.sub.fq.gz | Subsampled R2 using --depth |
| shovill.bam | Alignment of corrected reads from Trimmomatic against the vanilla assembly (skesa, spades, or megahit). Reads have been sorted with Samtools and filtered with samclip (i.e., nuanced alignments have been removed). |
| shovill.bam.bai | Index file for shovill.bam |
| skesa.fasta.amb | (bwa index) Text file recording occurrences of N (or other non-ATGC characters) in the reference fasta |
| skesa.fasta.ann | (bwa index) Text file recording reference sequences, names, lengths, etc. |
| skesa.fasta.bwt | (bwa index) Binary, the Burrows-Wheeler transformed sequence |
| skesa.fasta.fai | Samtools index file |
| skesa.fasta.pac | (bwa index) Binary, packed sequence (four bases encoded in one byte) |
| skesa.fasta.sa | (bwa index) Binary, suffix array index |
| skesa.fasta.uncorrected | Assembly before pilon correction |
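
In case it helps anyone check their own --keepfiles folder against this table, here is a minimal Python sketch. It is not part of shovill: the names `DESCRIPTIONS` and `describe_outdir` are made up for illustration, and the descriptions are abridged from the table above. Anything it does not recognise is reported as undocumented, which should make it easy to spot files still missing from the table.

```python
from pathlib import Path

# Descriptions abridged from the table in this thread (hypothetical helper,
# not shipped with shovill). Only exact filenames from a skesa run are listed.
DESCRIPTIONS = {
    "contigs.fa": "The final assembly you should use",
    "shovill.log": "Full log file for bug reporting",
    "shovill.corrections": "List of post-assembly corrections",
    "skesa.fasta": "Raw assembly (skesa)",
    "flash.extendedFrags.fastq.gz": "Merged (single-end) reads from Flash",
    "flash.hist": "Numeric histogram of merged read lengths",
    "flash.histogram": "Visual histogram of merged read lengths",
    "flash.notCombined_1.fastq.gz": "R1 reads not combined in Flash",
    "flash.notCombined_2.fastq.gz": "R2 reads not combined in Flash",
    "R1.cor.fq.gz": "Corrected R1 reads from Lighter",
    "R1.fq.gz": "Original R1",
    "R1.sub.fq.gz": "Subsampled R1 using --depth",
    "R2.cor.fq.gz": "Corrected R2 reads from Lighter",
    "R2.fq.gz": "Original R2",
    "R2.sub.fq.gz": "Subsampled R2 using --depth",
    "shovill.bam": "Reads aligned to the raw assembly, sorted and samclip-filtered",
    "shovill.bam.bai": "Index file for shovill.bam",
    "skesa.fasta.amb": "bwa index: positions of N / non-ATGC characters",
    "skesa.fasta.ann": "bwa index: reference names and lengths",
    "skesa.fasta.bwt": "bwa index: Burrows-Wheeler transformed sequence",
    "skesa.fasta.fai": "Samtools index file",
    "skesa.fasta.pac": "bwa index: packed sequence",
    "skesa.fasta.sa": "bwa index: suffix array",
    "skesa.fasta.uncorrected": "Assembly before pilon correction",
}

UNDOCUMENTED = "(not documented)"


def describe_outdir(outdir):
    """Return {filename: description} for every regular file in outdir."""
    result = {}
    for path in sorted(Path(outdir).iterdir()):
        if path.is_file():
            result[path.name] = DESCRIPTIONS.get(path.name, UNDOCUMENTED)
    return result


if __name__ == "__main__":
    import sys

    # Default to the current directory when no output folder is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, desc in describe_outdir(target).items():
        print(f"{name}\t{desc}")
```

Run it with the shovill output folder as the only argument; it prints one tab-separated line per file.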

lskatz • Jun 13 '20 00:06