quickmerge
How to identify merged contigs in output
I am trying to use quickmerge to merge contigs from a long-read assembly into chromosome-length scaffolds, using the hg38 reference genome. Here is an example of the command I am running:
`merge_wrapper.py GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC.hapdup_dual_1.fasta hg38.fa -hco 5 -c 1.5 -l 500000 -v -t 24`
The long-read assembly is available [here](https://s3.amazonaws.com/1000g-ont/ALIGNMENT_AND_ASSEMBLY_DATA/FIRST_100/NAPU_PIPELINE/HG38/GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC/GM18519-ONT-hg38-R9-LSK110-guppy-sup-5mC.hapdup_dual_1.fasta), and hg38.fa was downloaded directly from UCSC.
Looking at the results, I cannot tell which contigs in the output were produced by merging, so I cannot compare them with the input contigs and the hg38 chromosomes to check that the output is correct. How can I identify merged contigs in the output FASTA, and how can I tell which editing operations were performed to produce them? My goal is to end up with the same number of contigs as hg38, apart from unmapped contigs, which appear to be written to the output unchanged. (Can you confirm that this is the case?)
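As a partial check on the last point, one can compare sequences rather than names, since headers may change between input and output. Below is a minimal sketch of a hypothetical helper (`classify` and `report` are my own names, not part of quickmerge). It assumes the quickmerge output is a plain FASTA file and that an unmodified contig would appear with identical sequence, possibly renamed or reverse-complemented. Output records whose sequence exactly matches an input contig are reported as unchanged; everything else was altered somehow (merged, trimmed, or otherwise edited). It cannot tell you which editing operations were applied.

```python
import hashlib
import sys
from typing import Dict, Iterator, List, Tuple

# Complement table for reverse-complementing; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip()
                chunks = []
            elif line:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def canonical_digest(seq: str) -> str:
    """Strand- and case-independent digest: hash the smaller of seq and its reverse complement."""
    seq = seq.upper()
    rc = seq.translate(_COMPLEMENT)[::-1]
    return hashlib.sha1(min(seq, rc).encode()).hexdigest()


def classify(input_fasta: str, output_fasta: str):
    """Split output records into (unchanged, changed) relative to the input contigs."""
    index: Dict[str, List[str]] = {}
    for name, seq in read_fasta(input_fasta):
        index.setdefault(canonical_digest(seq), []).append(name)
    unchanged, changed = [], []
    for name, seq in read_fasta(output_fasta):
        matches = index.get(canonical_digest(seq))
        if matches:
            unchanged.append((name, matches))  # identical to one or more input contigs
        else:
            changed.append((name, len(seq)))  # no exact input match: merged or edited
    return unchanged, changed


def report(input_fasta: str, output_fasta: str) -> None:
    unchanged, changed = classify(input_fasta, output_fasta)
    print(f"{len(unchanged)} output records identical to an input contig")
    for name, length in changed:
        print(f"modified\t{name}\t{length}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        report(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} INPUT.fasta OUTPUT.fasta", file=sys.stderr)
```

Here INPUT would be the hapdup assembly passed as the first argument to `merge_wrapper.py`, and OUTPUT the merged FASTA that quickmerge wrote. Hashing whole contigs keeps memory to a few digests per record, at the cost of only detecting exact matches.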
Thank you in advance for your help!