
Using Telomere-to-Telomere as reference

Open Aarhus-kga opened this issue 1 year ago • 2 comments

For my analysis I want to use the Telomere-to-Telomere (T2T) human reference genome, for which there are two assemblies: the RefSeq assembly (GCF_009914755.1) and the GenBank assembly (GCA_009914755.4). I've aligned my data to each of these references separately. With the GenBank assembly Straglr runs just fine, but with the RefSeq assembly it finishes in 1 second and writes empty output files containing only headers. It does not report an error.

The only major differences between the two assemblies, as far as I can see, are that the RefSeq assembly does not include the mitochondrial genome (which shouldn't make any difference here) and that the two use different naming conventions for the chromosomes. For example:

| hg38 | GCF_009914755.1 (RefSeq) | GCA_009914755.4 (GenBank) |
| --- | --- | --- |
| chr1 | NC_060925.1 | CP068277.2 |
| chr2 | NC_060926.1 | CP068276.2 |
| chr3 | NC_060927.1 | CP068275.2 |
| ... | ... | ... |

My only wild guess (which I don't really believe to be the reason) is that Straglr does not like the underscores in RefSeq's naming convention. Other than that I'm at a loss as to what the problem could be.

Aarhus-kga, Sep 24 '24 12:09
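
A quick way to check the underscore guess above is to list which sequence names in a reference contain an underscore. The sketch below assumes a samtools-style `.fai` index, where the first tab-separated column is the sequence name. The function name `names_with_underscores` and the numeric columns in the example are made up for illustration, and whether Straglr really filters on underscores is only the hypothesis raised in this thread, not something taken from its source code.

```python
def names_with_underscores(fai_lines):
    """Return the sequence names that contain '_' from .fai index lines."""
    flagged = []
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # In a .fai index the sequence name is the first tab-separated column.
        name = line.split("\t", 1)[0]
        if "_" in name:
            flagged.append(name)
    return flagged


# Toy index lines: the names come from the table above, the numbers are placeholders.
example_fai = [
    "NC_060925.1\t1000\t13\t60\t61",
    "CP068277.2\t1000\t12\t60\t61",
]

print(names_with_underscores(example_fai))
```

With a real index, the lines could be read with `open("reference.fa.fai")` (the path is hypothetical). If every RefSeq name is flagged and no GenBank name is, that would be consistent with the behaviour described above.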

I just ran a test on a small region of the genome where I know a repeat is present. I removed the underscores from the RefSeq chromosome names (NC_060925.1 --> NC060925.1) in both the reference and the BAM file, and it worked!! Could the code possibly be updated to handle underscores in chromosome names? :)

Aarhus-kga, Sep 24 '24 13:09
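
For anyone wanting to reproduce the renaming test described above, here is a minimal sketch of the FASTA side of it: it removes underscores from the sequence name in each header line and leaves the description and the sequence lines untouched. The function name `strip_underscores_in_fasta_headers` and the example record are hypothetical, and this is not part of Straglr.

```python
def strip_underscores_in_fasta_headers(lines):
    """Yield FASTA lines with '_' removed from the sequence name only."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The sequence name is the text up to the first space;
            # anything after it is a free-text description and is kept as-is.
            name, sep, rest = header.partition(" ")
            yield ">" + name.replace("_", "") + sep + rest + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


# Toy record: the name comes from the thread, the rest is a placeholder.
example_fasta = [
    ">NC_060925.1 toy_description\n",
    "ACGTACGT\n",
]

renamed = list(strip_underscores_in_fasta_headers(example_fasta))
print("".join(renamed), end="")
```

The sequence names in the BAM header (`@SQ` lines) have to be changed in the same way, for example with `samtools reheader` or by re-aligning against the renamed reference, otherwise the names in the two files will no longer match.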

Please try running it with the option --include_partials; it should include chromosomes with underscores in their names.

readmanchiu, Sep 24 '24 18:09

I used --include_alt_chroms and it worked like a charm :) Thanks for your help, and sorry I didn't look thoroughly at the parameters before submitting an issue.

Aarhus-kga, Sep 27 '24 08:09