Error with chromosome M
No file is generated for chromosome M. I am not sure what the problem is. Here is the command and the file details. The reference sequence and the coverage file have the same chromosome names.
$ spectre CNVCaller --coverage /data/coverage_dir/chrM/chrM.regions.bed.gz --sample-id Fast_301_spectre_chrM --output-dir /data/cnv_output/chrM --reference /data/GRCh38_chrM.fa --metadata /data/GRCh38_chrM.mdr --only-chr M --blacklist /data/grch38_blacklist_spectre.bed --min-cnv-len 20000
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre> Spectre input: CNVCaller --coverage /data/coverage_dir/chrM/chrM.regions.bed.gz --sample-id Fast_301_spectre_chrM --output-dir /data/cnv_output/chrM --reference /data/GRCh38_chrM.fa --metadata /data/GRCh38_chrM.mdr --only-chr M --blacklist /data/grch38_blacklist_spectre.bed --min-cnv-len 20000
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre> Spectre version: 0.2.1-patch-july15
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre> Starting spectre
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectre> Evaluate if a new .mdr file needs to be created
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectre> Using existing metadata file /data/GRCh38_chrM.mdr
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectreCNV> Spectre calculating for: /data/coverage_dir/chrM/chrM.regions.bed.gz
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectreCNV> Data normalization and outlier removal (right tail)
spectre::2025-03-26 17:07:39,600::ERROR::spectre.analysis.analysis> No coverage data found for any chromosome in the reference sequence!
spectre::2025-03-26 17:07:39,600::ERROR::spectre.analysis.analysis> Please make sure that the the reference sequence and the coverage file have matching chromosome names.
$ head /data/GRCh38_chrM.fa
>M
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGG
GTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTC
CTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTA
ATTAATTAATGCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATC
ATAACAAAAAATTTCCACCAAACCCCCCCTCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCA
AACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCAC
TTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCAATA
CAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAACCAACCAAACCCC
AAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTC
grch38_blacklist_spectre.bed does not have any M entries.
from /data/GRCh38_chrM.mdr
##### Statistic report
##### Basepair statistic
BPDEF A T C G GC% Total bp
BPSTA 5124 4094 5181 2169 44.35994930291508 16569
##### N-Sequence positions in given reference file
NSDEF From To
from coverage_dir/chrM/chrM.regions.bed.gz
M 0 1000 378.95
M 1000 2000 442.03
M 2000 3000 505.66
M 3000 4000 507.60
M 4000 5000 512.05
M 5000 6000 564.77
M 6000 7000 585.15
M 7000 8000 634.55
M 8000 9000 624.86
M 9000 10000 621.81
M 10000 11000 623.77
M 11000 12000 592.51
M 12000 13000 601.71
M 13000 14000 593.86
M 14000 15000 591.92
M 15000 16000 517.16
M 16000 16569 907.65
Hi @gsneha26
Thank you for your very detailed issue description. I will take a look and get back to you ASAP!
Best, Philippe
Hi @gsneha26
Looking at the data and Spectre's code, chromosome M produces an error because Spectre is restricted to a minimum chromosome length of 1 MB. Thus, small chromosomes like M (16569bp according to your data) are eliminated. This is because we usually see a duplication in M, and compared to chromosome 1 (diploid), we don't know the ploidy state or the number of copies of M.
I will add a flag to Spectre that overwrites the 1MB minimum chromosome length threshold, so you can manually lower the threshold if you need to analyse smaller chromosomes.
Best, Philippe
That would be great! Thank you @philippesanio. Looking forward to the commit.
Quick question - is there any particular thing we need to take care of, while dealing with the smaller chromosomes?
Sneha
Hi @gsneha26
I just added the flag --min-chr-len that should overwrite the default value of the minimum chromosome length. You can just add it to your Spectre command, e.g. --min-chr-len 100000. Just clone/pull the GitHub repository, build and install the tool locally with pip in your environment.
Quick question - is there any particular thing we need to take care of, while dealing with the smaller chromosomes?
Spectre was designed to detect large CNVs (~>100kb). Since we are using just coverage data do derive the CNVs, noise in the data plays a crucial role. Spectre uses at least 10 times the bin sizes (10kb) to compensate for noise. If you have a good signal-to-noise ratio in the coverage data, you can try to push the min-cnv-len below 100kb. However, the smaller the min-cnv-len, the higher the chances for false positives becomes.
Spectre can also use SNF data from Sniffles to search for breakpoint support from sniffles. If a breakpoint was detected by Sniffles the variant will receive SVSUPPORT=TRUE flag in the VCF INFO section. However, this feature is still experimental in this version of Spectre.
We are currently working on finalizing a new version, which should push down the min-cnv-len substantially, so definitely keep an eye out for that!
Best, Philippe