Spectre icon indicating copy to clipboard operation
Spectre copied to clipboard

Error with chromosome M

Open gsneha26 opened this issue 9 months ago • 4 comments

No file is generated for chromosome M. I am not sure what the problem is. Here is the command and the file details. The reference sequence and the coverage file have the same chromosome names.

$ spectre CNVCaller --coverage /data/coverage_dir/chrM/chrM.regions.bed.gz --sample-id Fast_301_spectre_chrM --output-dir /data/cnv_output/chrM --reference /data/GRCh38_chrM.fa --metadata /data/GRCh38_chrM.mdr --only-chr M --blacklist /data/grch38_blacklist_spectre.bed --min-cnv-len 20000
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre>  Spectre input: CNVCaller --coverage /data/coverage_dir/chrM/chrM.regions.bed.gz --sample-id Fast_301_spectre_chrM --output-dir /data/cnv_output/chrM --reference /data/GRCh38_chrM.fa --metadata /data/GRCh38_chrM.mdr --only-chr M --blacklist /data/grch38_blacklist_spectre.bed --min-cnv-len 20000
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre>  Spectre version: 0.2.1-patch-july15
spectre::2025-03-26 17:07:39,598::INFO::spectre.spectre>  Starting spectre
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectre>  Evaluate if a new .mdr file needs to be created
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectre>  Using existing metadata file /data/GRCh38_chrM.mdr
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectreCNV>  Spectre calculating for: /data/coverage_dir/chrM/chrM.regions.bed.gz
spectre::2025-03-26 17:07:39,599::INFO::spectre.spectreCNV>  Data normalization and outlier removal (right tail)
spectre::2025-03-26 17:07:39,600::ERROR::spectre.analysis.analysis>  No coverage data found for any chromosome in the reference sequence!
spectre::2025-03-26 17:07:39,600::ERROR::spectre.analysis.analysis>  Please make sure that the the reference sequence and the coverage file have matching chromosome names.
$ head /data/GRCh38_chrM.fa
>M
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGG
GTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTC
CTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTA
ATTAATTAATGCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATC
ATAACAAAAAATTTCCACCAAACCCCCCCTCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCA
AACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCAC
TTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCAATA
CAACCCCCGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAACCAACCAAACCCC
AAAGACACCCCCCACAGTTTATGTAGCTTACCTCCTCAAAGCAATACACTGAAAATGTTTAGACGGGCTC

grch38_blacklist_spectre.bed does not have any M entries.

from /data/GRCh38_chrM.mdr

#####	Statistic report
#####	Basepair statistic
BPDEF	A	T	C	G	GC%	Total bp
BPSTA	5124	4094	5181	2169	44.35994930291508	16569
#####	N-Sequence positions in given reference file
NSDEF	From	To

from coverage_dir/chrM/chrM.regions.bed.gz

M   0   1000    378.95                                                                                                                         
M   1000    2000    442.03
M   2000    3000    505.66
M   3000    4000    507.60
M   4000    5000    512.05
M   5000    6000    564.77
M   6000    7000    585.15
M   7000    8000    634.55
M   8000    9000    624.86
M   9000    10000   621.81
M   10000   11000   623.77
M   11000   12000   592.51
M   12000   13000   601.71
M   13000   14000   593.86
M   14000   15000   591.92
M   15000   16000   517.16
M   16000   16569   907.65

gsneha26 avatar Mar 26 '25 17:03 gsneha26

Hi @gsneha26

Thank you for your very detailed issue description. I will take a look and get back to you ASAP!

Best, Philippe

philippesanio avatar Mar 28 '25 16:03 philippesanio

Hi @gsneha26

Looking at the data and Spectre's code, chromosome M produces an error because Spectre is restricted to a minimum chromosome length of 1 MB. Thus, small chromosomes like M (16569bp according to your data) are eliminated. This is because we usually see a duplication in M, and compared to chromosome 1 (diploid), we don't know the ploidy state or the number of copies of M.

I will add a flag to Spectre that overwrites the 1MB minimum chromosome length threshold, so you can manually lower the threshold if you need to analyse smaller chromosomes.

Best, Philippe

philippesanio avatar Mar 31 '25 17:03 philippesanio

That would be great! Thank you @philippesanio. Looking forward to the commit.

Quick question - is there any particular thing we need to take care of, while dealing with the smaller chromosomes?

Sneha

gsneha26 avatar Mar 31 '25 19:03 gsneha26

Hi @gsneha26

I just added the flag --min-chr-len that should overwrite the default value of the minimum chromosome length. You can just add it to your Spectre command, e.g. --min-chr-len 100000. Just clone/pull the GitHub repository, build and install the tool locally with pip in your environment.

Quick question - is there any particular thing we need to take care of, while dealing with the smaller chromosomes?

Spectre was designed to detect large CNVs (~>100kb). Since we are using just coverage data do derive the CNVs, noise in the data plays a crucial role. Spectre uses at least 10 times the bin sizes (10kb) to compensate for noise. If you have a good signal-to-noise ratio in the coverage data, you can try to push the min-cnv-len below 100kb. However, the smaller the min-cnv-len, the higher the chances for false positives becomes.

Spectre can also use SNF data from Sniffles to search for breakpoint support from sniffles. If a breakpoint was detected by Sniffles the variant will receive SVSUPPORT=TRUE flag in the VCF INFO section. However, this feature is still experimental in this version of Spectre.

We are currently working on finalizing a new version, which should push down the min-cnv-len substantially, so definitely keep an eye out for that!

Best, Philippe

philippesanio avatar Mar 31 '25 20:03 philippesanio