
ValueError: Duplicated genomic coordinates in sample set

Open AlsoATraveler opened this issue 4 years ago • 3 comments

Hi all, I get the error below when running cnvkit.py fix. How can I solve it? Thanks.

My command: cnvkit.py fix tumor.targetcoverage.cnn tumor.antitargetcoverage.cnn Reference.cnn -o tumor.cnr

Processing target: samplename
Traceback (most recent call last):
  File "xxx/cnvkit.py", line 9, in <module>
    args.func(args)
  File "xxx/commands.py", line 610, in _cmd_fix
    target_table = fix.do_fix(tgt_raw, anti_raw, read_cna(args.reference),
  File "xxx/fix.py", line 15, in do_fix
    cnarr, ref_matched = load_adjust_coverages(target_raw, reference,
  File "xxx/fix.py", line 69, in load_adjust_coverages
    ref_matched = match_ref_to_sample(ref_cnarr, cnarr)
  File "xxx/fix.py", line 138, in match_ref_to_sample
    raise ValueError("Duplicated genomic coordinates in " + name +
ValueError: Duplicated genomic coordinates in sample set:
('chr1', 24359903, 24360023)
('chr1', 24360027, 24360149)
('chr1', 39762447, 39762623)

AlsoATraveler, Jan 31 '21 12:01

I got the same problem.

AlgoLab-XJTU, Feb 23 '21 08:02

I had the same problem.

I realised that my target files had duplicated lines. I did: sort target.bed | uniq > new-target.bed and it solved the problem.

mquinodo, Mar 24 '21 11:03
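To check whether a target BED is affected before re-running the pipeline, something like the following sketch may help. It assumes the duplicate check is keyed on (chromosome, start, end), which is what the tuples in the error message suggest, so it also reports rows that share coordinates but differ in the name column (those survive sort | uniq). The function name duplicated_regions and the GENE_A label are made up for illustration.

```python
from collections import Counter


def duplicated_regions(bed_lines):
    """Return the (chrom, start, end) keys that occur more than once."""
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        counts[(chrom, int(start), int(end))] += 1
    return [key for key, n in counts.items() if n > 1]


# Small in-memory example; GENE_A is a hypothetical name.
example = [
    "chrX\t152719709\t152719829\tTREX2",
    "chrX\t152719709\t152719829\tHAUS7",
    "chr1\t24359903\t24360023\tGENE_A",
]
print(duplicated_regions(example))  # [('chrX', 152719709, 152719829)]
```

For a real file, passing an open file handle (for example open("target.bed")) in place of the list should work the same way, since the function only iterates over lines.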

In my case, this was because the target BED contained a separate record for the same region under each name that might be a synonym of that feature. See below:

$ grep 152719709 target.bed
chrX       152719709       152719829       TREX2
chrX       152719709       152719829       HAUS7

This issue was solved with awk '!seen[$1$2$3]++' target.bed > new-target.bed. Only the first record for each identical region is kept. I think it's possible that these are different genes that occupy the same region, which would make this an imperfect solution, since the other gene names are dropped.

alextrouerntrend, Nov 10 '22 16:11
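If dropping all but the first gene name is a concern, one alternative is to collapse rows that share coordinates into a single record and keep every name. The sketch below is only an illustration under assumptions: merge_duplicate_regions is a hypothetical helper, not part of CNVkit, and joining names with a comma assumes that downstream steps accept a comma-separated name column.

```python
def merge_duplicate_regions(bed_lines, sep=","):
    """Collapse rows with identical (chrom, start, end) into one row,
    joining their distinct names with `sep`. Input order is preserved."""
    merged = {}  # dicts keep insertion order in Python 3.7+
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1], fields[2])
        name = fields[3] if len(fields) > 3 else "-"
        names = merged.setdefault(key, [])
        if name not in names:
            names.append(name)
    return ["\t".join(key + (sep.join(names),)) for key, names in merged.items()]


rows = [
    "chrX\t152719709\t152719829\tTREX2",
    "chrX\t152719709\t152719829\tHAUS7",
]
for row in merge_duplicate_regions(rows):
    print(row)  # chrX	152719709	152719829	TREX2,HAUS7
```

Any columns after the fourth are discarded here, so this only suits simple four-column target files.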