DRAM
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Hi, I got this error from DRAM-v.py.
Here is the command I used:
DRAM-v.py annotate -i ENSEMBLE_vRhyme/for-dramv/final-viral-combined-for-dramv_mod.fa -v ENSEMBLE_vRhyme/for-dramv/viral-affi-contigs-for-dramv_mod.tab -o annotation --threads 28 --verbose
Here is the log:
5:04:00.646081: Getting hits from VOGDB
9:58:05.351548: Merging ORF annotations
Traceback (most recent call last):
File "/home/gmichoud/miniconda3/envs/DRAM/bin/DRAM-v.py", line 153, in <module>
args.func(**args_dict)
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_vgfs.py", line 473, in annotate_vgfs
annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 887, in annotate_orfs
annotations = pd.concat(annotation_list, axis=1, sort=False)
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 359, in concat
return op.get_result()
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 588, in get_result
indexers[ax] = obj_labels.get_indexer(new_labels)
File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3721, in get_indexer
raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
srun: error: f020: task 0: Exited with exit code 1
Any ideas? Greg
Can you share the input files?
Hi, I saw that there were a few duplicate sequences (by id), and I'm checking whether that is the issue. I will get back to you about that.
Thanks! If that is the issue, let me know how long it ran. It would be nice if we just checked for that at the beginning, so people don't have to wait an hour to find out there are dupes.
Yes, it was apparently the issue.
It takes a long time for the error to appear because it is raised at the "Merging ORF annotations" step, after all the annotations have finished, which in my case took more than 9 hours.
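The up-front check suggested above could look something like the following. This is only a minimal sketch, not part of DRAM: the function name `duplicate_fasta_ids` and the toy records are hypothetical, and it assumes the record id is the first whitespace-delimited token after `>`.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return {record_id: count} for ids that occur more than once.

    Assumption: the id is the first whitespace-delimited token after '>'.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Hypothetical toy records, not taken from the files in this thread
example = [
    ">contig_1 first copy", "ATGC",
    ">contig_2", "GGCC",
    ">contig_1 second copy", "TTAA",
]
print(duplicate_fasta_ids(example))
```

For a real file, an open file handle can be passed directly, for example `with open(path) as handle: dupes = duplicate_fasta_ids(handle)`. Running a check like this before annotation takes seconds, compared with the hours it takes to reach the merge step.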
@michoug @rmFlynn I came across the exact same error, but I did not find any duplicate headers in the input fasta file prepared by virsorter2.
Hi, I saw that there were a few duplicate sequences (by id), and I'm checking whether that is the issue.
When you say duplicate sequences (by id), do you mean the headers or the actual sequences?
Hi, it was the headers in my case
Duplicate sequences should not cause this error. In fact, they may not be detected at all, which could be a problem. @ShailNair, if you can share your input files, we may be able to get to the bottom of this quickly. I suspect that after the VirSorter headers are parsed, there is a duplicate header.
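For context, the traceback ends in `pd.concat(annotation_list, axis=1)`, and pandas can raise this `InvalidIndexError` when it has to align frames on an index that contains repeated labels. The sketch below shows one way to find such labels before concatenating. It is an illustration only, not DRAM's code: the helper `duplicated_labels`, the column names, and the ORF ids are all hypothetical.

```python
import pandas as pd


def duplicated_labels(frames):
    """Map each frame's position to the index labels repeated within it."""
    report = {}
    for position, frame in enumerate(frames):
        # Index.duplicated() flags every occurrence after the first
        dupes = frame.index[frame.index.duplicated()].unique().tolist()
        if dupes:
            report[position] = dupes
    return report


# Hypothetical per-database hit tables indexed by ORF id
hits_a = pd.DataFrame(
    {"db_a_hit": ["x", "y", "z"]}, index=["orf_1", "orf_1", "orf_2"]
)
hits_b = pd.DataFrame({"db_b_hit": ["p", "q"]}, index=["orf_1", "orf_2"])

report = duplicated_labels([hits_a, hits_b])
print(report)
```

An empty report means every frame has a unique index, so a column-wise concat should not fail for this reason.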
@michoug Thanks. I will check for that once more. @rmFlynn Hi, here are the input files prepared by virsorter2
I used the following command to prepare these files:
virsorter run --seqname-suffix-off \
  --viral-gene-enrich-off \
  --prep-for-dramv \
  -i 0.3.depreplication/0.2dereplication/checkv/merged.final.bins.fasta \
  -w 0.6.VMAGS_function/0.1.virsorter2 \
  --include-groups dsDNAphage,ssDNA \
  --min-length 5000 \
  --min-score 0.5 \
  -j 50 all
It looks like you are not using virsorter2, so no extra parsing steps are needed. The only problem I have found thus far is bin_26418-cat_4 is repeated twice.
@rmFlynn Sorry for the trouble. I was actually looking at the virsorter input fasta file for duplicate headers instead of the output fasta. By renaming the duplicate header I could run DRAM-v successfully.
It looks like you are not using virsorter2, so no extra parsing steps are needed.
That's strange. I used VirSorter 2.2.3 or at least that's the version it shows on my shell.