InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Open michoug opened this issue 2 years ago • 10 comments

Hi, I got this error from DRAM-v.py

Here is the command I used:

DRAM-v.py annotate -i ENSEMBLE_vRhyme/for-dramv/final-viral-combined-for-dramv_mod.fa -v ENSEMBLE_vRhyme/for-dramv/viral-affi-contigs-for-dramv_mod.tab -o annotation --threads 28 --verbose

Here is the log:

5:04:00.646081: Getting hits from VOGDB
9:58:05.351548: Merging ORF annotations
Traceback (most recent call last):
  File "/home/gmichoud/miniconda3/envs/DRAM/bin/DRAM-v.py", line 153, in <module>
    args.func(**args_dict)
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_vgfs.py", line 473, in annotate_vgfs
    annotations = annotate_fastas(contig_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 887, in annotate_orfs
    annotations = pd.concat(annotation_list, axis=1, sort=False)
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 359, in concat
    return op.get_result()
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 588, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/home/gmichoud/miniconda3/envs/DRAM/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3721, in get_indexer
    raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
srun: error: f020: task 0: Exited with exit code 1

Any ideas? Greg

michoug avatar Mar 23 '22 07:03 michoug

Can you share the input files?

rmFlynn avatar Mar 24 '22 13:03 rmFlynn

Hi, I saw that there were a few duplicate sequences (by ID). I'm checking whether that is the issue and will get back to you.

michoug avatar Mar 24 '22 14:03 michoug

Thanks! If that is the issue, let me know how long it ran. It would be nice if we checked for that at the beginning, so people don't have to wait an hour to find out there are dupes.

rmFlynn avatar Mar 24 '22 16:03 rmFlynn

Yes, that was apparently the issue. The error takes a long time to appear because it is raised at the Merging ORF annotations step, after all the annotation searches have finished, which in my case took more than 9 hours.
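Since the cause here was duplicate headers, a quick pre-flight check on the input FASTA can catch the problem before the long annotation run. The following is a minimal sketch, not part of DRAM: the function name `find_duplicate_fasta_ids` is hypothetical, the example IDs are made up, and it assumes the sequence ID is the first whitespace-delimited token after `>`, which may differ from how DRAM parses headers.

```python
from collections import Counter


def find_duplicate_fasta_ids(lines):
    """Return {sequence_id: count} for IDs that appear in more than one header.

    Assumption: the ID is the text after '>' up to the first whitespace.
    """
    ids = Counter(
        line[1:].split()[0]
        for line in lines
        # Only header lines that actually carry an ID.
        if line.startswith(">") and line[1:].strip()
    )
    return {seq_id: n for seq_id, n in ids.items() if n > 1}


# Hypothetical in-memory example; for a real file, pass an open file handle.
example = [">bin_1-cat_4 desc", "ACGT", ">bin_2-cat_2", "GGCC", ">bin_1-cat_4", "TTAA"]
print(find_duplicate_fasta_ids(example))  # {'bin_1-cat_4': 2}
```

For a file on disk, something like `with open(path) as fh: find_duplicate_fasta_ids(fh)` should work, since the function only iterates over lines.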

michoug avatar Mar 27 '22 09:03 michoug

@michoug @rmFlynn I came across exactly the same error. I did not find any duplicate headers in the input FASTA file prepared by virsorter2.

Hi, I saw that there were a few duplicate sequences (by ID). I'm checking whether that is the issue

When you say duplicate sequences (by ID), do you mean the headers or the actual sequences?

ShailNair avatar Jul 20 '22 06:07 ShailNair

Hi, it was the headers in my case

michoug avatar Jul 20 '22 07:07 michoug

Duplicate sequences should not cause this error. In fact, they may not be detected at all, which could be a problem. @ShailNair, if you can share your input files, we may be able to get to the bottom of this quickly. I suspect that after the VirSorter headers are parsed, there is a duplicate header.

rmFlynn avatar Jul 21 '22 00:07 rmFlynn

@michoug Thanks. I will check for that once more. @rmFlynn Hi, here are the input files prepared by virsorter2

I used the following command to prepare these files:

virsorter run --seqname-suffix-off \
  --viral-gene-enrich-off \
  --prep-for-dramv \
  -i 0.3.depreplication/0.2dereplication/checkv/merged.final.bins.fasta \
  -w 0.6.VMAGS_function/0.1.virsorter2 \
  --include-groups dsDNAphage,ssDNA \
  --min-length 5000 \
  --min-score 0.5 \
  -j 50 all

ShailNair avatar Jul 21 '22 08:07 ShailNair

It looks like you are not using virsorter2, so no extra parsing steps are needed. The only problem I have found so far is that bin_26418-cat_4 appears twice.

rmFlynn avatar Jul 21 '22 21:07 rmFlynn

@rmFlynn Sorry for the trouble. I was actually looking at the virsorter input fasta file for duplicate headers instead of the output fasta. By renaming the duplicate header I could run DRAM-v successfully.
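For anyone who needs to do the same renaming on a larger file, here is a rough sketch of one way to make repeated IDs unique. It is not how DRAM or VirSorter handle this: the function name `make_fasta_ids_unique` and the `_dupN` suffix are hypothetical choices, the ID is assumed to be the first whitespace-delimited token after `>`, and the sketch does not check whether a generated name collides with an existing ID. It also does not touch the tab file passed to DRAM-v with `-v`, which presumably needs matching IDs.

```python
def make_fasta_ids_unique(lines):
    """Yield FASTA lines, appending _dup1, _dup2, ... to repeated sequence IDs.

    Assumption: the ID is the text after '>' up to the first whitespace.
    Trailing newlines are stripped from every yielded line.
    """
    seen = {}  # sequence ID -> number of times it has been seen so far
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") and line[1:].strip():
            parts = line[1:].split(None, 1)  # [id] or [id, description]
            seq_id = parts[0]
            count = seen.get(seq_id, 0)
            seen[seq_id] = count + 1
            if count:  # second or later occurrence: rename it
                parts[0] = f"{seq_id}_dup{count}"
            line = ">" + " ".join(parts)
        yield line


# Hypothetical example with made-up IDs.
records = [">bin_1-cat_4 desc\n", "ACGT\n", ">bin_1-cat_4\n", "TTAA\n"]
print(list(make_fasta_ids_unique(records)))
# ['>bin_1-cat_4 desc', 'ACGT', '>bin_1-cat_4_dup1', 'TTAA']
```

Renaming only hides the duplication, so it is probably worth checking first whether the repeated records are the same sequence that should simply be removed.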

It looks like you are not using virsorter2, so no extra parsing steps are needed.

That's strange. I used VirSorter 2.2.3 or at least that's the version it shows on my shell.

ShailNair avatar Jul 22 '22 05:07 ShailNair