[BUG] KeyError during anvi-refine

Open alexanderjaffe opened this issue 4 months ago • 9 comments

Hi team,

I am trying to run anvi-refine (v7) on a set of bin-level contig/profile splits and keep encountering an HMM KeyError when running the anvi-refine command, e.g.:

[ajaffe@sh03-ln03 login /scratch/users/ajaffe/ocdata/anvio/refine]$ singularity exec /home/groups/dekas/software/anvio/anvio_7.sif anvi-refine -p /scratch/users/ajaffe/ocdata/anvio/profiles/split/A4500m50m/A4500m50m_157_fa/PROFILE.db -c /scratch/users/ajaffe/ocdata/anvio//profiles/split/A4500m50m/A4500m50m_157_fa/CONTIGS.db -C DEFAULT --show-all-layers --server-only -P 8080

Contigs DB ...................................: Initialized: /scratch/users/ajaffe/ocdata/anvio//profiles/split/A4500m50m/A4500m50m_157_fa/CONTIGS.db (v. 20)
Traceback (most recent call last):
  File "/opt/conda/envs/anvioenv/bin/anvi-refine", line 124, in <module>
    d = interactive.Interactive(args)
  File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/interactive.py", line 211, in __init__
    self.completeness = Completeness(self.contigs_db_path)
  File "/opt/conda/envs/anvioenv/lib/python3.6/site-packages/anvio/completeness.py", line 156, in __init__
    hmm_hit = self.hmm_hits_table[entry['hmm_hit_entry_id']]
KeyError: 2245

I have tried regenerating the split once or twice (I read about race issues somewhere else on here), but that does not seem to resolve the issue. Any advice on how to regenerate the split profiles correctly, or how to just ignore HMM information during the refine call?

Thanks so much,

Alex

alexanderjaffe avatar Sep 05 '25 23:09 alexanderjaffe

Hey @alexanderjaffe, this looks like a bug from 3 years ago, similar to #1936 :(

Do you think you could switch to anvi'o v8 or anvi'o dev? I'm sure it will be resolved then (and of course you will be able to migrate all your databases without losing any data if these are historical samples).

Alternatively re-running anvi-run-hmms prior to splitting your bins could address this if you want to stay on v7.

Another alternative is that you could send me your split contigs-db and profile-db for A4500m50m_157_fa, and I can test it on anvi'o dev version to make sure first that everything works with the newer version of the codebase.

Best wishes, Meren

meren avatar Sep 06 '25 07:09 meren

Thank you so much, Meren! Appreciate your offer to help here.

Trying out the migrate route on the split profile.db - seems like I may need to reimport scaf2bins/collections once migrated - is this expected behavior?

alexanderjaffe avatar Sep 08 '25 21:09 alexanderjaffe

Hey @alexanderjaffe,

Migration should not require re-importing collections (since they are migrated to the latest version, too). But it has been so long since v7 that I'm having a hard time saying it with full confidence. Is the problem that you're migrating a split profile-db and losing your collections in it?

meren avatar Sep 09 '25 07:09 meren

Image

yes, after migration, the refine command suggests there are no collections.

alexanderjaffe avatar Sep 09 '25 16:09 alexanderjaffe

I see. These are split profiles that lost the default collections during migration. That's very annoying, and I'm sorry about that :(

You can use anvi-script-add-default-collection to add them back to your split profiles. I hope the original bug is now resolved at least?

meren avatar Sep 09 '25 19:09 meren

Hey Meren,

Looks like that was an issue with my migration: if I migrate both the contigs.db and profile.db, the collections are maintained. My bad.

However, re-running the refine command still yields an HMM error. I also did try re-running HMMs on the pre-split db and no luck. Would you still be willing to take a look? It seems I cannot attach them here so I will go ahead and email you.

I am hoping there is a way to revive these without recomputing everything from scratch!

Thanks so much.

alexanderjaffe avatar Sep 11 '25 21:09 alexanderjaffe

No worries, @alexanderjaffe! we will figure this out :)

meren avatar Sep 12 '25 06:09 meren

Problem

OK. I've been looking at the files you kindly sent via email. I first migrated both of them,

anvi-migrate *db --migrate-quickly

And then I could reproduce the error you had when I tried to refine the bin:

anvi-refine -p PROFILE.db -c CONTIGS.db -C DEFAULT -b ALL_SPLITS

Traceback (most recent call last):
  File "/Users/meren/miniconda3/envs/anvio-dev/bin/anvi-refine", line 8, in <module>
    sys.exit(main())
  File "/Users/meren/github/anvio/anvio/cli/refine.py", line 75, in main
    d = interactive.Interactive(args)
  File "/Users/meren/github/anvio/anvio/interactive.py", line 213, in __init__
    self.completeness = Completeness(self.contigs_db_path)
  File "/Users/meren/github/anvio/anvio/completeness.py", line 157, in __init__
    hmm_hit = self.hmm_hits_table[entry['hmm_hit_entry_id']]
KeyError: 54609

This indicates that there are HMM hits stored in the contigs-db (for instance, the entry 54609 in this case), but the table that keeps track of genes is not aware of the existence of the corresponding gene. I have no idea how this could have happened, as there is no tool in anvi'o that can remove genes from gene calls tables in contigs-db files. But to my surprise, there indeed was some sort of inconsistency:

Image

The screenshot above shows that the hmm_hits_in_splits table includes certain HMM hits, including 54609, but the genes_in_contigs table is not aware of these genes, and the hmm_hits table is not aware of the HMM hits that hmm_hits_in_splits is talking about. Again, this is extremely odd. Running anvi-run-hmms does not fix the hmm_hits_in_splits table: since there are no genes in the genes_in_contigs table, the HMMs do not yield anything, and the splits table is never updated.
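A minimal sketch of how one might look for this kind of inconsistency without opening a database browser. It uses Python's built-in sqlite3 module and a toy in-memory database. The table names come from the discussion above; the column names (`entry_id` in hmm_hits, `hmm_hit_entry_id` in hmm_hits_in_splits) are assumptions based on the traceback, and `find_orphaned_hmm_hits` and `build_toy_db` are hypothetical helpers, not part of anvi'o.

```python
import sqlite3


def find_orphaned_hmm_hits(conn):
    """Return hmm_hit_entry_id values that appear in hmm_hits_in_splits
    but have no matching entry_id in hmm_hits (column names are assumed)."""
    query = """
        SELECT DISTINCT s.hmm_hit_entry_id
        FROM hmm_hits_in_splits AS s
        LEFT JOIN hmm_hits AS h ON h.entry_id = s.hmm_hit_entry_id
        WHERE h.entry_id IS NULL
        ORDER BY s.hmm_hit_entry_id
    """
    return [row[0] for row in conn.execute(query)]


def build_toy_db():
    """Build an in-memory database with a simplified, assumed schema that
    mimics the inconsistency: one split refers to a hit that does not exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hmm_hits (entry_id INTEGER, gene_callers_id INTEGER)")
    conn.execute("CREATE TABLE hmm_hits_in_splits (hmm_hit_entry_id INTEGER, split TEXT)")
    conn.execute("INSERT INTO hmm_hits VALUES (1, 10)")
    conn.executemany(
        "INSERT INTO hmm_hits_in_splits VALUES (?, ?)",
        [(1, "split_00001"), (54609, "split_00002")],
    )
    conn.commit()
    return conn


toy = build_toy_db()
print(find_orphaned_hmm_hits(toy))  # the toy data has exactly one orphan: [54609]
```

To try the same query on real data, one would pass `sqlite3.connect("CONTIGS.db")` instead of the toy connection, ideally on a copy of the file. An empty list would mean every entry in hmm_hits_in_splits has a matching HMM hit.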

I am not sure if this was a migration snafu. If you still have your un-migrated, original databases, I would like to investigate this further and fix it if I can.

This also could be a buggy anvi-split problem that got fixed along the way, which means we may never be able to capture this. But this instance is the very first time I'm running into this :/

Solution (of sorts)

I realized the only way to address this with minimal loss of time is to re-generate the contigs-db file. The following set of commands will do it in any given directory where split contigs-db and profile-db files are stored. I ran them in the directory you sent me, which took a total of 65 seconds on my laptop (21 seconds without the not-so-essential anvi-run-scg-taxonomy step):

# migrate everything just to be sure.
anvi-migrate *db --migrate-quickly

# recover the original contigs
anvi-export-contigs -c CONTIGS.db -o contigs.fa

# learn the project name and contigs-db hash
PROJECT_NAME=$(sqlite3 CONTIGS.db "SELECT value FROM self WHERE key = 'project_name';")
HASH=$(sqlite3 CONTIGS.db "SELECT value FROM self WHERE key = 'contigs_db_hash';")

# regenerate the contigs-db from the contigs.fa
rm -rf CONTIGS.db && anvi-gen-contigs-database -f contigs.fa -o CONTIGS.db -T 6 --project-name "$PROJECT_NAME"

# update the contigs-db hash to maintain its compatibility
# with the existing PROFILE.db
sqlite3 CONTIGS.db "UPDATE self SET value = '$HASH' WHERE key = 'contigs_db_hash';"

# re-run HMMs and SCG taxonomy while at it
anvi-run-hmms -c CONTIGS.db -T 6
anvi-run-scg-taxonomy -c CONTIGS.db -T 6

# clean up the FASTA file
rm -rf contigs.fa
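The hash-preserving step above (read `contigs_db_hash` from the old contigs-db, write it into the regenerated one) can also be sketched in Python with the built-in sqlite3 module. This is only an illustration on toy in-memory databases: the `self` table with `key` and `value` columns is taken from the sqlite3 commands above, while `get_self_value` and `set_self_value` are hypothetical helper names. Using query parameters instead of pasting the value into the SQL string avoids quoting problems.

```python
import sqlite3


def get_self_value(conn, key):
    """Read one value from the key/value `self` table; None if the key is absent."""
    row = conn.execute("SELECT value FROM self WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def set_self_value(conn, key, value):
    """Overwrite the value of an existing key and return the number of rows changed."""
    cur = conn.execute("UPDATE self SET value = ? WHERE key = ?", (value, key))
    conn.commit()
    return cur.rowcount


def make_toy_self_table(db_hash):
    """Toy stand-in for a contigs-db that holds only a `self` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE self (key TEXT, value TEXT)")
    conn.execute("INSERT INTO self VALUES ('contigs_db_hash', ?)", (db_hash,))
    conn.commit()
    return conn


old_db = make_toy_self_table("hash_old")  # stands in for the original CONTIGS.db
new_db = make_toy_self_table("hash_new")  # stands in for the regenerated CONTIGS.db

# Save the old hash first, then write it into the regenerated database.
saved_hash = get_self_value(old_db, "contigs_db_hash")
changed = set_self_value(new_db, "contigs_db_hash", saved_hash)
```

In this sketch, a `changed` value other than 1 would suggest that the key was missing from the new database, which is worth checking before trusting the result.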

Now anvi-refine works for this split. Running the same command as before,

anvi-refine -p PROFILE.db -c CONTIGS.db -C DEFAULT -b ALL_SPLITS

Gives me this display:

Image

With a working bins panel:

Image Image

I am sorry for the inconvenience, @alexanderjaffe. Please let me know if this is a useful solution.

meren avatar Sep 12 '25 07:09 meren

This seems to work - and even lets me stay on v7 - thank you again, Meren!

alexanderjaffe avatar Sep 25 '25 20:09 alexanderjaffe