anvio
anvio copied to clipboard
Genomeview merge debugging
PLEASE DON'T MERGE THIS PULL REQUEST WITHOUT COMPLETE CONSENSUS
This is a PR started by @isaacfink21 to save the original genome-view branch, which is in a difficult state due to some ancient git 'force' push commands. This page shall as a medium for discussions related to this merge and relevant updates, involving everyone, especially @matthewlawrenceklein and @isaacfink21 :)
Hey @isaacfink21, thank you very much for starting this. I'm carrying the discussion from Slack to here so we can keep track of things more easily. Here are some points:
Browsing through https://github.com/merenlab/anvio/compare/master...genomeview-merge-debugging to compare differences between the current main branch and this new fork. it looks much better (since it doesn’t have many things being removed from the main repo). As I was going around I've made some changes and took some notes. Some of them very important, some of them just to skim through, but I hope you would take a look and respond to those that you think deserves a response.
Step 1
First, I run git merge master
and pushed it, and things still look good without showing some egregious deletions when the branch is compared to the main branch, which is exactly what we want.
Step 2
Although, I still see removal of some code like this one:
I’m not sure why it is the case as I can’t imagine anyone removing that variable. Here is why:
$ ~/github/anvio >>> git co master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
$ ~/github/anvio >>> cd anvio/data/interactive/
$ ~/github/anvio/anvio/data/interactive >>> grep COG_categories * -nr
js/constants.js:1:var COG_categories = {
js/constants.js:107: var cogs = d.split(', ').map((function (c){if (c in COG_categories) {return COG_categories[c];} else {return c;}}));
js/constants.js:121: var cogs = d.split(', ').map((function (c){if (c in COG_categories) {return COG_categories[c];} else {return c;}}));
js/charts.js:462: mcags = Object.keys(COG_categories);
js/charts.js:628: return COG_categories;
js/charts.js:1740: ["none"].concat(Object.keys(COG_categories)).forEach(function(category){
$ ~/github/anvio/anvio/data/interactive >>> git co genomeview-merge-debugging
Switched to branch 'genomeview-merge-debugging'
Your branch is up to date with 'origin/genomeview-merge-debugging'.
$ ~/github/anvio/anvio/data/interactive >>> grep COG_categories * -nr
js/constants.js:89: var cogs = d.split(', ').map((function (c){if (c in COG_categories) {return COG_categories[c];} else {return c;}}));
js/constants.js:103: var cogs = d.split(', ').map((function (c){if (c in COG_categories) {return COG_categories[c];} else {return c;}}));
js/genomeview.js:1597: return COG_categories[category];
$ ~/github/anvio/anvio/data/interactive >>>
As you can see, while the variable is in the main branch is used by some files, the variable is no longer defined in constants.js
in genomeview-merge-debugging
even though it is still used by the same files. The same goes for KEGG_categories
in the same constants.js
. The problem is, not knowing how many of these has occurred makes me very concerned. So someone needs to go through this output very carefully to identify such cases (another one is news.js
, for instance, I don't see why there should have been any changes to that file in this branch).
Step 3
The file run_all_tests.sh
has become run_component_tests_for_metagenomics.sh
:
$ ~/github/anvio/anvio/tests >>> diff -Nuar run_all_tests.sh run_component_tests_for_metagenomics.sh
--- run_all_tests.sh 2022-08-09 10:25:39.000000000 +0200
+++ run_component_tests_for_metagenomics.sh 2022-08-09 10:27:06.000000000 +0200
@@ -168,7 +168,7 @@
anvi-script-filter-hmm-hits-table -c $output_dir/CONTIGS.db \
--domain-hits-table $output_dir/hmm.domtable \
--hmm-source Bacteria_71 \
- --target-coverage 0.9 \
+ --model-coverage 0.9 \
--no-progress
INFO "Listing all available HMM sources in the contigs database"
So I removed it in https://github.com/merenlab/anvio/pull/1966/commits/bdf5db1947319cfcbcfc193af34222d152ba0d67. In that vein, run_genome_view.sh
in the tests directory should become run_component_tests_for_genome_view.sh
, and the program anvi-self-test
should know about it. I took care of this in https://github.com/merenlab/anvio/pull/1966/commits/8c32160e90336f892bf4ee4f27670d7e5e3f4823, and we can run this to find out what else needs to be fixed:
anvi-self-test --suite genome-view
Step 4
Running the test revealed this problem:
Traceback (most recent call last):
File "/Users/meren/github/anvio/bin/anvi-self-test", line 12, in <module>
import anvio
File "/Users/meren/github/anvio/anvio/__init__.py", line 3614, in <module>
__genome_view_db__version__ = set_version()
File "/Users/meren/github/anvio/anvio/__init__.py", line 3573, in set_version
t.genome_view_db_version
AttributeError: module 'anvio.tables' has no attribute 'genome_view_db_version'
Which is now fixed in https://github.com/merenlab/anvio/pull/1966/commits/fb4aabc2871af4d8f570eaa8da23753ecac338d7.
Step 5
Fixing that revealed another problem:
:: Generating an external genomes file for DBs ...
usage: anvi-script-gen-genomes-file [-h] [--input-dir DIR_PATH]
[--include-subdirs] [-c CONTIGS_DB]
[-p PROFILE_DB] [-C COLLECTION_NAME]
[-o FILE_PATH] [--version] [--debug]
[--force] [--fix-sad-tables] [--quiet]
[--no-progress] [--as-markdown]
[--display-db-calls] [--force-use-my-tree]
[--force-overwrite] [--tmp-dir TMP_DIR]
[--I-know-this-is-not-a-good-idea]
anvi-script-gen-genomes-file: error: unrecognized arguments: --name-prefix G0
I have no idea when there was a --name-prefix
argument for the program anvi-script-gen-genomes-file
, but I removed it via https://github.com/merenlab/anvio/pull/1966/commits/fa7548c43ddc3eeff8b90dfde5f7f8c6e8840b59.
Step 6
Re-running the test revealed this problem:
[09 Aug 22 11:07:30 Populating continuous data] GC-content for E_faecalis_6240 ETA: NoneTraceback (most recent call last):
File "/Users/meren/github/anvio/bin/anvi-display-genomes", line 76, in <module>
d = interactive.GenomeView(args)
File "/Users/meren/github/anvio/anvio/interactive.py", line 1881, in __init__
AggregateGenomes.__init__(self, self.args, run=self.run, progress=self.progress)
File "/Users/meren/github/anvio/anvio/genomedescriptions.py", line 911, in __init__
self.init()
File "/Users/meren/github/anvio/anvio/genomedescriptions.py", line 939, in init
self.populate_genome_continuous_data_layers()
File "/Users/meren/github/anvio/anvio/genomedescriptions.py", line 1083, in populate_genome_continuous_data_layers
self.populate_continuous_GC_content_data()
File "/Users/meren/github/anvio/anvio/genomedescriptions.py", line 1013, in populate_continuous_GC_content_data
self.continuous_data_layers['data'][genome_name]['GC_content'][contig_name] = utils.get_GC_content_for_sequence_as_an_array(contig_sequence)
AttributeError: module 'anvio.utils' has no attribute 'get_GC_content_for_sequence_as_an_array'
Indeed, there is no such function anywhere:
$ ~/github/anvio >>> grep --exclude-dir data get_GC_content_for_sequence_as_an_array * -nr
anvio/genomedescriptions.py:1013: self.continuous_data_layers['data'][genome_name]['GC_content'][contig_name] = utils.get_GC_content_for_sequence_as_an_array(contig_sequence)
$ :: anvi'o v7 dev :: ~/github/anvio >>>
There is a utils function to get the GC content of a single sequence, but that's now what we want. I remember writing this function, and it should be somewhere in the genomeview-backend. But where the hell that function go? I can't find it anywhere on GitHub or even Slack :( What else are we missing like this? Argh.
I have to go now. I hope this was useful and we can continue the conversation here.
The problem is, not knowing how many of these has occurred makes me very concerned. So someone needs to go through this output very carefully to identify such cases (another one is news.js, for instance, I don't see why there should have been any changes to that file in this branch).
I just made sure, the changes to news.js
is a step-back from what we have in the main repository! I.e., it works great in main and contains @matthewlawrenceklein's latest changes that made it better, and the version in genomeview-merge-debugging
reverts all those changes back :(
Thanks @meren! I didn't touch anything in the interactive
folder so there are likely still several file changes there that are just regressions. I will need to go through interactive
on its own more carefully to pick out intended changes. None of the regressions are caught when running git merge master
so I've been doing this manually case-by-case with git checkout master -- <file>
.
I think I had actually removed COG_categories
and KEGG_categories
from constants.js
on purpose a long time ago (I don't remember why but I think they were no longer needed in genome view / inspect page and I didn't realize they were used elsewhere)!
I will go through your comments again more carefully and then I think the next step is to sift through interactive
, but I will especially need help with code that integrates genome view with the main interface so I don't inadvertently discard intended changes.
Separately, I noticed there are several additions to authorship information in files unrelated to genome view (e.g. area-zoom.js
) - should these changes be merged sooner in a separate PR?
Separately, I noticed there are several additions to authorship information in files unrelated to genome view (e.g. area-zoom.js) - should these changes be merged sooner in a separate PR?
I don't think there is any rush for them.
Hey @isaacfink21, I discovered something weird about the inspect page.
If you inspect any contig with anvi-iteractive
, and click 'save state' on the new page, the page is frozen like this,
with the following error in the developer console:
Redrawing gene arrows ( 10:15:).
utils.js?rand=941c493f9b378a5fcfd9b096403f70da:44 Resetting arrow markers ( 10:15:).
utils.js?rand=941c493f9b378a5fcfd9b096403f70da:44 Drawing gene arrows ( 10:15:).
DevTools failed to load source map: Could not load content for chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/browser-polyfill.js.map: System error: net::ERR_FILE_NOT_FOUND
Which I have no idea where it is coming from, but when I switch to the main branch everything works.
@meren This is fixed for me now - adding the framework for batch coloring to the inspect page somehow added an invisible layer covering the save state modal…I have no idea why but I removed it and will need to figure out how to reintegrate it later in a new branch
As of now, I think all the changes I can see in this branch are correct. I am also unable to find the missing get_GC_content_for_sequence_as_an_array
function, not even on genomeview-backend
... only a function get_GC_content_for_sequence
Thank you very much, @isaacfink21.
I'm not sure what to do about get_GC_content_for_sequence_as_an_array
. Perhaps @matthewlawrenceklein happened to have an idea and/or input?
Finally found! Looks like my last commit to genomeview-backend
merging with master removed this function. This is unfortunate because I thought the first commit on this branch reverted the merge, but now I will need to go through the rest of the merge to make sure no other genome view additions were removed... 🥲
I also realized we still have js/genomeview.js
which is no longer used, but I'm hesistant to delete it before making sure everything is transferred over to the new files...so this is a reminder to myself to go through it
I'm happy that you found it somewhere there. There is also the news.js
. it seems what we have in the main branch is more up-to-date / advanced then what this branch has :)
@isaacfink21 thanks for taking the time to dig through all of this and get everything back to a workable state. I've been pretty hands-off while you've been restoring things (seems like a 'too many cooks in the kitchen' scenario), but let me know if I can jump in or facilitate this process in any way!
@meren The only additions I'm seeing innews.js
are authorship info - when I go through the compare files, everything on this branch changed from master seems to be an addition and not a regression, but let me know if you see / are aware of anything that hasn't been taken care of!
The main thing I'm worried about is genome view additions that were accidentally removed since this can't be seen from comparing with master - e.g. the get_GC_content_for_sequence_as_an_array
added to utils.py
- but after going through the merge I haven't found anything else that was removed