Please add DADA2-formatted reference databases to test/main

Open gregvonkuster opened this issue 6 years ago • 12 comments

The dada2 tools are currently installed on Galaxy test and will soon be installed on Galaxy main. Please add the dada2 reference datasets (https://benjjneb.github.io/dada2/training.html) so that the tools that require them are functional. I believe that the General FASTA release will be sufficient, but others may be requested. Here is the download link for the General FASTA release: https://doi.org/10.15156/BIO/786343.

gregvonkuster, Nov 27 '19 14:11

I guess Silva is quite popular. Sometimes users prefer RDP because it comes with copy number variation data (if I remember correctly), but it's older.

bernt-matthias, Nov 27 '19 15:11

There is also quite a bit of extra info in the data manager's help.

bernt-matthias, Nov 27 '19 15:11

@jennaj I have confirmed with the lab testing this pipeline that the General FASTA release https://doi.org/10.15156/BIO/786343 is what they need for reference datasets for their testing.

gregvonkuster, Dec 19 '19 13:12

Is this already in the data manager (aka dada manager)?

bernt-matthias, Dec 19 '19 13:12

I also just asked @martenson how to get these fixes https://github.com/galaxyproject/tools-iuc/pull/2705 applied to the tools on Galaxy test. I'm working with a lab doing some critical work with this pipeline. ;)

gregvonkuster, Dec 19 '19 13:12

I ran the data manager and it appeared to succeed, but I couldn't find the data on Test. It looks like all the DMs we've installed lately are going to be messed up, e.g.:

    <table comment_char="#" name="dada2_species">
        <columns>value, name, path</columns>
        <file path="/tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/dada2_filterandtrim/cc41546adf56/dada2_species.loc"/>
        <tool_shed_repository>
            <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
            <repository_name>dada2_filterandtrim</repository_name>
            <repository_owner>iuc</repository_owner>
            <installed_changeset_revision>cc41546adf56</installed_changeset_revision>
        </tool_shed_repository>
    </table>

This is discussed in #31. Unlike before, though, this is even more of a problem: we don't have the tool-data files in CVMFS to copy as described in step 3, because they were discarded after installation.
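A quick way to spot affected tables is to scan the data table config for `.loc` paths that point into a temporary directory. The following is a minimal sketch, not part of Galaxy: the `<tables>` root element, the `SAMPLE` string, and the helper name `tables_under_prefix` are assumptions made for illustration, and only the `<table>` entry itself is taken from the snippet above.

```python
import xml.etree.ElementTree as ET

# Assumption: the config wraps its <table> entries in a <tables> root element.
# The <table> entry is copied from the snippet quoted in this thread.
SAMPLE = """<tables>
    <table comment_char="#" name="dada2_species">
        <columns>value, name, path</columns>
        <file path="/tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/dada2_filterandtrim/cc41546adf56/dada2_species.loc"/>
    </table>
</tables>"""


def tables_under_prefix(xml_text, prefix="/tmp/"):
    """Return (table name, .loc path) pairs whose path starts with prefix."""
    root = ET.fromstring(xml_text)
    hits = []
    for table in root.iter("table"):
        for file_el in table.findall("file"):
            path = file_el.get("path", "")
            if path.startswith(prefix):
                hits.append((table.get("name"), path))
    return hits


if __name__ == "__main__":
    for name, path in tables_under_prefix(SAMPLE):
        print(f"{name}: {path}")
```

To check a real file, read its contents and pass the text to `tables_under_prefix`; the prefix can be changed if the temporary directory lives elsewhere.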

natefoo, Jan 15 '20 19:01

@natefoo @davebx thanks for everything you've done on this. Sorry this has created some issues.

gregvonkuster, Jan 15 '20 20:01

I fixed all the paths and whatnot, but the DM fails. The handler logs:

    galaxy.tools.data_manager.manager WARNING 2020-01-15 14:30:16,408 No values for data table "dada2_taxonomy" were returned by the data manager "toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1".

However, the DM's primary output appears to return a data table entry:

    {"data_tables": {"dada2_taxonomy": {"name": "UNITE: General Fasta release 8.0 for Fungi", "path": "unite_8.0_fungi.taxonomy", "taxlevels": "Kingdom,Phylum,Class,Order,Family,Genus,Species", "value": "unite_8.0_fungi"}}}

Anyone with a better understanding of DMs know what's going on here?
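One thing that can be ruled out offline is whether the JSON itself is missing a column. The sketch below parses the output quoted above and compares its keys with the columns declared for `dada2_taxonomy` in the data manager config (`value`, `name`, `path`, `taxlevels`). It does not reproduce Galaxy's own processing: the helper names `entries_for` and `missing_columns` are hypothetical, and treating a single JSON object as a one-row table is an assumption of this sketch.

```python
import json

# Primary output of the data manager, as quoted in this thread.
DM_OUTPUT = (
    '{"data_tables": {"dada2_taxonomy": {"name": "UNITE: General Fasta release 8.0 for Fungi", '
    '"path": "unite_8.0_fungi.taxonomy", '
    '"taxlevels": "Kingdom,Phylum,Class,Order,Family,Genus,Species", '
    '"value": "unite_8.0_fungi"}}}'
)

# Columns declared for dada2_taxonomy in the data manager config.
EXPECTED_COLUMNS = {"value", "name", "path", "taxlevels"}


def entries_for(output_text, table_name):
    """Return the list of row dicts the JSON carries for table_name."""
    data = json.loads(output_text)
    values = data.get("data_tables", {}).get(table_name)
    if values is None:
        return []
    # Assumption of this sketch: a single object counts as one row.
    if isinstance(values, dict):
        values = [values]
    return values


def missing_columns(entry, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from a row dict."""
    return sorted(expected - set(entry))


if __name__ == "__main__":
    for entry in entries_for(DM_OUTPUT, "dada2_taxonomy"):
        print(entry["value"], "missing:", missing_columns(entry))
```

If this reports no missing columns, the JSON shape is at least consistent with the declared table, which suggests looking elsewhere for the cause of the warning.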

natefoo, Jan 15 '20 20:01

Interestingly... the log message references an old version of the DM (0.0.1) which I don't believe is even installed (both 0.0.7 and 0.0.8 appear to be installed, and 0.0.8 is the one that ran). It appears to come from the entry in shed_data_manager_conf.xml:

    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher" shed_conf_file="/cvmfs/test.galaxyproject.org/config/shed_tool_conf.xml">
        <tool file="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/f57c13f5878b/data_manager_dada2/data_manager/dada2_fetcher.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.7"><tool_shed>toolshed.g2.bx.psu.edu</tool_shed><repository_name>data_manager_dada2</repository_name><repository_owner>iuc</repository_owner><installed_changeset_revision>f57c13f5878b</installed_changeset_revision><id>toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.7</id><version>0.0.7</version></tool><data_table name="dada2_taxonomy">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
                <column name="taxlevels" />
            </output>
        </data_table>
        <data_table name="dada2_species">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher" shed_conf_file="/cvmfs/test.galaxyproject.org/config/shed_tool_conf.xml">
        <tool file="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/bf7b2c14cabc/data_manager_dada2/data_manager/dada2_fetcher.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8"><tool_shed>toolshed.g2.bx.psu.edu</tool_shed><repository_name>data_manager_dada2</repository_name><repository_owner>iuc</repository_owner><installed_changeset_revision>bf7b2c14cabc</installed_changeset_revision><id>toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8</id><version>0.0.8</version></tool><data_table name="dada2_taxonomy">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
                <column name="taxlevels" />
            </output>
        </data_table>
        <data_table name="dada2_species">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>

The correct version appears in the tool tag but not the data_manager tag. No idea if this is the problem, though.
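The mismatch can be checked mechanically by comparing the last segment of each `data_manager` guid with the `<version>` of its nested `<tool>`. This is a minimal sketch under stated assumptions: the `<data_managers>` root element and the helper name `version_mismatches` are not taken from this thread, and `SAMPLE` is a trimmed copy of the 0.0.8 entry above with the `data_table` sections left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the 0.0.8 entry quoted above; the <data_managers> root
# element is an assumption of this sketch.
SAMPLE = """<data_managers>
    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher">
        <tool file="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/bf7b2c14cabc/data_manager_dada2/data_manager/dada2_fetcher.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8">
            <version>0.0.8</version>
        </tool>
    </data_manager>
</data_managers>"""


def version_mismatches(xml_text):
    """Return (id, guid version, tool version) for entries that disagree."""
    root = ET.fromstring(xml_text)
    mismatches = []
    for dm in root.iter("data_manager"):
        # The version is taken to be the last path segment of the guid.
        guid_version = dm.get("guid", "").rsplit("/", 1)[-1]
        tool = dm.find("tool")
        if tool is None:
            continue
        tool_version = tool.findtext("version", default="").strip()
        if guid_version != tool_version:
            mismatches.append((dm.get("id"), guid_version, tool_version))
    return mismatches


if __name__ == "__main__":
    for dm_id, guid_version, tool_version in version_mismatches(SAMPLE):
        print(f"{dm_id}: guid says {guid_version}, tool says {tool_version}")
```

A check like this only flags the inconsistency; it says nothing about whether the inconsistency is what makes the data manager fail.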

natefoo, Jan 15 '20 21:01

I fixed the version and it's the same thing:

    galaxy.tools.data_manager.manager WARNING 2020-01-15 15:25:18,524 No values for data table "dada2_taxonomy" were returned by the data manager "toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.8".

natefoo, Jan 15 '20 21:01

Hmm... strange. Thanks @natefoo for your help!

gregvonkuster, Jan 15 '20 22:01

Btw, a new data_manager with Silva 138 is available.

bernt-matthias, Mar 30 '20 09:03