bgcflow
Integrated GTDB summary table
We should have a summary table that ensures we have GTDB assignments for all genomes.
Currently, the df_gtdb_meta table has no information for genomes that are missing from the GTDB database. For example, after the qc step, the processed table of a project does not contain GTDB information for all of its genomes.
It would also be good to create a new integrated table that combines information from both GTDB and GTDB-tk (either from the rule or from a provided file). This table only needs the columns describing the taxonomic levels, not all the extra columns provided in the df_gtdb_meta.csv file.
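As a rough illustration of the idea, here is a minimal pandas sketch of how such an integrated table could be built. It is not taken from the bgcflow code base: the function name `integrate_gtdb`, the `genome_id` key column, the rank column names, and the `source` column are all assumptions made for the example, and the real df_gtdb_meta.csv and GTDB-tk outputs may be laid out differently.

```python
import pandas as pd

# Assumed names for the taxonomic rank columns kept in the integrated table.
TAX_LEVELS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def integrate_gtdb(genome_ids, df_gtdb, df_gtdbtk):
    """Hypothetical helper: one row per genome, GTDB first, GTDB-tk as fallback."""
    index = pd.Index(genome_ids, name="genome_id")
    # Keep only the rank columns and align both sources on the full genome list,
    # so genomes absent from a source show up as rows of missing values.
    gtdb = df_gtdb.set_index("genome_id")[TAX_LEVELS].reindex(index)
    gtdbtk = df_gtdbtk.set_index("genome_id")[TAX_LEVELS].reindex(index)
    # Prefer the GTDB record and fill the gaps from the GTDB-tk classification.
    merged = gtdb.combine_first(gtdbtk).reindex(index=index, columns=TAX_LEVELS)
    # Record where each assignment came from so unassigned genomes stay visible.
    source = pd.Series("missing", index=index)
    source.loc[gtdbtk["domain"].notna()] = "gtdbtk"
    source.loc[gtdb["domain"].notna()] = "gtdb"
    merged["source"] = source
    return merged.reset_index()


# Toy data, invented for the example: g1 is in GTDB, g2 only has a GTDB-tk
# classification, and g3 has neither.
ranks_a = ["d__Bacteria", "p__P1", "c__C1", "o__O1", "f__F1", "g__G1", "s__A"]
ranks_b = ["d__Bacteria", "p__P2", "c__C2", "o__O2", "f__F2", "g__G2", "s__B"]
df_gtdb = pd.DataFrame([["g1", *ranks_a]], columns=["genome_id", *TAX_LEVELS])
df_gtdbtk = pd.DataFrame([["g2", *ranks_b]], columns=["genome_id", *TAX_LEVELS])

summary = integrate_gtdb(["g1", "g2", "g3"], df_gtdb, df_gtdbtk)
```

In this sketch every genome in the project gets a row, and the assumed `source` column makes it easy to check which genomes still lack a taxonomic assignment.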