
dbCAN: add best-hit and subfamily EC number columns, change e-value cutoff to 1e-18

Open · rmFlynn opened this issue 3 years ago · 3 comments

This is an attempted solution to some problems with CAZy IDs from dbCAN.

  • We will introduce a new column, cazy_top_id, holding the best hit from the dbCAN database. This will be the match for the scaffold with the lowest full-sequence e-value and the highest coverage, as calculated by mmseqs, with priority given to the e-value. This new cazy_top_id column will be the only one considered in the distillate.
  • The default e-value cutoff will be changed to 1e-18; the percent-coverage cutoff remains unchanged.
  • We will add a new column, cazy_subfamily_ec, holding EC number information from the subfamilies. These EC numbers will also be used in the distillate, alongside those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers (such as 3.2.1.-) will be included in the column but not considered in the distillate.
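The selection rules in the list above can be sketched in Python. This is a minimal, hypothetical illustration, not DRAM's implementation: the Hit record, the function names, the inclusive comparisons, and the 0.35 coverage default are all assumptions, and in DRAM the hits would come from the mmseqs search output rather than a hand-built list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record for one dbCAN hit; field names are illustrative, not DRAM's.
@dataclass
class Hit:
    gene_id: str
    target_id: str  # dbCAN family or subfamily ID, e.g. "GH5_4"
    evalue: float  # full-sequence e-value
    coverage: float  # fraction of the profile covered, 0 to 1
    ec_numbers: list = field(default_factory=list)

# A complete EC number has four numeric fields; anything else (e.g. "3.2.1.-") counts as incomplete.
COMPLETE_EC = re.compile(r"\d+\.\d+\.\d+\.\d+")

# Assumption: min_coverage=0.35 is a placeholder, not necessarily DRAM's default.
def passes_cutoffs(hit, max_evalue=1e-18, min_coverage=0.35):
    """True if the hit is at or below the e-value cutoff and at or above the coverage cutoff."""
    return hit.evalue <= max_evalue and hit.coverage >= min_coverage

def rank_key(hit):
    """Lower is better: e-value first, then higher coverage breaks ties."""
    return (hit.evalue, -hit.coverage)

def top_hits(hits, max_evalue=1e-18, min_coverage=0.35):
    """Map each gene to the target ID of its best passing hit (a cazy_top_id value)."""
    best = {}
    for hit in hits:
        if not passes_cutoffs(hit, max_evalue, min_coverage):
            continue
        current = best.get(hit.gene_id)
        if current is None or rank_key(hit) < rank_key(current):
            best[hit.gene_id] = hit
    return {gene: hit.target_id for gene, hit in best.items()}

def split_ecs(ec_numbers):
    """Split EC numbers into complete ones (distillate) and incomplete ones (column only)."""
    complete = [ec for ec in ec_numbers if COMPLETE_EC.fullmatch(ec)]
    incomplete = [ec for ec in ec_numbers if not COMPLETE_EC.fullmatch(ec)]
    return complete, incomplete

example_hits = [
    Hit("gene1", "GH5_4", 1e-30, 0.80, ["3.2.1.4"]),
    Hit("gene1", "GH5", 1e-25, 0.95),   # higher coverage, but worse e-value
    Hit("gene1", "CBM2", 1e-30, 0.70),  # same e-value, lower coverage
    Hit("gene2", "GT2", 1e-16, 0.90),   # fails the 1e-18 cutoff
]
print(top_hits(example_hits))              # {'gene1': 'GH5_4'}
print(split_ecs(["3.2.1.4", "3.2.1.-"]))  # (['3.2.1.4'], ['3.2.1.-'])
```

Ranking on the tuple (e-value, negative coverage) is one way to encode "priority on the e-value": coverage only decides between hits whose e-values are equal, which is why GH5 loses to GH5_4 despite its higher coverage.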

rmFlynn · Mar 01 '22 22:03

I will also try to address #162 in this release.

rmFlynn · Apr 07 '22 18:04

I will also try to address #157 in this release.

rmFlynn · Apr 07 '22 18:04

In this release, the dbCAN subfamilies will be fully included in the distillate to allow more accurate sub casting. However, they will be excluded from the product, in keeping with its goal of being a broader overview.
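One way the distillate/product split might look, assuming dbCAN subfamily IDs follow the CAZy FAMILY_N pattern (GH5_4 belongs to GH5) and that both outputs start from per-gene cazy_top_id values. The helper names and the counting are hypothetical, not DRAM's API; IDs that do not match the assumed pattern are left untouched.

```python
import re
from collections import Counter

# Assumption: a subfamily ID is a family ID (letters + digits) followed by "_<number>".
SUBFAMILY_ID = re.compile(r"([A-Za-z]+\d+)_\d+")

def to_family(cazy_id: str) -> str:
    """Collapse a subfamily ID to its parent family; other IDs are returned unchanged."""
    match = SUBFAMILY_ID.fullmatch(cazy_id)
    return match.group(1) if match else cazy_id

def distillate_counts(top_ids):
    """Distillate view: keep subfamilies as-is so each one is reported separately."""
    return Counter(top_ids)

def product_counts(top_ids):
    """Product view: fold subfamilies into their family for a broader overview."""
    return Counter(to_family(cazy_id) for cazy_id in top_ids)

top_ids = ["GH5_4", "GH5_2", "GH5", "GT2"]
print(distillate_counts(top_ids))
print(product_counts(top_ids))  # Counter({'GH5': 3, 'GT2': 1})
```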

rmFlynn · Apr 19 '22 18:04