
BiG-SLICE and BiG-SCAPE integration strategy

Open OmkarSaMo opened this issue 2 years ago • 2 comments

For projects with a very large number of BGCs, BiG-SLICE is the preferred choice. However, I want to discuss a two-stage strategy: run BiG-SLICE first to get a first-degree GCF annotation, then run BiG-SCAPE on each of the resulting GCFs to get a second-degree GCF annotation.

This strategy needs further discussion, in particular on which first-degree GCFs to select for second-degree annotation.

  1. Should we run BiG-SCAPE on every GCF detected by BiG-SLICE?
  2. Should we run BiG-SCAPE only on the most abundant GCFs (e.g. GCFs with around 50 BGCs or more), and then run BiG-SCAPE once on all remaining BGCs together?
  3. Should we include the full MIBiG database in each BiG-SCAPE run?
  4. Do we need this feature as an additional rule in BGCFlow?

Please add any other considerations relevant to this strategy, @matinnuhamunada.

OmkarSaMo, Dec 14 '22 15:12

For combining the two outputs, it is best to stick to the query-bigslice results.

  1. We could run BiG-SCAPE on each BiG-FAM GCF with more than 50 BGCs to get BiG-SCAPE-defined sub-GCFs (50 is a parameter that can be changed).
  2. We could pool all the smaller GCFs with fewer than 50 BGCs and run BiG-SCAPE on them together.
  3. We could run a separate BiG-SCAPE on all BGCs that had no membership assigned.
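
To make the three cases above concrete, here is a rough sketch of how the BGCs might be split into planned BiG-SCAPE runs. It is only an illustration: the function name `plan_bigscape_runs`, the `membership` mapping (BGC id to GCF id, with `None` for no membership), and the run labels are all hypothetical and not part of BGCFlow or of the query-bigslice output format. It also assumes that a GCF with exactly the threshold number of BGCs counts as large, which the list above leaves open.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def plan_bigscape_runs(
    membership: Dict[str, Optional[str]],
    min_gcf_size: int = 50,
) -> Dict[str, List[str]]:
    """Group BGC ids into planned BiG-SCAPE runs (hypothetical helper).

    membership maps a BGC id to its first-degree GCF id, or to None
    when no membership was assigned.
    """
    by_gcf = defaultdict(list)
    unassigned = []
    for bgc_id, gcf_id in membership.items():
        if gcf_id is None:
            unassigned.append(bgc_id)
        else:
            by_gcf[gcf_id].append(bgc_id)

    runs: Dict[str, List[str]] = {}
    pooled_small: List[str] = []
    for gcf_id, bgcs in by_gcf.items():
        # Assumption: a GCF with exactly min_gcf_size BGCs gets its own run.
        if len(bgcs) >= min_gcf_size:
            runs[f"gcf_{gcf_id}"] = sorted(bgcs)
        else:
            pooled_small.extend(bgcs)

    # Cases 2 and 3: one pooled run for small GCFs, one for unassigned BGCs.
    if pooled_small:
        runs["small_gcfs"] = sorted(pooled_small)
    if unassigned:
        runs["unassigned"] = sorted(unassigned)
    return runs
```

Each key in the returned dictionary would correspond to one BiG-SCAPE run, so `min_gcf_size` directly controls how many separate runs are launched.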

OmkarSaMo, Jan 23 '23 12:01

@matinnuhamunada - we should build on something similar to what I started proposing here.

OmkarSaMo, Jul 11 '23 09:07