Add better support for multiple "base" models in the code by adding an Adapter pattern

Open johan-gson opened this issue 3 years ago • 7 comments
We would create a base class called ModelAdapter that has functions such as "getParameters" and "getSpontaneousRxns". The GECKO code will then take an adapter (which is a class inheriting the base class), and simply ask the adapter for the parameters etc. This way, the code will not be written specifically for Yeast-GEM, but can support any model (well, maybe not any model, but at least the ones of interest) by simply adding an adapter to that base model. This pattern is already implemented in Gecko Light, it is called SpeciesAdapter there.
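As a rough illustration of the proposal above, here is a minimal sketch of the pattern. It is written in Python for brevity and is not GECKO's actual MATLAB code: `get_parameters` and `get_spontaneous_rxns` mirror the "getParameters" and "getSpontaneousRxns" functions named above, while `YeastGEMAdapter`, `describe_model` and all returned values are hypothetical placeholders.

```python
from abc import ABC, abstractmethod


class ModelAdapter(ABC):
    """Base class listing everything the pipeline may ask of a base model."""

    @abstractmethod
    def get_parameters(self) -> dict:
        """Return organism-specific parameters."""

    @abstractmethod
    def get_spontaneous_rxns(self) -> list:
        """Return identifiers of reactions that occur spontaneously."""


class YeastGEMAdapter(ModelAdapter):
    """Hypothetical adapter for one base model; values are placeholders."""

    def get_parameters(self) -> dict:
        return {"organism": "Saccharomyces cerevisiae"}

    def get_spontaneous_rxns(self) -> list:
        return ["rxn_placeholder"]  # not a real reaction identifier


def describe_model(adapter: ModelAdapter) -> str:
    """Pipeline-side code: it only talks to the adapter interface."""
    params = adapter.get_parameters()
    n_spontaneous = len(adapter.get_spontaneous_rxns())
    return f"{params['organism']}: {n_spontaneous} spontaneous reaction(s)"
```

Supporting another base model would then mean writing one more subclass, with no changes to the pipeline function.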

I hereby confirm that I have:

  • [X] Done this analysis in the master branch of the repository
  • [X] Checked that a similar issue does not exist already

johan-gson avatar Jun 23 '22 19:06 johan-gson

I like that this would make it possible to easily adapt GECKO to work with any organism or strain. Currently, to use GECKO you need to make many changes in documents or adapt some functions. I think that, in order to make GECKO more user friendly, this approach can be used to define a directory where the files (abs_proteomics, chemostatData, fermentationData, uniprot database) are located (e.g. parameters.databaseDir = "/"). Then, the functions that require them go directly to that directory.

Another option is to create a function that copies the files from a user-defined directory to the respective location in GECKO and then restore the original files after it is finished.

ae-tafur avatar Aug 03 '22 08:08 ae-tafur

> define a directory

This is the approach used in ecModels, where for each organism there is a dedicated folder, in which the input is stored in scripts/ and the output of GECKO is saved in model/.

mihai-sysbio avatar Aug 03 '22 08:08 mihai-sysbio

Great! Having reviewed the repo, however, I think that:

  1. It should be a GECKO .m function.
  2. It is used to create an ecModel (using enhance.m), but GECKO also allows integrating proteomics data, which also requires adjustments.

ae-tafur avatar Aug 03 '22 09:08 ae-tafur

The downside of such an approach is that we will get many copies of some parts of the code, one for each organism, which will generate much more code to maintain. I also generally dislike replacing the package code depending on which organism is targeted.

MATLAB is pretty weak on handling user-defined software packages, but I still think we should treat GECKO as such a package. If you think of for example an R package - did you ever encounter one where you were asked to replace the code in the package? You normally don't care about where the package is stored, and you never touch the code when you use the package, only when you are developing the package (bugfixing, new versions, etc.). Some users may not even have write access. If you just supply a path to a directory where some of the files are replaced, I don't think there is another way to solve that but to just replace them by overwriting the package? To me this is a weird solution, although it technically works. It works pretty OK on the server where models are generated automatically - that is a very controlled environment - but on people's computers, especially when you work with multiple species at the same time, I think this adds unnecessary complications.

The adapter is the same thing as you propose, but with the advantage that there is no need to copy files, manipulate paths, or whatever method would be used. Don't focus on which functions are available now, that will change, it is just an example. There is of course less flexibility this way. By overwriting files, you can really change anything - with the adapter, you have to decide that in advance - when you need to change something new, you need to add a new function in the adapter, which means a new version of GECKO.

I think the adapter is a better way to go. There's nothing stopping us from making specialized functions for different species regardless, but for the ones that are called from within the GECKO pipeline, they must exist in the adapter base class.

If we do stick with the file-copying strategy, which I don't recommend, it will be important that this part is automatic, so that it can be applied by a MATLAB call in each project: a call such as "adaptToModel(dir)" that is always made in the analysis to ensure that the right code is called. This is a bit complicated - it would first need to replace the files with the original GECKO files to get rid of any previous adaptations from other species, and then replace the "overridden" files. But I don't think we should go there.
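To show why the two-step reset is needed, here is a small sketch of that restore-then-override logic. It is in Python rather than MATLAB, and everything beyond the idea of an "adaptToModel(dir)" call is an assumption: the three-directory signature, the existence of a pristine copy of the package files, and the names used.

```python
import shutil
from pathlib import Path


def adapt_to_model(package_dir: Path, pristine_dir: Path, override_dir: Path) -> list:
    """Reset package_dir to the pristine files, then overlay the overrides.

    Returns the relative paths of the files that were overridden.
    """
    # Step 1: restore the originals, undoing adaptations for other species.
    for src in pristine_dir.rglob("*"):
        if src.is_file():
            dest = package_dir / src.relative_to(pristine_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)

    # Step 2: copy the species-specific files on top of the originals.
    replaced = []
    for src in override_dir.rglob("*"):
        if src.is_file():
            rel = src.relative_to(override_dir)
            dest = package_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            replaced.append(rel.as_posix())
    return sorted(replaced)
```

Even this sketch needs write access to the package directory and a pristine copy kept somewhere, and it does not delete extra files that an earlier override added, which illustrates the complications mentioned above.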

johan-gson avatar Aug 03 '22 11:08 johan-gson

Just to clarify, I'm all for the adapter pattern.

I guess what I'm implying with the folder structure is to maintain the logic of separating input and output, but for the input, instead of code scattered seemingly haphazardly across different files, to use the adapter pattern. In practice, one way this could work is to keep each adapter listed in PR #168 in the "input" folder, together with whatever other code or data is expected.

mihai-sysbio avatar Aug 03 '22 11:08 mihai-sysbio

I totally agree with Johan that you currently have to replace or adapt too many functions, which can cause a "mess". Even if you are working with S. cerevisiae, when the proteomics data or other inputs change, you have to change parameters or files that GECKO has by default.

But I think that you should not necessarily copy and replace files, just specify a directory where the files or specific functions for each organism will be stored (input) and one where the results will go (output), so that the Models and Database folders are just example files for running GECKO.

ae-tafur avatar Aug 03 '22 12:08 ae-tafur

I agree that it would be good to keep a standard structure per base model, where we place the files etc. Maybe I just misunderstood you altogether :)

johan-gson avatar Aug 03 '22 13:08 johan-gson

Implemented in #168 and #180, further related discussion on location of custom files, functions and scripts in #187.

edkerk avatar Dec 19 '22 16:12 edkerk

@johan-gson Instead of having to set the correct ModelAdapter as parameter for running particular functions, is there not a way that this can be set as some global variable, so it just needs to be run once for a MATLAB instance? Or should we maybe have the name of the ModelAdapter in a model.ec.modelAdapter field, so as long as the model is used inside a function, the modelAdapter does not need to be specified manually?

Just trying to think of how to make things simplest for the user :).
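To make the two options concrete, here is a minimal Python sketch that combines them. It does not describe what was eventually implemented in GECKO: the lookup order (explicit argument, then the adapter name stored in the model, then a session-wide default) and all function names are assumptions, and the model is represented as a plain dictionary standing in for the MATLAB struct with its model.ec.modelAdapter field.

```python
class ModelAdapter:
    """Stand-in for the adapter base class discussed in this thread."""

    def __init__(self, name: str):
        self.name = name


_ADAPTERS = {}        # adapter name -> adapter instance
_default_name = None  # set once per session


def register_adapter(adapter: ModelAdapter) -> None:
    """Make an adapter available for lookup by name."""
    _ADAPTERS[adapter.name] = adapter


def set_default_adapter(name: str) -> None:
    """Option 1: choose the adapter once for the whole session."""
    global _default_name
    if name not in _ADAPTERS:
        raise KeyError(f"unknown adapter: {name}")
    _default_name = name


def resolve_adapter(model=None, adapter=None) -> ModelAdapter:
    """Pick the adapter: explicit argument, then model field, then default."""
    if adapter is not None:
        return adapter
    name = None
    if model is not None:
        # Option 2: the model itself carries the name of its adapter.
        name = model.get("ec", {}).get("modelAdapter")
    if name is None:
        name = _default_name
    if name is None:
        raise ValueError("no adapter given, stored in the model, or set as default")
    return _ADAPTERS[name]
```

With either option, a function in the pipeline could call `resolve_adapter(model)` internally, so the user would not have to pass the adapter to every call.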

edkerk avatar Dec 20 '22 22:12 edkerk

I found a way to do this, will implement and make PR :)

edkerk avatar Dec 22 '22 13:12 edkerk

Addressed in #191

edkerk avatar Dec 22 '22 21:12 edkerk