Better support for modular models
Is your feature request related to a problem? Please describe.
The current modularization methods are ambiguous and produce unneeded parameters. This is a complex issue and the point of this post is to capture some thoughts on how a bottom-up approach might help. The tables below will show the scope of information that needs to be handled in QSP or PBPK models.
Let's start with an example from the docs:
using Catalyst
function repressed_gene(; R, name)
@network_component $name begin
hillr($R,α,K,n), ∅ --> m
(δ,γ), m <--> ∅
β, m --> m + P
μ, P --> ∅
end
end
t = default_t()
@species G3₊P(t)
@named G1 = repressed_gene(; R=ParentScope(G3₊P))
@named G2 = repressed_gene(; R=ParentScope(G1.P))
@named G3 = repressed_gene(; R=ParentScope(G2.P))
@named repressilator = ReactionSystem(t; systems=[G1,G2,G3])
@show parameters(repressilator)
t = default_t()
@species P₁(t) P₂(t) P₃(t)
rxs = [(@reaction hillr($P₃,α,K,n), ∅ --> m₁),
(@reaction hillr($P₁,α,K,n), ∅ --> m₂),
(@reaction hillr($P₂,α,K,n), ∅ --> m₃),
(@reaction δ, m₁ --> ∅),
(@reaction γ, ∅ --> m₁),
(@reaction δ, m₂ --> ∅),
(@reaction γ, ∅ --> m₂),
(@reaction δ, m₃ --> ∅),
(@reaction γ, ∅ --> m₃),
(@reaction β, m₁ --> m₁ + P₁),
(@reaction β, m₂ --> m₂ + P₂),
(@reaction β, m₃ --> m₃ + P₃),
(@reaction μ, P₁ --> ∅),
(@reaction μ, P₂ --> ∅),
(@reaction μ, P₃ --> ∅)]
@named repressilator2 = ReactionSystem(rxs, t)
@show parameters(repressilator2)
produces 21 parameters in the modular method and 7 in the standard method:
parameters(repressilator) = Any[G1₊α, G1₊K, G1₊n, G1₊δ, G1₊γ, G1₊β, G1₊μ, G2₊α, G2₊K, G2₊n, G2₊δ, G2₊γ, G2₊β, G2₊μ, G3₊α, G3₊K, G3₊n, G3₊δ, G3₊γ, G3₊β, G3₊μ]
parameters(repressilator2) = Any[α, n, K, δ, γ, β, μ]
Describe the solution you’d like
Rather than removing redundant parameters posthoc, it would be better to never have them in the model. This requires a way to designate when a parameter is species-specific.
There are a large number of edge cases in QSP, PBPK, etc. that are difficult to anticipate. However, I think the tables below capture many of these cases. To implement a bottom up strategy, the wildcard character '@#' is replaced by building blocks that are used to construct equations and parameters via loops over building_blocks entries. I show the repressilator and a more complex PBPK model with flows, compartments and isotopes.
reaction_table = [
# Reactant, Product, Rate_type, Rate_eqtn_prototype, Building_blocks, Isotopes Compartment
(None, "m_@1", "HillR", ["P_@2","alpha","K","n"], [["1","3"],["2","1"],["3","2"]], None, None),
("m@1", None, "mass_action", "delta", ["1","2","3"], None, None),
("m@1", ["m@1","P_@1"], "mass_action", "beta", ["1","2","3"], None, None),
("P_@1", None, "mass_action", "mu", ["1","2","3"], None, None)
]
reaction_table = [
# Reactant, Product, Rate_type, Rate_eqtn_prototype, Building_blocks, Isotopes Compartment
("Leu", "APP", "mass_action", "k_f", None, ["U", "L"], ["ISF"]),
("APP", "C99", "MM", ["V_M2","K_M2"], None, ["U", "L"], ["ISF"]),
("APP", "C83", "MM", ["V_M1","K_M1","CI:[C99,KM5]"], None, ["U", "L"], ["ISF"]),
("C99", "@1", "MM", ["V_gamma_@1","K_M4","CI:[C83,KM3]"], None, ["U", "L"], ["ISF"]),
("C99", "C83", "MM", ["V_M5","K_M5","CI:[APP,KM1]"], None, ["U", "L"], ["ISF"]),
("C83", None, "MM", ["V_M3","K_M3","CI:[C99,KM4]"], None, ["U", "L"], ["ISF"]),
("@1", "@1_ex", "reversible_mass_action", ["k_ex_@1","k_ret_@1"], ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["ISF","ISF"]),
("@1", None, "mass_action", "k_BPD_@1", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["ISF"]),
("@1", "@1_cranial", "bidirectional_flow", "Q_glymph", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["ISF","cranial"]),
("@1_cranial", "@1_CV", "bidirectional_flow", "Q_osc", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["cranial","CV"]),
("@1_cranial", None, "unidirectional_flow", "Q_cranial", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["cranial","0"]),
("@1_CV", "@1_cranial", "unidirectional_flow", "Q_cranial", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["CV","cranial"]),
("@1_CV", "@1_SP1", "unidirectional_flow", "Q_LP", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["CV","SP1"]),
("@1_SP1", "@1_SP2", "unidirectional_flow", "Q_LP", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP1","SP2"]),
("@1_SP2", "@1_SP3", "unidirectional_flow", "Q_LP", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP2","SP3"]),
("@1_SP3", None, "unidirectional_flow", "Q_LP", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP3","0"]),
("@1_CV", "@1_SP1", "unidirectional_flow", "Q_SN1", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["CV","SP1"]),
("@1_SP1", "@1_SP2", "unidirectional_flow", "Q_SN2", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP1","SP2"]),
("@1_SP1", None, "unidirectional_flow", "Q_SN3", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP1","0"]),
("@1_SP2", "@1_SP3", "unidirectional_flow", "Q_SN3", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP2","SP3"]),
("@1_SP2", None, "unidirectional_flow", "Q_SN3", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP2","0"]),
("@1_SP3", None, "unidirectional_flow", "Q_SN3", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP3","0"]),
("@1_CV", "@1_SP1", "bidirectional_flow", "Q_osc", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["CV","SP1"]),
("@1_SP1", "@1_SP2", "bidirectional_flow", "Q_osc", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP1","SP2"]),
("@1_SP2", "@1_SP3", "bidirectional_flow", "Q_osc", ["Ab38", "Ab40", "Ab42"], ["U", "L"], ["SP2","SP3"])
]
Isotopes are interesting because they should have identical kinetics, so they should be handled separately from the Building_blocks. Compartments need some nomenclature to distinguish reactions within a compartment and reactions (or flows) that cause movement between compartments.
Conceptually, what is in the tables above is not that different from normal Catalyst. In fact I think Catalyst could be written like this:
function repressed_gene(; R, m, P, name)
@network_component $name begin
hillr($R,α,K,n), ∅ --> $m
(δ,γ), $m <--> ∅
β, $m --> $m + $P
μ, $P --> ∅
end
end
where the default is that parameters are NOT species-specific unless designated with some wildcard (not shown here) but could be something like δ_$m.
Describe alternatives you’ve considered
Antimony has the same behavior when constructing modular models.
A potential complication is illustrated by this building block: [["1","3"],["2","1"],["3","2"]]
This is intended to mean: loop through the list of lists and pass each list to the generating function. However there are cases where you want to have nested loops. There needs to be some other symbol for 'collections', like [{"1","3"},{"2","1"},{"3","2"}]. Everything in curly brackets is passed to the generating function at the same time.
There are cases where you want to loop over different states of species, like an oligomer. [["Abeta40","Abeta42"],[{M,O2},{O2,O3},{O3,O4}], etc.]. Additionally, it would be useful to allow comprehensions. As you can tell, this can get complex but it is good to plan for the edge cases.
Another possibility is to allow generation of strings containing reactions that can be passed to a Catalyst model-generating function. That is to say, just teach people how to generate complex, repetitive reactions programmatically rather than making modular models super flexible. Here's an example in Antimony. The reaction_table isn't used but suggests an alternative way to code repetitive reactions.
reaction_table = [
# Reactant, Product, Rate_type, Rate_eqtn_prototype, Comp
("{Abeta}_{O_n0}_{Comp}","{Abeta}_O1_{Comp}", "{Abeta}_{O_n1}_{Comp}", "reversible_mass_action", "[k_{O_n0}_{O_n1}_{Abeta}_{Comp},k_{O_n1}_{O_n0}_{Abeta}_{Comp}]" )
]
Abetas = ["AB42"]
Oligomer_n0s = [f"O{i}" for i in range(1,24)]
Oligomer_n1s = Oligomer_n0s + 1
Compartments = ["ISF"]
# Initialize string to store reactions
reaction_string = ""
for Comp in Compartments:
for Abeta in Abetas:
for (i,O_n0) in enumerate(Oligomer_n0s):
O_n1 = Oligomer_n1s[i]
# Add reactions to string with newlines
reaction_string += f"{Abeta}_{O_n0}_{Comp} + {Abeta}_O1_{Comp} -> {Abeta}_{O_n1}_{Comp}; k_{O_n0}_{O_n1}_{Abeta}_{Comp}*{Abeta}_{O_n0}_{Comp}*{Abeta}_O1_{Comp}\n"
reaction_string += f"{Abeta}_{O_n1}_{Comp} -> {Abeta}_{O_n0}_{Comp} + {Abeta}_O1_{Comp}; k_{O_n1}_{O_n0}_{Abeta}_{Comp}*{Abeta}_{O_n1}_{Comp}\n"
It kind of sounds like you are looking for a BNGL or Kappa like model specification language. You should be able to use BNGL to generate Catalyst compatible net files, or possibly SBML, via Bionetgen. Have you taken a look at it?
At some point it would be nice to add such functionality to Catalyst too, probably via metadata to allow for species and reactions with large combinatorial state spaces, along with corresponding DSL updates. But first we'd have to study BNGL and Kappa to see what has been worked out previously.
Concering the parameters, I think it might make sense to have some option to specify that a parameter should be global across all submodels in a hierarchical model. The two options I immediately can think about is to either:
Have a parameter metadata to designate this when it is declared
E.g. something like:
function repressed_gene(; R, name)
@network_component $name begin
@parameters K [modelwide = true] n [modelwide = true]
hillr($R,α,K,n), ∅ --> m
(δ,γ), m <--> ∅
β, m --> m + P
μ, P --> ∅
end
end
would specify that K and n uses the same single parameter across any hierarchical models with them. This should probably be coordinated with ModelingToolkit though.
Have an option when composing a model to designate this
I.e. some additional argument to compose
rn_base = @reaction_network
@named repressilator = compose(rn_base, [G1, G2, G3]; modelwide_ps = [:K, :n])
Isn’t that what the various MTK scoping decorators are for?
https://docs.sciml.ai/ModelingToolkit/stable/API/model_building/#ModelingToolkit.GlobalScope
Didn't know about those, but that sounds like what one should do then
It kind of sounds like you are looking for a BNGL or Kappa like model specification language. You should be able to use BNGL to generate Catalyst compatible net files, or possibly SBML, via Bionetgen. Have you taken a look at it?
This is a great discussion. I think there are many things to be learned from BioNetGen and Kappa. I feel like those are somewhat burdened by a lot of platform-derived tech debt and nearing end-of-life? I think if the goal is to generate SBML, I would go with Antimony and programmatically write the Antimony, E.g., I asked Gemini to write code to translate a bespoke JSON format to Antimony,
{
"Reaction_name": "Monomer Addition and Dissociation",
"Reactants": "[{Abeta}_{O_n0}_{Comp},{Abeta}_O1_{Comp}]",
"Products": "[{Abeta}_{O_n1}_{Comp}]",
"Rate_type": "RMA",
"Rate_eqtn_prototype": "[k_{O_n0}_{O_n1}_{Abeta}_{Comp},k_{O_n1}_{O_n0}_{Abeta}_{Comp}]",
"Comp": "[ISF]",
"Abeta": "[AB40,AB42]",
"O_n0": "[f\"O{i}\" for i in range(1,24)]",
"O_n1": "[f\"O{i}\" for i in range(2,25)]",
"Paired": "{O_n0,O_n1}"
}
that has all the features that I wanted (parameters are by default global, use of comprehensions, ability to specify that specific lists of species are linked or 'paired' typically in a reactant/product relationship). So the workflow was, custom code to generate antimony, antimony2sbml, SBMLImporter->Julia. The key here is that there are no 'modules' in the Animony model, all the equations are written out explicitly. I could do the same for a Catalyst model as well, avoiding modularization. In a certain sense the modularization is occurring outside of Antimony/Catalyst.
It sounds like ModelingToolkit.GlobalScope achieves the goal of not proliferating parameters in modular models, although there may be other complexities, e.g. with custom reactions and volumes for flow/clearance rates. Overall I think we are not too far away from DiffEq in a pdf/Excel/MathML -> model via LLM, so that's another discussion.
Feel free to close if you feel that the discussion has reached its end. I'm really glad that the Catalyst community and MTK are forward thinking and willing to have these discussions.
If you want to use loops and comprehensions to write reactions, why not use the Catalyst symbolic interface? If I’ve understood correctly, you aren’t using features of the reaction modeling language itself that support structured reactions over groups of species (as in BNGL), but instead just using the underlying programming language (Python for Antimony) to enumerate all reactions programmatically? That is exactly the context where it makes sense to use the Catalyst symbolic interface.
Is this more like what you want to do? Here all species and parameters are global, and loops are used to build structured reactions between species:
https://docs.sciml.ai/Catalyst/stable/model_creation/examples/smoluchowski_coagulation_equation/
PS - I don't think this discussion has reached its end -- adding more general model construction functionality to Catalyst is always the goal. There is just the question of what is wanted / needed, and what makes sense to support via the DSL vs. just directly using the symbolic interface. The symbolic interface will always be the most flexible model-specification level.
One place where we could make compositional modeling nicer, as in your original post, would be to support alias specification and removal, i.e. something like
function repressed_gene(; name)
@network_component $name begin
hillr(R,α,K,n), ∅ --> m
(δ,γ), m <--> ∅
β, m --> m + P
μ, P --> ∅
end
end
@named G1 = repressed_gene()
@named G2 = repressed_gene()
@named G3 = repressed_gene()
@parameters K
aliases = [G1.R ~ G3.P, G2.R ~ G1.P, G3.R ~ G2.P, G1.K ~ K, G2.K ~ K, G3.K ~ K]
t = default_t()
@named repressilator = ReactionSystem([], t, [], [K]; systems = [G1,G2,G3], aliases)
repressilator = remove_aliases(repressilator)
which would eliminate the redundant species and parameters, i.e. each Gi.R and each Gi.K, using the alias relations (or more concretely, turn them into observables). The main issue is we would need some basic checks, e.g. ensuring the aliases are not circular.
aliases = [G1.R ~ G3.P, G2.R ~ G1.P, G3.R ~ G2.P, G1.K ~ K, G2.K ~ K, G3.K ~ K]
This is what I was trying to avoid. Keeping track of what to eliminate isn't very scalable. We are also interested in large models (hundreds of equations) wherein keeping track of observables that aren't needed is wasteful.
Also the programmatic generation of the Smoluchowski equations is very similar to what I'm doing in the examples above, it's just that the examples are in Antimony right now. I think my challenge was to think about how to have static code for programmatically generating ReactionSystems with a really well-designed, comprehensive parameter/reaction input format that guides the programmatic generation (see the JSON example).
I would love to use the Catalyst symbolic interface but do not want the aliases and subsequent observables. So far I've noticed great performance from MTK with hundreds of equations but I don't want to push my luck. I agree that GlobalScope and ParentScope in MTK are promising. I played with them a while back and it does start to become a logic problem.
There is the possibility that the best gateway for complex models into Catalyst/MTK is through SBML, and then how you write the SBML is not that important or at least a separate issue. We are exploring a lot of options for writing SBML, including writing libSBML programmatically. We have an ongoing project that will open source a bunch of PKPB/QSP models and it should be a good test case to think about all the edge cases in these types of models.