Tracker for Coverage over CellML Model Repository

Open anandijain opened this issue 3 years ago • 12 comments

This issue will track our progress in testing CellMLToolkit.jl on the CellML Model Repository.

I have a branch where I've added some functions that query the Model Repo for all of the "exposures" and then curl them. I also added some functions that build a DataFrame showing which models work and which don't.

This work is incomplete, and since the model repository is quite large, it takes a while to download.

We are planning to do something similar for SBML.jl and their test-suite so it'd be nice to have some consistency in testing.

I don't have the entire library, but from my sample of ~1000 models, I found that we can call solve on about 10% of these models and get back a Solution.

@shahriariravanian you've mentioned some of the issues that could be contributing to this 10% number. It would be good to mention them, so that as they get fixed we can see how this percentage changes.

anandijain avatar Mar 21 '21 18:03 anandijain

2.1.0 is giving ~178/940

anandijain avatar Mar 26 '21 08:03 anandijain

2.2.0 is giving ~477/940

anandijain avatar Mar 27 '21 03:03 anandijain

It is a known issue that some files in the CellML Model Repository have bad XML or do not conform to the CellML specification we use. (Aside: @shahriariravanian, which version of CellML are we guaranteeing should work?)

After removing Goldbeeter_2006 from my data folder, we now get the count below. The problem originates in EzXML: when it hits a parsing error, it pushes to a global error stack that prevents further use. Why they do this, I have no idea...

530/940

anandijain avatar Mar 27 '21 20:03 anandijain

861 CellML models
718 successfully converted to ODESystem
635 successfully converted to ODEProblem
595 successfully solved

We get 940 from the curls, but cloning the git repos returns 861, so that's where the discrepancy comes from. 595/861 is quite good IMO, as a lot of the models are truly defective.
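
To make the attrition at each stage explicit, here is a small Python sketch (Python is used purely for illustration; the stage labels are shorthand) that takes the four counts reported above and computes each stage's share of the total and of the previous stage.

```python
# Counts reported above for the 861 cloned models.
stages = [
    ("CellML models", 861),
    ("converted to ODESystem", 718),
    ("converted to ODEProblem", 635),
    ("solved", 595),
]

def funnel(stages):
    """Return (label, count, share of total, share of previous stage) rows."""
    total = stages[0][1]
    rows = []
    prev = total
    for label, count in stages:
        rows.append((label, count, count / total, count / prev))
        prev = count
    return rows

rows = funnel(stages)
for label, count, of_total, of_prev in rows:
    print(f"{count:4d} {label:<24} {of_total:6.1%} of total, {of_prev:6.1%} of previous stage")
```

Reading the shares against the previous stage shows where models drop out, which is more informative than the overall solved fraction alone.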

anandijain avatar Mar 31 '21 17:03 anandijain

What are the issues you see?

ChrisRackauckas avatar Mar 31 '21 17:03 ChrisRackauckas

This data is from @shahriariravanian. Could you shed some light on Chris's question?

anandijain avatar Mar 31 '21 17:03 anandijain

The remaining issues are:

  1. Some CellML XML files are defective (missing some initial values). Currently, CellMLToolkit throws an error for these. However, the plan is to return a list of uninitialized variables so that the user can provide the values.
  2. Some models have more than one independent variable (iv); in fact, some use the partial_diff tag. This is uncommon in CellML models but is supported in the specs.
  3. The main remaining active issue is to implement imports completely. Currently, we have an incomplete implementation. Full import is rather complicated, as CellML XML files can recursively import and rename components and connections (links between variables from different components) from other files. Because of the connections, we may need to import some components implicitly.
  4. The ODEProblems that were not solved are not a big problem, as we used a fixed solver (TRBDF2) with some default parameters.
  5. Large models (XML size > 500K) can take a long time to generate an ODESystem. I'm going to profile and see where the main problem is, but we may need to change the strategy for how we use structural_simplify on the very large models.

shahriariravanian avatar Apr 01 '21 23:04 shahriariravanian

Great, could you name a model with ? I'd like to look into that. Similarly for a model with missing vars and components.

Also, if you end up doing some profiling, I think it'd be good to add benchmarking to our testing of the model repo. I'm happy to add this too with BenchmarkTools.

This may help pin down inefficiencies, i.e. "is it dependent on parameter count, state count, etc.?".

anandijain avatar Apr 01 '21 23:04 anandijain

These are the results of the latest run:

# outcome
867 CellML models
6 too large (>500K, excluded)
744 successfully converted to ODESystem
650 successfully converted to ODEProblem
608 successfully solved

shahriariravanian avatar Apr 02 '21 12:04 shahriariravanian

Here are the results as a CSV file. The codes in the res column are:

0 -> fail to generate ODESystem
1 -> fail to generate ODEProblem
2 -> fail to solve ODEProblem
3 -> success!
9 -> too large a file, ignored

cellml_results.txt
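
The codes above can be tallied with a few lines of Python. This is only a sketch: the layout of the attached file is not shown in this thread, so the header row, the `model` column, and the sample rows below are assumptions made for illustration. Only the `res` column name and the code meanings come from the comment above.

```python
import csv
import io
from collections import Counter

# Meaning of each code, as listed above.
RES_CODES = {
    0: "fail to generate ODESystem",
    1: "fail to generate ODEProblem",
    2: "fail to solve ODEProblem",
    3: "success",
    9: "too large a file, ignored",
}

def tally(csv_text, column="res"):
    """Count rows per outcome code (assumes a header row naming the column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row[column]) for row in reader)

# Placeholder rows for illustration only; these are not real results.
sample = "model,res\nmodel_a,3\nmodel_b,0\nmodel_c,3\nmodel_d,9\nmodel_e,2\n"

counts = tally(sample)
for code in sorted(RES_CODES):
    print(f"{code}: {counts.get(code, 0):3d}  {RES_CODES[code]}")
```

To run it on the real file, read the file's text and pass it to `tally`, adjusting `column` if the header differs.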

shahriariravanian avatar Apr 02 '21 12:04 shahriariravanian

Try setting the runner to a lower tolerance. That should help the domain error cases. If not, generate sqrt(abs(...)) in place of sqrt(...), so that a bad trial step gets rejected by the solver instead of erroring out.
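
The sqrt(abs(...)) idea can be illustrated with a toy example. The Python sketch below is not the solver used in this thread (TRBDF2) and not CellMLToolkit's generated code. It is a hand-written adaptive Euler/Heun integrator, with made-up function names, applied to y' = -sqrt(y). An oversized first step pushes the trial state below zero; the unguarded right-hand side raises there (Python's `ValueError` plays the role of Julia's `DomainError`), while the guarded one returns a finite value, the error estimate comes out large, and the step is simply rejected and retried with a smaller size.

```python
import math

def rhs_raw(y):
    # Raises ValueError (a math domain error) when y < 0.
    return -math.sqrt(y)

def rhs_guarded(y):
    # sqrt(abs(y)) is defined for every real y, so a bad trial step yields
    # a finite (if meaningless) value instead of an exception.
    return -math.sqrt(abs(y))

def integrate(rhs, y0, t_end, h0, tol):
    """Toy adaptive Euler/Heun integrator; returns (y at t_end, rejected steps)."""
    t, y, h, rejected = 0.0, y0, h0, 0
    while t_end - t > 1e-12:
        h = min(h, t_end - t)
        k1 = rhs(y)
        y_euler = y + h * k1
        k2 = rhs(y_euler)  # may be evaluated at an invalid trial state
        y_heun = y + 0.5 * h * (k1 + k2)
        if abs(y_heun - y_euler) <= tol:
            t, y, h = t + h, y_heun, 1.5 * h  # accept and grow the step
        else:
            h, rejected = 0.5 * h, rejected + 1  # reject and shrink the step
    return y, rejected

# y' = -sqrt(y), y(0) = 1 has the exact solution (1 - t/2)^2 for t <= 2.
# The initial step of 1.5 is deliberately too large and overshoots to y < 0.
y_end, rejected = integrate(rhs_guarded, 1.0, 1.5, 1.5, 1e-4)
print(y_end, rejected)

try:
    integrate(rhs_raw, 1.0, 1.5, 1.5, 1e-4)
except ValueError as err:
    print("unguarded right-hand side failed:", err)
```

The guard only changes values at states the solver should never accept, so it relies on the error control actually rejecting those steps. That is why lowering the tolerance is the first thing to try.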

ChrisRackauckas avatar Apr 02 '21 12:04 ChrisRackauckas

These are the latest tracking results using ver 2.4.1 (to be pushed soon):

# outcome
867 CellML models
6 too large (>500K, excluded)
775 successfully converted to ODESystem
688 successfully converted to ODEProblem
643 successfully solved

cellml_results_8.txt

shahriariravanian avatar Apr 10 '21 12:04 shahriariravanian