OpenABM-Covid19 icon indicating copy to clipboard operation
OpenABM-Covid19 copied to clipboard

Added optional support for using stats library instead of GSL

Open adamfowleruk opened this issue 2 years ago • 1 comments

PR for discussion and visibility in the group. Some formatting and style changes may be required. Please do let me know.

(Note: Although the feature name is 'armadillo', this change does not introduce that as a dependency at this time as it wasn't required for the functionality used by the simulation.)

I have abstracted the statistical library used into its own header and implementation files controlled by the Makefile options. This allows selectively choosing stats libraries as needed. This could be for licensing needs, or the performance characteristics of certain statistical functions, including support for multi-core processors in future.

On licensing

GSL is GNU GPL v2 licensed, restricting who can use OpenABM-Covid19. The changes in this PR allow it to be dual licensed as GPLv2 or Apache-2.0 depending on the compiler options chosen. This is accomplished by optionally choosing the stats library instead of the GSL library as compilation options. They also potentially allow other statistical libraries to be used easily in future too. (I have not changed the license wording to a dual license in this PR, as this is a secondary benefit).

On the stats library

The stats library can be found here: https://github.com/kthohr/stats This has a dependency upon the GCEM library also found here: https://github.com/kthohr/gcem Once these are extracted into the OpenABM-Covid19 folder, they can be used.

README.md has been updated to reflect this new compilation option.

A GSL_COMPAT option is also provided so that the marsenne twister used is a 32bit variety with the same seed as GSL, to allow for direct comparison of results. Note: Not every function in the stats library works exactly the same as their GSL equivalent even with the GSL_COMPAT flag. These differences are noted in comments in random_stats.cpp.

Results

Performance results on my test machine using the test_performance.sh script showed:-

  • Existing GSL build with previous options (-O0) took 263 seconds
  • Stats library inlined using 32bit RNG and GSL_COMPAT settings (-O2) took 202 seconds
  • Stats library inlined using 64 bit RNG, no GSL COMPAT, -O2 with -fopenmp with a single change to model.c (commented out in this commit) took 178 seconds (A 32% saving)

Future benefits this commit would also allow

With further changes a greater degree of parallelisation will be possible using this PR as a basis. The stats library has built in support for OpenMP allowing multi threaded execution over vectors and matrices with minimal work as its a compiler feature.

The stats library allows the use of a 64 bit marsenne twister rather than 32 bit. This is potentially an issue given its extensive invocation in a large network relationship simulation like OpenABM-Covid19. (The performance test example uses the random number generator 2.17 billion times for a population of 1 million. 32 bits provides a space of 4.23 billion.)

The stats library also allows the use of other vector and matrix libraries such as Armadillo, Blaze, or Eigen. This vector approach may also provide performance to R and Jupyter notebook integrations in future too.

Separate to the use of the stats library, I have provided implementations of ran_discrete and ran_shuffle that are at least as fair as those in GSL, but more performant given how OpenABM-Covid19 uses GSL. My shuffle implementation executes in 46% less time, and my discrete implementation executes in 90% less time. I have not replaced the GSL calls with these, but we may wish to do so in future.

adamfowleruk avatar Jan 21 '22 21:01 adamfowleruk

@roberthinch I've ensured the python swig wrapper compiles now, but the laptop I have here doesn't have Docker and so for some reason the data_tests files seem to be missing. Please can you test again and let me know the output. I also think my renaming of network.network to network.next (to remove C++ compiler warnings when used with the stats library) may have broken the R bindings, but I cannot find the R wrapper functions at fault. Any hints welcome. I've limited the R and python bindings to GSL only for now so the tests at least pass. Thanks.

adamfowleruk avatar Feb 07 '22 15:02 adamfowleruk