allow extensions to Eidos/SLiM via some sort of plugin system or similar

Open bhaller opened this issue 3 months ago • 12 comments

This issue originates from slim-discuss at https://groups.google.com/g/slim-discuss/c/9KrBreuMW6A/m/z_UwIFTYCAAJ. @npb596 wants to incorporate an ODE solver into Eidos for use in a simulation of the evolution of biological networks. At present doing that requires modifying one's own copy of SLiM, and sharing such a solution with others would require making a public fork of SLiM. Neither of those is ideal.

Some kind of plug-in system, with loading of extensions via dynamically loaded external libraries, would provide a more graceful solution. However, this is obviously complex to do, and would be complicated to make work cross-platform.

Another possibility would be a mechanism like Rcpp that (I gather, I've never used it) allows the user to supply C++ code that is compiled on demand to add new functionality into R at runtime. Also complex and difficult to make cross-platform!

I'm marking this long-term for now. @npb596 has indicated that he'd be up for implementing something in this area, but only if demand warrants it. So, if anybody reading this issue would want to use a facility like this, please comment below so we can get a sense of the level of demand. :->

bhaller avatar Dec 06 '25 17:12 bhaller

I certainly would make use of an ODE solver/plug-in if it were present in SLiM. I am also compiling custom builds of SLiM, but mostly just to add more tagF slots for saving multiple state variables more efficiently, so that wouldn't warrant a plug-in.

kevolve avatar Dec 07 '25 22:12 kevolve

> I certainly would make use of an ODE solver/plug-in if it were present in SLiM. I am also compiling custom builds of SLiM, but mostly just to add more tagF slots for saving multiple state variables more efficiently, so that wouldn't warrant a plug-in.

Thanks for the input. I like the idea of including a built-in ODE solver in Eidos, based on the GSL. Let's see what @npb596 says about it! :-> Happy holidays!

bhaller avatar Dec 07 '25 23:12 bhaller

Hi everyone, I wanted to share an idea I had while reading the comments (although it’s not really related to an ODE solver, which is something I have not needed so far).

I wonder if SLiM could solve this sort of problem in the (very) long term, taking inspiration from Stan (https://mc-stan.org/). This idea is very tightly related to https://github.com/MesserLab/SLiM/issues/431 also.

Both Stan and SLiM are built around a domain-specific language heavily inspired by R (not that it matters), with keywords for blocks that must be present in every program (e.g. initialize, …) and a codebase written in C++. I find the approach of the Stan software very appealing. Executing any Stan program has several steps:

First, the Stan program is translated to C++ (and I assume it is relatively easy because functions are strongly typed and there’s a 1-1 match between Stan pseudo-language and the C++ codebase). Then, the C++ compiler compiles all C++ sources and links them together. Finally, one can run the executable as a regular binary (with minimal command-line arguments like seed). The way they allow for user extensions or plugins is then fairly simple (from the SLiM developer perspective). They simply have a --allow-undefined flag that allows functions that are not defined to be included in the C++ autogenerated code when translating from Stan pseudo-code. It is then the responsibility of the user to provide a suitable header_file.hpp.

Let’s say it’s possible to write such a C++ program that calls equivalent Eidos functions as a regular SLiM script. Then, it would be fair to say the more general way of allowing users to use external C++ code, get compiled SLiM code, and still be cross-platform is to simply rely on some C++ compilers (as they are all designed with that purpose).

Of course, this is dreaming. I know almost nothing about the SLiM code base, but I’m assuming translating a SLiM script into adequate C++ code (from the SLiM codebase) would be very difficult. Still, I would be very much interested in something like this and happy to play with this if others are willing to participate.

I imagine a (separate) slimc program that takes a SLiM script and translates (a subset of possible) SLiM scripts into C++ code. I imagine people doing ABC stuff would benefit from even small performance improvements, and others with very complex simulations that require C++ code.

currocam avatar Dec 08 '25 08:12 currocam

Hi Ben and everyone else,

Sorry to disappoint but I believe I was mistakenly tagged here. The person who made the thread https://groups.google.com/g/slim-discuss/c/9KrBreuMW6A/m/z_UwIFTYCAAJ is a "Nick O'Brien" (I'm Nick Bailey) and I have no familiarity with C++ or ODE solving! After some internet digging I'm taking a guess the right person is @nobrien97 (https://github.com/nobrien97).

npb596 avatar Dec 08 '25 14:12 npb596

> Hi Ben and everyone else,
>
> Sorry to disappoint but I believe I was mistakenly tagged here. The person who made the thread https://groups.google.com/g/slim-discuss/c/9KrBreuMW6A/m/z_UwIFTYCAAJ is a "Nick O'Brien" (I'm Nick Bailey) and I have no familiarity with C++ or ODE solving! After some internet digging I'm taking a guess the right person is @nobrien97 (https://github.com/nobrien97).

Ah, sorry Nick! That was careless of me. Thanks for clearing it up!

bhaller avatar Dec 08 '25 14:12 bhaller

> Hi everyone, I wanted to share an idea I had... Executing any Stan program has several steps:
>
> First, the Stan program is translated to C++... Then, the C++ compiler compiles all C++ sources and links them together... a --allow-undefined flag that allows functions that are not defined to be included in the C++ autogenerated code when translating from Stan pseudo-code. It is then the responsibility of the user to provide a suitable header_file.hpp.
>
> ...Of course, this is dreaming. I know almost nothing about the SLiM code base, but I’m assuming translating a SLiM script into adequate C++ code (from the SLiM codebase) would be very difficult. Still, I would be very much interested in something like this and happy to play with this if others are willing to participate.
>
> ...I imagine a (separate) slimc program that takes a SLiM script and translates (a subset of possible) SLiM scripts into C++ code. I imagine people doing ABC stuff would benefit from even small performance improvements, and others with very complex simulations that require C++ code.

Hi @currocam! Yes, this kind of thing is pretty much what https://github.com/MesserLab/SLiM/issues/431 is about, so discussion of it should probably happen over there. But I agree that these two issues are tightly linked, and that probably only one of them needs to be fixed – in other words, SLiM needs some way to provide user-defined functionality that is compiled, whether through plug-ins or an Eidos/SLiM compiler or similar.

I think more thought is definitely warranted here, regarding which angle of attack is best and how difficult this would be to do. It sounds like Stan was expressly designed for the workflow you describe, and so translating it into C++ is trivial. That is not the case for Eidos, and while it would certainly be possible to translate Eidos code into C++ it's not clear how much of a performance win would really result since that compiled code would still, in some way, have to deal with Eidos's dynamic typing and so forth.

I'm interested in seeing something in this area happen, but I don't have the cycles for it right now. I need to complete the big projects that are already on my plate, and then maybe this will become the next big project after those. Hard to predict; I'm talking maybe two years from now. There's a lot to do. :-> If someone / someones want to try to drive these ideas forward, a good place to start would be writing up some kind of design proposal that considers both a plug-in & dynamic linking approach and a compile-to-C++ & static linking approach to the problem, getting into the weeds a bit to really try to figure out which one would work better for SLiM and why. (A third possibility is some kind of byte-code compilation for Eidos, and/or some kind of just-in-time compiler. There are many ideas in this space, and I don't know enough to have a sense of when different approaches are preferred.) Doing a deep dive like that would require quite a lot of effort, I think; it might be more than fits into a "side project" sort of slot.

One thing I would say is that if you're bottlenecked inside the Eidos interpreter itself, running some complex algorithm that requires complex Eidos code, then perhaps a plug-in architecture makes more sense than compiling Eidos to C++. This is because, as I note above, the design of Eidos itself isn't really very close to that of C++, and compiled Eidos code might not run that much faster than interpreted Eidos code due to, e.g., dynamic typing. What you really want is to get your complex algorithm out of Eidos-land into C++-land entirely: you want to be using native C++ data types with strong typing. That's how you get fast. So to me that suggests that a plug-in architecture might be the better way to go. Then – just as for the built-in functions and methods in Eidos – you can check types once at the top, and then drop down into pure C++ for speed. That would be more difficult to achieve with an Eidos-to-C++ compiler, perhaps.
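To make the "check types once at the top, then drop down into pure C++" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `DynValue` is only a stand-in for a dynamically typed script value (it is not the real Eidos value class), and `sum_of_squares` is an invented example function, not an existing Eidos or SLiM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a dynamically typed script value.
// This is NOT the real Eidos value class hierarchy.
using DynValue = std::variant<std::vector<std::int64_t>,
                              std::vector<double>,
                              std::vector<std::string>>;

// Native kernel: strongly typed, no per-element type checks.
double sum_of_squares_native(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += data[i] * data[i];
    return total;
}

// Hypothetical plug-in entry point: inspect the dynamic type once,
// then hand the raw buffer to the native kernel.
double sum_of_squares(const DynValue& arg) {
    if (const auto* f = std::get_if<std::vector<double>>(&arg))
        return sum_of_squares_native(f->data(), f->size());
    if (const auto* i = std::get_if<std::vector<std::int64_t>>(&arg)) {
        // Promote integers to floating point once, outside the hot loop.
        std::vector<double> promoted(i->begin(), i->end());
        return sum_of_squares_native(promoted.data(), promoted.size());
    }
    throw std::invalid_argument("sum_of_squares: expected a numeric vector");
}
```

The dynamic dispatch cost is paid once per call rather than once per element, which is the property a compiled translation of dynamically typed script code would find harder to guarantee.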

bhaller avatar Dec 08 '25 14:12 bhaller

@currocam Pondering this a bit more, a major objection I have to the Stan approach is that it would be difficult/intimidating for users with little-to-no programming experience, which is a substantial subset of SLiM's user base. I don't want people to have to deal with compiling their SLiM script before running it, by default. A more complex workflow like that should be optional, needed only for doing advanced things. It would also be difficult to integrate required compilation of SLiM models into SLiMgui, whereas I can imagine that a well-designed plug-in architecture might work seamlessly inside SLiMgui as well.

bhaller avatar Dec 08 '25 15:12 bhaller

Hi all,

I think the GSL ODE solver will work well - I've created a new issue (#582) to track that one. Happy to contribute to implementation as well. I think the plugin system might still be useful outside of the ODE solver (and I think some of the work on the solver might be useful for a plugin system down the track...), but again, it depends on demand as to whether I can justify putting time towards it :)

The Stan comparison is interesting, but I think it is mainly a different problem with some overlap. The primary goal of the plugin system in my view would be to allow developers to provide extensions which are impossible in Eidos without needing to maintain a fork. It could also be used to allow users to enable/disable features that might otherwise clutter SLiMgui (e.g. a plugin for extra graphing utilities, or more debug information etc. etc.). Maybe one of the reasons that a C++ extension is needed is because of performance concerns, but it could also just be a novel feature that requires an Eidos API implementation and doesn't make sense to include in the main SLiM project.
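As a rough illustration of what "providing extensions without maintaining a fork" could look like on the host side, here is a minimal sketch of a function registry that a plug-in fills in at load time. All names (`ExtensionRegistry`, `example_plugin_init`, `cumulativeSum`) are hypothetical and are not part of SLiM or Eidos; the dynamic-loading step itself is deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// In this sketch every extension function maps a numeric vector to a numeric vector.
using ExtensionFn = std::function<std::vector<double>(const std::vector<double>&)>;

// Hypothetical host-side registry; the interpreter's real function table is not modeled.
class ExtensionRegistry {
public:
    void register_function(const std::string& name, ExtensionFn fn) {
        assert(fn && "extension function must be callable");
        if (!table_.emplace(name, std::move(fn)).second)
            throw std::invalid_argument("duplicate extension function: " + name);
    }

    bool has(const std::string& name) const { return table_.count(name) != 0; }

    std::vector<double> call(const std::string& name,
                             const std::vector<double>& args) const {
        auto it = table_.find(name);
        if (it == table_.end())
            throw std::out_of_range("unknown extension function: " + name);
        return it->second(args);
    }

private:
    std::map<std::string, ExtensionFn> table_;
};

// Hypothetical plug-in entry point. With dynamic loading, the host would
// look this symbol up in the shared library and call it once.
void example_plugin_init(ExtensionRegistry& registry) {
    registry.register_function("cumulativeSum", [](const std::vector<double>& x) {
        std::vector<double> out;
        out.reserve(x.size());
        double running = 0.0;
        for (double v : x) {
            running += v;
            out.push_back(running);
        }
        return out;
    });
}
```

A host would create one `ExtensionRegistry`, pass it to each plug-in's init function, and consult it when a script calls a name that is not built in. How the plug-in library is found and loaded on each platform is the hard part this sketch does not address.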

nobrien97 avatar Dec 09 '25 02:12 nobrien97

@bhaller thanks for the thorough response. Yes, I agree; I think you made very good points. Perhaps I disagree on one thing: I think the subset of the user base willing to use some C++ plugin à la Rcpp is most likely already quite comfortable compiling a binary.

About the plugin: recently I have mostly been working on ABC-like algorithms, and my go-to is to output a tree-sequence and process it later. There's definitely an overhead in that approach: writing the tree-sequence into a temporary file, reading it into memory in a Python session, and removing it. I have run into I/O bottlenecks when running many short simulations in parallel. I didn't think about it before, but I guess these sorts of applications could benefit from some plugin (based on the tskit C-library, for example) that computes the ARG-stats directly. All this assumes that SLiM has the tree-sequence loaded in memory and shares the same data-structure. Do you think that would be possible?

currocam avatar Dec 09 '25 08:12 currocam

Perhaps one last thing. If at some point you want to consider using JIT, @arzwa implemented some JIT forward-in-time simulations in Julia with tree-sequence recording. It would be interesting to do some fair benchmarking to see how much someone can really hope to gain using JIT versus current Eidos.

https://github.com/arzwa/Fwd

currocam avatar Dec 09 '25 08:12 currocam

> @bhaller thanks for the thorough response. Yes, I agree; I think you made very good points. Perhaps I disagree on one thing: I think the subset of the user base willing to use some C++ plugin à la Rcpp is most likely already quite comfortable compiling a binary.

Yes, I agree with that. My comment was with respect to what you wrote about Stan: "Executing any Stan program has several steps..." (emphasis added), where the steps involve translating the Stan code into C++ and compiling it. I took that to mean that adopting that approach in SLiM would imply the same: executing any SLiM model would start with translating the Eidos code into C++ and compiling it. That would, as you say, be fine for the subset of users comfortable with plug-ins, etc., but it wouldn't be fine for the rest of the SLiM world. We need a solution that can continue to work as it does now, with a simple command-line tool with no external dependencies, for users that are not using the additional functionality under discussion.

> About the plugin: recently I have mostly been working on ABC-like algorithms, and my go-to is to output a tree-sequence and process it later. There's definitely an overhead in that approach: writing the tree-sequence into a temporary file, reading it into memory in a Python session, and removing it. I have run into I/O bottlenecks when running many short simulations in parallel. I didn't think about it before, but I guess these sorts of applications could benefit from some plugin (based on the tskit C-library, for example) that computes the ARG-stats directly. All this assumes that SLiM has the tree-sequence loaded in memory and shares the same data-structure. Do you think that would be possible?

Most of the time, SLiM only has a "table collection", not a tree sequence. Building a tree sequence from a table collection involves various steps like sorting the tables, deduplication of some entries, etc., and usually (but not necessarily) simplification as well. Those steps are quite slow, which is why saving out a tree sequence from SLiM is slow. It would be possible for SLiM to save out the table collection without doing those steps, but tskit would require those steps to be taken on the Python side before building a tree sequence, so there's not really a win there, I think; better to only write out data structures that are actually valid, all else being equal.

Somewhere (in a tskit issue, maybe?) I think the possibility of piping a tree sequence directly from SLiM into tskit, without needing the intermediate filesystem representation, has been discussed; but I think the speed savings would probably be quite small, IIRC it was discussed chiefly as a way to decrease filesystem clutter when the .trees file was just acting as a momentary intermediary between the two programs. But yes, if you're running many jobs in parallel on a node and they're all trying to write/read .trees files all the time, there could be a bottleneck there for sure. If you're interested in seeing a "direct pipe" type of solution from SLiM into tskit, I'd suggest that you open a new tskit issue about it, I guess. I can't find where that previous discussion is; maybe it never made it into an actual issue. But it won't get rid of the overhead of producing the tree sequence from the table collection, it will only get rid of the filesystem overhead.

And yes, you could make a plugin based on tskit's C interfaces (which are already compiled into SLiM, in fact) to do analysis directly in SLiM, but again, that wouldn't get rid of the overhead (in time and in memory) of making the tree sequence object. That doesn't seem like it needs a new issue for now; this issue about plug-ins can encompass it, until such time as plug-ins are perhaps possible in general. :->

bhaller avatar Dec 09 '25 14:12 bhaller

> Perhaps one last thing. If at some point you want to consider using JIT, @arzwa implemented some JIT forward-in-time simulations in Julia with tree-sequence recording. It would be interesting to do some fair benchmarking to see how much someone can really hope to gain using JIT versus current Eidos.
>
> https://github.com/arzwa/Fwd

Sure, that might be interesting to see. Not sure how comparable it would really be, though, since what is actually being done in those simulations is doubtless different from what SLiM is doing. In a typical simple WF simulation in SLiM the Eidos overhead is basically zero, so I think the performance differences you might observe would probably not be due to any Eidos-vs-JIT comparison...?

bhaller avatar Dec 09 '25 14:12 bhaller