Parallelize txnsim()
Prompted by this blog post referencing the txnsim() function, we started chatting again about parallelizing it, so I'm creating an issue for the work with some notes from the discussion... mostly pointers from Brian.
txnsim constructs random trades... the time spent is actually in the generation of the random draws... the actual P&L calculations in blotter are all vectorized and fairly trivial
so the time in txnsim is actually in constructing the draws
you would gain some advantage in breaking that up for a large simulation, e.g. have each core work on 1/ncores of the draws, but then you would probably still have blotter mark them all the way it does now
we use lapply to build the list of trades
could switch to parLapply
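
A rough sketch of that swap, outside of blotter itself; generate_trades() and n_reps are stand-ins for whatever txnsim actually builds, not its real internals. Note that parLapply() already hands each worker roughly 1/ncores of the elements, which is the "each core works on part of the draws" point above.

```r
library(parallel)

## stand-in for constructing one set of random trade draws (not txnsim internals)
generate_trades <- function(i) rnorm(1e5)
n_reps <- 1000

cl <- makeCluster(max(1L, detectCores() - 1L))
clusterExport(cl, varlist = "generate_trades")

## serial version:   trade_list <- lapply(seq_len(n_reps), generate_trades)
## parallel version: parLapply() splits the indices into one chunk per worker
trade_list <- parLapply(cl, seq_len(n_reps), generate_trades)

stopCluster(cl)
```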
as much as I prefer the flexibility of foreach, it probably doesn't make sense to rewrite the lapply as a foreach loop, though that would work too... it is a more complicated refactoring
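
For reference, the same loop written with foreach/%dopar%, again with illustrative stand-ins; this is the "more complicated refactoring" because it pulls in foreach plus a registered backend such as doParallel.

```r
library(foreach)
library(doParallel)

registerDoParallel(cores = max(1L, parallel::detectCores() - 1L))

## each iteration builds one set of random trade draws (stand-in body)
trade_list <- foreach(i = seq_len(1000)) %dopar% {
  rnorm(1e5)
}

stopImplicitCluster()
```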
we also call replicate on the inner loops... that would actually be a better place to refactor as a foreach loop
hah! Richard McElreath already has a parallel replicate in the code for Statistical Rethinking
or could use mclapply, which would only use one core on Windows, but that would serve them right for running an operating system that doesn't support forking (of processes)
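
A hedged sketch of what a parallel replicate() can look like, in the spirit of McElreath's version: capture the expression the same way base replicate() does and hand the iterations to mclapply(). The mc_replicate() name and the Windows fallback to one core are illustrative choices here, not anything in blotter.

```r
library(parallel)

## drop-in "parallel replicate": same expression-capturing trick as base
## replicate(), but the iterations run in forked workers via mclapply()
mc_replicate <- function(n, expr,
                         mc.cores = if (.Platform$OS.type == "windows") 1L
                                    else max(1L, detectCores() - 1L)) {
  simplify2array(
    mclapply(seq_len(n),
             eval.parent(substitute(function(i, ...) expr)),
             mc.cores = mc.cores)
  )
}

## usage: replace replicate(n, <draw>) on the inner loop with mc_replicate()
RNGkind("L'Ecuyer-CMRG")   # reproducible RNG streams across forked workers
set.seed(42)
draws <- mc_replicate(8, mean(rnorm(1e5)))
```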
Roger Peng's book 'R Programming for Data Science' also talks about parallel replicate(). The book is worth owning in print, but the whole thing is online here:
https://bookdown.org/rdpeng/rprogdatascience/