lbann
Share I/O thread pools between trainers
If there are multiple trainers per node, it may make sense to share the I/O thread pool between trainers.
It sounds like we want to make the I/O thread pool a singleton object.
@benson31 posted in #916:
Thread pools are not singletons. That would be a bad move. There can be many thread pools per rank/node/whatever. I/O is the shared resource. IMO, the I/O thread pool is an implementation detail of actually doing I/O and it shouldn't appear in the frontend at all.
In my current PR to add threading to data_store preloading, I create an ad-hoc thread pool that lives for the life of the call to preload_data_store(). Tom, I'm unsure what you mean by "shouldn't appear in the frontend" -- unsure what "frontend" entails.
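For readers following along, here is a minimal sketch of what a call-scoped set of worker threads can look like. It is not the code from the PR: the function name `preload_with_scoped_threads` and its parameters are hypothetical, and it uses plain `std::thread` rather than any LBANN thread pool class. The point is only that the threads are created inside the call and joined before it returns, so nothing about them leaks into the caller's interface.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not an LBANN API): runs work(i) for every i in
// [0, num_items) on a temporary group of threads. The threads exist only
// for the duration of this call.
// Note: work must not throw; an exception escaping a std::thread
// terminates the program.
void preload_with_scoped_threads(std::size_t num_items,
                                 unsigned num_threads,
                                 const std::function<void(std::size_t)>& work) {
  if (num_threads == 0) { num_threads = 1; }

  // Shared counter: each worker claims the next unprocessed index.
  std::atomic<std::size_t> next{0};

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < num_items;
           i = next.fetch_add(1)) {
        work(i);
      }
    });
  }

  // Join every worker before returning, so no thread outlives the call.
  for (auto& w : workers) { w.join(); }
}
```

A caller would pass the per-sample load as the callback, e.g. `preload_with_scoped_threads(n, 4, [&](std::size_t i) { /* load sample i */ });`. Each index is handed to exactly one thread, so the callback only needs synchronization for state shared across indices.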
@davidHysom whatever you're doing is probably fine. The "frontend" is model_zoo/lbann.cpp. My main complaint is that the I/O thread pool is exposed to main() and then ferried around through the interface when it's really an implementation detail of reading data.
But again, to clarify my thought and reiterate @ndryden's original concern, the primary issue is not the threads themselves but rather the fact that these particular threads are doing I/O, which is a node-level resource (rather than process-level).
One should be careful with too many threads though: processor cores are also a shared resource, and threads are not free.
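As a rough illustration of that last point, one way to keep the total thread count in check is to split the node's cores among the trainers that share it. The sketch below is only an assumption about how such a budget might be computed: `io_threads_per_trainer` is a hypothetical helper, not an LBANN function, and reserving one core per trainer for its main compute thread is an arbitrary policy chosen for the example.

```cpp
#include <thread>

// Hypothetical helper (not an LBANN API): suggests how many I/O threads
// each trainer should start when several trainers share one node.
// Assumed policy: split cores evenly, keep one core per trainer for its
// main thread, and never return fewer than one I/O thread.
unsigned io_threads_per_trainer(
    unsigned trainers_on_node,
    unsigned cores_on_node = std::thread::hardware_concurrency()) {
  if (trainers_on_node == 0) { trainers_on_node = 1; }
  // hardware_concurrency() may return 0 when the value is not computable.
  if (cores_on_node == 0) { cores_on_node = 1; }

  const unsigned share = cores_on_node / trainers_on_node;
  return share > 1 ? share - 1 : 1u;
}
```

With 16 cores and 4 trainers this yields 3 I/O threads per trainer. It bounds CPU oversubscription only; it says nothing about how much concurrent I/O the node's storage can actually sustain, which is the node-level concern raised above.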