Share I/O thread pools between trainers

Open bvanessen opened this issue 5 years ago • 6 comments

If there are multiple trainers per node, it may make sense to share the I/O thread pool between trainers.

bvanessen avatar Sep 05 '19 20:09 bvanessen

It sounds like we want to make the I/O thread pool a singleton object.

timmoon10 avatar Sep 05 '19 20:09 timmoon10

@benson31 posted in #916:

Thread pools are not singletons. That would be a bad move. There can be many thread pools per rank/node/whatever. I/O is the shared resource. IMO, the I/O thread pool is an implementation detail of actually doing I/O and it shouldn't appear in the frontend at all.

timmoon10 avatar Sep 05 '19 20:09 timmoon10

In my current PR to add threading to data_store preloading, I create an ad-hoc thread pool that lives only for the duration of the call to preload_data_store(). Tom, I'm unsure what you mean by "shouldn't appear in the frontend". What does "frontend" entail here?
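
For readers following along, here is a minimal sketch of the "ad-hoc pool scoped to one call" pattern described above. It is not the code from the PR: `preload_samples` and `load_sample` are hypothetical stand-ins, and the strided work split is only an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for reading one sample from disk; not an LBANN function.
int load_sample(std::size_t index) { return static_cast<int>(index) * 2; }

// Hypothetical preload routine. The worker threads are created and joined
// inside this call, so no thread pool appears in the caller's interface.
std::vector<int> preload_samples(std::size_t num_samples, std::size_t num_threads) {
  std::vector<int> samples(num_samples);

  // Use at least one worker, and never more workers than samples.
  num_threads = std::max<std::size_t>(1, std::min(num_threads, num_samples));

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (std::size_t t = 0; t < num_threads; ++t) {
    // Worker t handles indices t, t + num_threads, t + 2 * num_threads, ...
    // Each index is written by exactly one worker, so no locking is needed.
    workers.emplace_back([&samples, t, num_threads] {
      for (std::size_t i = t; i < samples.size(); i += num_threads) {
        samples[i] = load_sample(i);
      }
    });
  }

  // The workers live only for the duration of this call.
  for (auto& worker : workers) {
    worker.join();
  }
  return samples;
}
```

The caller only sees `preload_samples(n, k)`. Whether the threads come from a fresh pool, as here, or from some shared node-level facility is hidden behind that call.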

davidHysom avatar Sep 05 '19 21:09 davidHysom

@davidHysom whatever you're doing is probably fine. The "frontend" is model_zoo/lbann.cpp. My main complaint is that the I/O thread pool is exposed to main() and then ferried around through the interface when it's really an implementation detail of reading data.

benson31 avatar Sep 05 '19 22:09 benson31

But again, to clarify my thought and reiterate @ndryden's original concern, the primary issue is not the threads themselves but rather the fact that these particular threads are doing I/O, which is a node-level resource (rather than process-level).

benson31 avatar Sep 05 '19 22:09 benson31

One should be careful with too many threads though: processor cores are also a shared resource, and threads are not free.
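
As a rough illustration of that budget concern, the sketch below divides a node's cores among the trainers running on it and leaves some cores for compute. `io_threads_per_trainer` and its parameters are hypothetical names, not LBANN APIs, and the even split is only one possible policy.

```cpp
#include <algorithm>

// Hypothetical helper: how many I/O threads one trainer may start on a node.
//   cores_on_node        - cores available on the node
//   trainers_on_node     - trainers sharing the node (0 is treated as 1)
//   reserved_per_trainer - cores each trainer keeps for non-I/O work
unsigned io_threads_per_trainer(unsigned cores_on_node,
                                unsigned trainers_on_node,
                                unsigned reserved_per_trainer) {
  const unsigned trainers = std::max(1u, trainers_on_node);
  const unsigned share = cores_on_node / trainers;  // even split, rounded down

  // Always allow at least one I/O thread, even when the node is oversubscribed.
  if (share <= reserved_per_trainer) {
    return 1;
  }
  return share - reserved_per_trainer;
}
```

In practice `cores_on_node` might come from `std::thread::hardware_concurrency()`, keeping in mind that it is only a hint and may return 0 when the value cannot be determined.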

ndryden avatar Sep 06 '19 07:09 ndryden