jekyll_picture_tag icon indicating copy to clipboard operation
jekyll_picture_tag copied to clipboard

Implement a global threadpool for faster builds.

Open aebrahim opened this issue 1 year ago • 9 comments

Fixes #225

This implements a global Concurrent::ThreadPoolExecutor from concurrent-ruby to avoid memory blowup, and de-duplicates files before generation so we no longer need to rely on filesystem consistency to avoid double-generating images.

This is an alternate to #282 with a little bit more complexity, but with the added benefits that all image generation can happen in a single ThreadPoolExecutor.

aebrahim avatar Sep 22 '22 01:09 aebrahim

This is amazing work! It's not so complicated considering the benefits it provides. I like this version better, though it might be worth testing a few builds to make sure it's faster by enough to justify the added complexity.

I'm still on a work trip, get home Saturday, so the earliest I can pull this down and test it will be monday. Just from reading, my initial criticism is I don't super like the test_pool because I'd like the tests and production to be as similar as possilbe, and I'd like to minimize the amount of code in production which deals with testing. I haven't tried to find another way, so it might just be our only option.

The tests on this project are a bit of a hot mess (sorry about that, I had learning to do), but I'd like to see a few tests of this behavior as well. I can handle that part if you're not up for it. We should also add a configuration option to disable it, in case someone has issues. (This code gets run in all sorts of weird environments)

Again, thank you so much! this is as big of an improvement as switching from imagemagick to libvips, which was huge.

rbuchberger avatar Sep 22 '22 08:09 rbuchberger

Thanks for the kind words @rbuchberger! And thanks for this package - it's so useful!

I refactored the tests so I could remove start_test_pool - please let me know what you think when you get back. It's been a fun way to learn a little bit of ruby :P

aebrahim avatar Sep 25 '22 00:09 aebrahim

If I could request one quick improvement, as someone who's dorked around with threading this in the past for performance (your method is better than mine, for sure!):

Can the number of parallel threads be configurable in the config files somewhere? Or, if it already is, document how to adjust it. I use some systems that are high on CPU count to RAM ratios (one of my little ARM boxes is 6C/4GB), and some of the resizes can use enough RAM to really choke the rest of the system out. Thanks!

Syonyk avatar Sep 25 '22 22:09 Syonyk

This approach does not seem to actually work - it still creates one threadpool per tag. Back to the drawing board for a bit to make it a full global

aebrahim avatar Sep 27 '22 00:09 aebrahim

That's right - every {% picture mypicture.jpg %} called in a site creates a unique instance of JPT.

There's a context object that is handed around during the site build. You might need to store relevant information there, and use some hooks to ensure that threads are cleaned up.

rbuchberger avatar Sep 27 '22 09:09 rbuchberger

I the code working by changing start_pool to ensure_pool_started and stopping the pool with a hook:

Jekyll::Hooks.register :site, :post_render do
  PictureTag::Pool.stop_pool
end

However, I'm having a lot of trouble testing it like that. The hardest part for me has been figuring out how to stub an option globally - I added a threads option, but I'm not sure of a good way to consistently read it in test without race conditions.

aebrahim avatar Sep 28 '22 15:09 aebrahim

@aebrahim - We're out of my area of experience here - I've never played with multithreaded Ruby code. Closest I've gotten is asynchronous javascript.

Our tests are synchronous, so would calling ensure_pool_started during setup and stop_pool during teardown work? For tests which verify file presence, we'd have to call stop pool before looking for them, but there aren't a huge number of those.

Edit: I commented before reading your new commit, looks like that's pretty much what you're doing. Hmm, I'll have to do some learning here

rbuchberger avatar Sep 29 '22 07:09 rbuchberger

I suspect most of my problems are coming from trying to read parameters. Maybe I will update this to remove all the parameter reading code and see if I can get tests to pass that way, and then leave it for someone else to figure out how to read parameters? WDYT?

aebrahim avatar Sep 29 '22 16:09 aebrahim

Hmm... Not sure I follow - which parameters do you mean?

rbuchberger avatar Oct 04 '22 14:10 rbuchberger