truffleruby icon indicating copy to clipboard operation
truffleruby copied to clipboard

Get common C-extensions to declare `rb_ext_ractor_safe` or `rb_ext_thread_safe`

Open mohamedhafez opened this issue 1 year ago • 19 comments

In planning ahead to the near future when TruffleRuby can run C-extensions marked with rb_ext_ractor_safe(true) or rb_ext_thread_safe(true) in parallel, I've been raising issues in common gems to try to nudge them to make their C-extensions Ractor-safe (most already are) or threadsafe, and then submitting PR's to mark them as such. This issue will track the progress of all those issues.

Gems that will need some work:

  • https://github.com/sparklemotion/nokogiri/discussions/3283 Possibly some changes needed to conditionally use threadsafe features of libxml if it's using a new enough version. Testing with --cexts-lock=false doesn't have any issues on my app personally though, so a promising sign at least

Reviewed for Ractor-safety/Thread-safety, PR submitted none currently


PR merged! 🎉

  • https://github.com/ruby/ruby/pull/11762 (rbconfig-sizeof)
  • https://github.com/bcrypt-ruby/bcrypt-ruby/pull/280
  • https://github.com/trilogy-libraries/trilogy/pull/200
  • https://github.com/puma/puma/pull/3486
  • https://github.com/socketry/nio4r/pull/320
  • https://github.com/ruby/rbs/pull/2041

No interest shown from maintainers, and not very important anyway

  • https://github.com/ruby/syslog/issues/10
  • https://github.com/brianmario/mysql2/issues/1374 (we already have trilogy, TruffleRuby users can always migrate to that)

mohamedhafez avatar Sep 19 '24 16:09 mohamedhafez

👋 Nokogiri maintainer here, I'm a Truffleruby contributor and supporter. Nokogiri is a large library, but I would like to make it better support ractors, so please consider my request in that issue to be: what specific use cases should we start with? If you're a user of ractors I'd love to chat with you about your concrete needs.

flavorjones avatar Sep 19 '24 20:09 flavorjones

@flavorjones So I'm not using Ractors, but in JRuby I have multiple threads repeatedly scraping the account pages of substitute teachers looking for newly posted available jobs at their school district (with the explicit, contractual permission of all parties involved). Each thread has it's own Nokogiri object, and is processing different document objects. I want each thread to be able to run with true concurrency, in parallel on different cores of a CPU, without locking globally each time a call is made into the Nokogiri C-extension (as is currently the case when I run the code with TruffleRuby).

In C-Ruby, I would be trying to accomplish the exact same thing but with Ractors instead of threads. However that is not possible because Nokogiri objects can't be used inside of a Ractor currently and throws an error if you try (see https://github.com/sparklemotion/nokogiri/discussions/3283#discussioncomment-9974494)

mohamedhafez avatar Sep 20 '24 05:09 mohamedhafez

@eregon did TruffleRuby end up moving to a ripper implementation based on prism, or is there more clarity about whether such a move is planned? Just wondering if I can remove it from the checklist.

Also just wanted to verify that running rb_ext_ractor_safe(true) c-extensions in parallel is still on the roadmap!

mohamedhafez avatar Mar 18 '25 19:03 mohamedhafez

did TruffleRuby end up moving to a ripper implementation based on prism

No, it didn't, we tried but we found it rather incomplete. It needs changes in Prism.

Also just wanted to verify that running rb_ext_ractor_safe(true) c-extensions in parallel is still on the roadmap!

Yes, I have a WIP PR for it: https://github.com/oracle/truffleruby/pull/3814

eregon avatar Mar 18 '25 19:03 eregon

It looks like ActionView uses ripper in Rails < 8.0 (e.g. https://github.com/rails/rails/blob/v7.2.2.1/railties/lib/rails/source_annotation_extractor.rb), i believe as part of the process of rendering ERB files, so perhaps it's worth the effort to review that for Ractor safety and get it marked as Ractor-safe if possible?

mohamedhafez avatar Mar 23 '25 17:03 mohamedhafez

I don't know if you have seen but https://github.com/oracle/truffleruby/issues/2136 is done now, so truffleruby-dev can run C extensions in parallel.

Regarding nkf and debug I think it doesn't really matter, those can execute with a global lock.

eregon avatar May 15 '25 12:05 eregon

@eregon I did see it! 🙌🙌 Amazing news, excited to give things another try when the next release comes around!!

Ok so regarding this issue: I've removed debug, but regarding nkf, my understanding is that while getting it to be Ractor-safe would require upstream fixes that are unlikely to happen, we might already be at the point we can declare rb_ext_thread_safe(true), right? If so, maybe I should change the description of this issue to be "Get common C-extensions to declare rb_ext_ractor_safe OR rb_ext_thread_safe", since either would accomplish what we want here?

mohamedhafez avatar May 15 '25 18:05 mohamedhafez

excited to give things another try when the next release comes around!!

It would be useful if you try before the next release, e.g. with truffleruby-dev, so if there would be any issue there is time to fix it before the release.

regarding nkf, my understanding is that while getting it to be Ractor-safe would require upstream fixes that are unlikely to happen, we might already be at the point we can declare rb_ext_thread_safe(true), right?

I'm not sure, we would need to look in more details at nkf and that's not really a priority, part of the idea is that C extensions that are not really maintained and are not ractor/thread-safe like nkf just run with a lock and that's fine. An example is some methods in Etc are not Ractor-safe, and running with a lock is fine and likely as fast as manually locking in that extension. It doesn't impact other extensions which are marked as rb_ext_ractor_safe() or rb_ext_thread_safe(), those still run in parallel. So it would only matter if your application spends a lot of time in a non-ractor-safe non-thread-safe extension, which seems unlikely for nkf.

eregon avatar May 16 '25 11:05 eregon

Sure, that's understandable if nkf isn't used much and isn't well maintained (my app uses it on the critical path but I'm pretty sure I can work around that for my specific case). I've removed it from this issue. Should I remove syslog as well from here in the same vein?

There also doesn't seem to be any interest from the maintainers for https://github.com/brianmario/mysql2/issues/1374, so maybe I should remove mysql2 as well since users can just use trilogy instead, which did accept a patch?

mohamedhafez avatar May 16 '25 16:05 mohamedhafez

I think there is no harm to leave those issues opened and linked here, so fine as-is. Maybe just under a different section in the description to make it clearer they are not so important?

eregon avatar May 19 '25 11:05 eregon

@eregon done! So i guess the only thing really left to do here for TruffleRuby maintainers is threadsafety for ripper, since that's used in .erb rendering in Rails it might actually be pretty important to do i'm guessing?

mohamedhafez avatar May 20 '25 14:05 mohamedhafez

Is Ripper used every time a view is rendered? I would hope not, it's a lot of allocations and pretty slow. Probably it's done once per view, maybe lazily or during app startup?

Ripper is likely not thread-safe or Ractor-safe. It's also effectively deprecated now that there is Prism.

eregon avatar Jun 04 '25 14:06 eregon

@eregon I see, so should I remove ripper from here since it's not worth the effort to fix and won't make much difference anyway? (Assuming it really doesn't get used every time for erb rendering - a quick chatgpt o3 query says your assumption is indeed correct)

mohamedhafez avatar Jun 04 '25 15:06 mohamedhafez

@flavorjones any luck getting to the point where Nokogiri releases the GVL when it calls into libxml2? As far as TruffleRuby, we don't actually need it to be Ractor-safe, we're really just looking to be able to run it in parallel, so that would solve our needs perfectly fine.

mohamedhafez avatar Jun 04 '25 18:06 mohamedhafez

Yeah, I think Ripper is low priority given it's rarely used on hot paths, and would likely be high effort so likely just not worth it. Also I recall something like recent Rails might already use Prism instead of Ripper.

We might try to use Prism::Translation::Ripper at some point but it's not a priority because it is some work and we'd probably have to effectively maintain it if we are the main/only user of it. Also it has some level of incompatibility with regular Ripper, at least in the current state.

eregon avatar Jun 05 '25 10:06 eregon

any luck getting to the point where Nokogiri releases the GVL when it calls into libxml2

I am not opposed in theory to adding support to Nokogiri for releasing the GVL, but I have no plans to work on it myself.

I think any such work would need thoughtful design (because, for example, libxml2 has a lot of features that can call back into Ruby and would therefore need to reacquire the GVL). There may be fundamental design changes that need to be made and most definitely there will be API design decisions that need to be made.

If this is something you are passionate about, you are welcome to play with Nokogiri yourself and start driving the broader technical and API design discussion in a dedicated issue on the Nokogiri project. I'd be happy to answer any questions you have there.

flavorjones avatar Jun 05 '25 13:06 flavorjones

@eregon ah you're right about prism in rails, it looks like in Rails >= 7.2 it uses prism where available! I must have missed that since i'm still stuck on 7.1 due to that being the latest version JRuby supports. I've removed ripper from this issue.

mohamedhafez avatar Jun 05 '25 16:06 mohamedhafez

@eregon So the only thing we have left here is nokogiri! It sounds like ractor support, or nokogiri releasing the GVL before calling in libxml2, are a ways off and need significant work. However, since nokogiri already configures libxml2 with --with-threads, do you think it might be a good candidate for the new rb_ext_thread_safe(true)? On my app at least it seems to work fine with the C-extension lock turned off, so I'm hoping so.

@flavorjones: if libxml2 didn't have to reacquire the GVL when calling back into Ruby (since TruffleRuby doesn't need it for Ruby code, unless there is global state), AND we have a version of libxml2 installed that supports multithreading, AND since nokogiri already configures libxml2 with --with-threads... do you think it would probably work fine if TruffleRuby ran the nokogiri c-extension multithreaded, or do you think there other showstoppers just off the top of your head? On my app at least it seems to work fine with the C-extension lock turned off. We'd obviously need to review the code thoroughly to be certain, but just hoping for a positive signal from someone that knows the codebase!

mohamedhafez avatar Jun 05 '25 16:06 mohamedhafez

@mohamedhafez This feels like the wrong place to discuss about nokogiri, let's continue in https://github.com/sparklemotion/nokogiri/discussions/3283 and/or I would suggest to make a PR to nokogiri using rb_ext_thread_safe(true). TruffleRuby never acquires a global lock for Ruby code, so that part is not a concern. What really needs review is if nokogiri uses any C global (static) variables in the C extension, maybe you can search for those? https://github.com/oracle/truffleruby/blob/master/doc/user/thread-safe-extensions.md documents all conditions to mark an extension as thread-safe/Ractor-safe.

eregon avatar Jun 06 '25 08:06 eregon

A note about all those merged PR's: they are all part of new releases of their gems now, except bcrypt. Created an issue at https://github.com/bcrypt-ruby/bcrypt-ruby/issues/285 a few weeks ago, if anyone wants to nudge them to make a release 🤞

mohamedhafez avatar Jun 26 '25 13:06 mohamedhafez