nokogiri
explore: take advantage of modern CRuby GC features
After reading Peter's Adventures in Ruby: Garbage Collection in Ruby - Peter Zhu:
Can we use rb_gc_mark_movable instead of rb_gc_mark?
I think we can.
And after peeking at Gradual Write-Barrier Insertion into a Ruby Interpreter:
Can we take advantage of WB protection?
I need to learn more here, but:
a) it doesn't look like we can make Node or Document objects use WB, because they are T_DATA and so are always created with wb_protected=false; and (from the linked doc above) "WB-unprotected objects can not become WB-protected objects."
b) we've already (long ago) removed RARRAY_PTR in favor of rb_ary_entry, so I'm not sure we're unprotecting anything we don't need or intend to unprotect
Things I'd like to do before we start working on this, though:
- ship v1.11.0 (see milestone)
- maybe ship v1.12.0 with Nokogumbo (see #2064)
- figure out whether our Valgrind suppressions are actual problems or not (I'm starting to suspect they might be)
- investigate and close all the existing memory issues (see label topic/memory)
- resurrect the GC memory test suite, and add more coverage to it (https://github.com/sparklemotion/nokogiri/issues/1603)
So anyone who's interested in this topic, I'm open to learning, but I'm unlikely to merge anything until and unless all of the above are addressed first.
cc @tenderlove to tell me how I'm incomplete or wrong here
I recently migrated ruby-pg from Data_Wrap_Struct to TypedData_Wrap_Struct (a.k.a. typed data) here: https://github.com/ged/ruby-pg/pull/349
This changed all T_DATA objects to typed data, implemented the GC.compact callbacks, and removed all rb_gc_mark calls.
I have not made use of write barriers so far, since I expected them to add complexity without a big performance advantage.
We adopted TypedData_Wrap_Struct in #2579, which I think took care of all the compaction support. Given Lars's comment above about write barriers, I'm going to deprioritize this work and close this issue.
If there's GC-related work anybody thinks we should do, please comment and we can re-open this issue or open a new one.