prawn Add TruffleRuby head to CI

This is a clean version of #1189 against current master head. Thanks to @eregon for the original submission.

I have two concerns with this PR:

Truffleruby appears to produce a different hash than either MRI ruby or JRuby for the manual
Truffleruby takes ~7 minutes to run specs that take ~10 seconds in MRI. That seems excessive.

Posting this PR for discussion. @eregon @pointlessone @gettalong any thoughts are welcome.

Feb 09 '22 23:02 petergoldstein

I would rather not add TruffleRuby unless I have commitment from someone to support it. TruffleRuby is an exciting development but I don't know if it has a solid userbase to warrant the effort. I don't know anyone who uses it in production and the CI overhead is substantial.

Feb 10 '22 07:02 pointlessone

@pointlessone I think that's reasonable. @eregon are you interested in signing up for that? Do you have any idea why the tests are so slow for Truffleruby?

Feb 10 '22 08:02 petergoldstein

The TruffleRuby team will monitor the CI status via https://github.com/eregon/truffleruby-gem-tracker and help if it fails (like I mentioned on https://github.com/prawnpdf/prawn/pull/1189#issuecomment-761567300).

I don't know why the tests are slower. It'd be interesting to run RSpec in a mode where it measures each test's time, to see if it's a particular test being slower. Could also produce a flamegraph by passing --cpusampler=flamegraph to TruffleRuby. @bjfish Maybe you can help and look into this?

Feb 10 '22 10:02 eregon

If there are concerns about support, TruffleRuby could still be added to CI as experimental (failures allowed for now)

Feb 11 '22 10:02 johnnyshields

GitHub Actions actions is not great for optional build steps. We can't have a green build with some steps failing. It makes the whole contribution UX much worse.
Waiting for the slow non-essential step is suboptimal. GH is adamant about waiting for the build completion to switch PR UI.

In short, adding TruffleRuby doesn't benefit Prawn much.

Feb 11 '22 11:02 pointlessone

If the CI time is a concern (it's 6min30s in this run, still sounds fairly fast):

We could try to make it faster by investigating and e.g. using --engine.Mode=latency
It could be a daily or weekly job instead, so then it doesn't affect CI time for PRs

Feb 14 '22 15:02 eregon

@eregon The big concern is the comparative time. Note that the truffleruby "Run tests" section takes over 5 minutes to run. As compared to ~15s for each of the MRI rubies. That's especially odd given that one of Truffle's main selling points is speed.

The scheduled job is a good suggestion though.

Feb 14 '22 15:02 petergoldstein

It's expected more optimizing Ruby JITs run test suites slower (e.g., similar on JRuby), because tests constantly run different code and so rarely run any existing & JIT-compiled code. And the interpreter for a Ruby with a JIT needs to collect profiling data which costs some time, as well as the JIT consuming CPU time (especially on machines with few cores). There are ways to improve this, including persistent the JITed code, and using TRUFFLERUBYOPT=--engine.Mode=latency which optimizes startup/warmup over peak performance. The slowdown shouldn't be that big though, and @bjfish is currently looking into that.

I think the scheduled job is a good solution, especially for truffleruby-head, it doesn't get in the way but still regularly produces CI results and we can easily find out if it got red and regressed somehow.

Feb 14 '22 18:02 eregon

@eregon So I can understand that, but as you note it really shouldn't be that big. A factor of 20 is pretty substantial. Would be interested to hear the cause. I've been unable to get truffleruby running locally, so I haven't been able to narrow down the slowdown.

@pointlessone If it sounds good to you, I can change this PR to create a scheduled CI job (daliy? weekly?) that includes truffleruby-head. Let me know.

Feb 14 '22 18:02 petergoldstein

From the flamegraph it seems most of time (31.2% using Search) is spent in Prawn::Images::PNG#split_image_data and specifically in StringIO#read and String#byteslice called from there. The main cost is probably scanning the substring to compute the correct coderange (7-bit or valid for BINARY strings). This should improve when TruffleRuby has adopted TruffleString (WIP). It also indicates that in this case it might be better to compute the coderange lazily, since it's one of the rare cases where the coderange doesn't seem needed (for color.write data.read(color_bytes)). From a wider perspective it's an unfortunate case of Ruby not having a real "byte array" structure, and String being a relatively poor fit for it. Let's track this as a TruffleRuby issue: https://github.com/oracle/truffleruby/issues/2599

Feb 15 '22 12:02 eregon

@petergoldstein We already have a weekly schedule for CI. I'm fine including TruffleRuby in it. I'd skip it in PR builds until its run time would be comparable to at least JRuby.

@eregon Thank you for looking into it. Feel free to propose improvement that might also benefit other implementations too.

Feb 15 '22 12:02 pointlessone

@pointlessone This splits out truffleruby into a scheduled (and also manual, upon request) workflow. There's some duplication because we're reusing the push/pull_request CI config for the normal scheduled one. So I simply added a truffleruby only config in another file.

Feb 23 '22 01:02 petergoldstein

I took another look at the performance issue (https://github.com/oracle/truffleruby/issues/2599) and I think I've found the culprit now. I'll try to fix it soon.

@petergoldstein Could you rebase this PR and address https://github.com/prawnpdf/prawn/pull/1246#discussion_r812994292 ?

The test suite runs in 1min24s for me locally so I feel it's more than fast enough for a scheduled CI job. So maybe we can merge this PR?

Dec 12 '22 15:12 eregon