ruby-ll icon indicating copy to clipboard operation
ruby-ll copied to clipboard

Port libll to pure Ruby; Opal support

Open hmdne opened this issue 1 year ago • 7 comments

I am tasked with porting some Ruby libraries to JavaScript by using Opal (an alternative Ruby runtime, like JRuby, but targeting JavaScript runtimes). Like JRuby, it doesn't support C extensions (or Java for that matter), so I had to port the extension to pure Ruby. This implementation may be also usable to make Ruby-LL work on other Ruby runtimes, like MRuby.

I have considered porting libll directly to JavaScript, but performance is of secondary importance for the project I'm working on, but I may later revisit the idea.

In addition, I added tests for running the test suite under Opal-RSpec, so that we can be sure that Ruby-LL works under Opal.

I have made a similar patchset for Oga, but it is still a work in progress.

This PR has been sponsored by Ribose Inc. ref: plurimath/plurimath#159

hmdne avatar Jun 09 '23 22:06 hmdne

Oh and FYI: I just pushed some changes to main to get JRuby properly running on the latest version, and to remove some ancient Ruby versions that were still teste against.

yorickpeterse avatar Jun 10 '23 02:06 yorickpeterse

Thanks for a great review, I will address those issues today.

Finally, you mentioned you're also working on porting Oga to Opal. Oga's codebase is much more complicated to deal with due to the use of Ragel, and I believe Ragel doesn't/no longer supports outputting Ruby code.

What I absolutely don't want is an entire manually written re-implementation of the lexer code in pure Ruby or Javascript, as that would be a pain to maintain if it ever needs changing; especially considering Opal isn't that widely used if I'm not mistaken. So if there's no way to reuse the Ragel grammar for Opal, I'm afraid I'd have to reject the changes for Oga.

Regarding Oga, I understand the intention for the lexer to be maintainable, that's why I didn't write a new one, but I wrote a simple regexp script that translates the C/Java hybrid code in base_regel.rl to Ruby. Perhaps that's not the best choice and I will try to find a better solution.

Regarding Ragel, there are multiple Ruby projects like parser that depend on Ragel 6 which supports Ruby. I remember Ragel 7 not supporting Ruby - but that's not a case anymore.

hmdne avatar Jun 10 '23 08:06 hmdne

It seems the two commits more or less depend on each other, with the second commit ensuring changes introduced by the first pass all tests. These should just be rebased into a single commit

I split it into two commits, because the first one introduces a pure-Ruby functionality and another one introduces Opal support.

hmdne avatar Jun 10 '23 08:06 hmdne

Regarding Oga, I understand the intention for the lexer to be maintainable, that's why I didn't write a new one, but I wrote a simple regexp script that translates the C/Java hybrid code in base_regel.rl to Ruby. Perhaps that's not the best choice and I will try to find a better solution.

That sounds extremely prone to error, so that's not something I'd merge.

Regarding Ragel, there are multiple Ruby projects like parser that depend on Ragel 6 which supports Ruby. I remember Ragel 7 not supporting Ruby - but that's not a case anymore.

Ah, indeed Ragel 7 dropped support for it at some point. Looking at the output of ragel --help it seems to support Ruby as output once more, though I have no idea if the resulting Ruby actually works. If so, adding pure Ruby support might not be that difficult.

I split it into two commits, because the first one introduces a pure-Ruby functionality and another one introduces Opal support.

As I mentioned, I don't want any flags that suddenly change the behaviour. I honestly don't see anybody that uses MRI wanting a pure Ruby version, given it will perform worse compared to the C extension. Rubinius isn't relevant either these days (and used to support the C extension just fine). As such, the pure Ruby version would only be used for Opal and I guess mruby, of which both cases are better served by checking RUBY_PLATFORM.

yorickpeterse avatar Jun 10 '23 12:06 yorickpeterse

That sounds extremely prone to error, so that's not something I'd merge.

The converter I wrote is simple enough, but it is regexp, so it's not entirely readable. I'm pasting it here, so that I could clarify the issue:

puts STDIN.read.gsub(/\/\*.*?\*\//m, "")
               .gsub(/if\s*\(([^}]*?)\)\s*\{([^}]*?)\}\s*else if\s*\(([^}]*?)\)\s*\{([^}]*?)\}\s*else\s*\{([^}]*?)\}/m, "if \\1\n\\2\nelsif \\3\n\\4\nelse\n\\5\nend")
               .gsub(/if\s*\(([^}]*?)\)\s*\{([^}]*?)\}\s*else\s*\{([^}]*?)\}/m, "if \\1\n\\2\nelse\n\\3\nend")
               .gsub(/if\s*\(([^}]*?)\)\s*\{([^}]*?)\}/m, "if \\1\n\\2\nend")
               .gsub(/if\s*\(([^}]*?)\)\s*([^}]*?);/m, "if \\1\n\\2\nend")
               .gsub("fc == '\n'", 'fc == "\n".ord')
               .gsub("lines++", "lines += 1")

The problem is quite simple. What I understand your intention behind base_lexer.rl is that it aims to use a common subset of languages so that code could be shared between them. Unfortunately, due to how Ruby works, the biggest difference is that it has a different syntax to if.

Alternatives that I considered:

  • convert base_lexer.rl to base_lexer.rl.erb that would use some kind of Ruby DSL to produce C/Java/Ruby syntax on demand
  • instead of a regexp approach, do a line-by-line rewriter that would act on lines starting with "if", "else", "else if", "{", "}", "/*"
  • wrap the language output part with some uniquely extractable prefix, like "[[[", "]]]", rewrite the code in Ruby, use parser gem to convert that to any language desired
  • instead of creating a Ruby implementation, create directly an Opal/JS implementation that due to us using JavaScript would be mostly compatible with C-like syntax
    • to clarify, a recent version Ragel 7 supports JavaScript too, Ragel 6 does not

Are any alternatives mentioned above acceptable for upstreaming into Oga? If yes, then which one will you prefer?

hmdne avatar Jun 10 '23 14:06 hmdne

@hmdne I have to be honest here, but this sounds like a wasted effort.

Using regular expressions may technically work now, but it's very prone to error, especially surrounding the use of whitespace. I really do not want such a solution (as in, I won't merge any such changes). The reasons for this are:

  1. Oga is pretty much in maintenance-only mode and has been for years, as I stopped working on it years ago. As such I'm really only interested in bug fixes, or trivial additions.
  2. Feature and support wise I also consider Oga to be finished. Sure, it could have been made faster or have a different API here and there, but I think that's better suited for a different library with a different approach.
  3. Any future changes to the lexers using the regex approach, should there be any, may break the code in non-obvious ways. Given the maintenance mode status, the absolute last thing I want to deal with is code breaking in non-obvious ways some time in the future. Ragel is already annoying enough to deal with, let's not make it worse by piling regular expressions on top of it.
  4. Oga has been around for 8 years, and I've never heard (or at least remember) of anybody voicing interest in using it with Opal. I'm guessing this is because most Opal users would just use the JS DOM APIs instead. Doing this now seems like it would cost a lot, but without any gain. In fact, I'm actually surprised Opal is even still around these days.

In other words: if the existing base lexer can't be re-used as-is, as is done with the Ruby and Java extensions, it's not going to fly, as it's not worth my time and energy. A Javascript lexer might work, but only if the changes in general are simple enough and don't complicate the maintenance process.

yorickpeterse avatar Jun 10 '23 15:06 yorickpeterse

Small note to add: Ragel 7 doesn't support Ruby output. The option is still listed, but when used the Ragel CLI throws an error. https://github.com/whitequark/parser/issues/317 contains some more details, and some linked issues where people ran into the same problem.

Instead, Ragel 6 must be used to build things.

yorickpeterse avatar Jun 10 '23 16:06 yorickpeterse