prism Bring back a configure script

Originally, we built prism with nothing but a plain Makefile. This worked for a while, until we ended up doing some pretty nasty feature detection. Then we pulled in autotools to try to make this better. That also worked for a while, but it complicated the overall build system a lot. We then stripped it out, went back to a plain Makefile, and generally tried to reduce our reliance on non-portable functions and assumptions.

This has worked for a while, but https://github.com/ruby/prism/pull/2171 is making me rethink it. I would like us to be able to use a HAVE_MMAP macro without having to write our own feature detection. Furthermore, this year we intend to investigate SIMD instructions for our lexer, which means we're going to need even more feature detection. There are also specific GCC extensions that could be particularly useful for us, but that we can't use at the moment without hoping the predefined macros are correct.
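
For illustration, here is a minimal sketch of what code guarded by HAVE_MMAP could look like, assuming a generated config.h (or -DHAVE_MMAP on the command line) defines the macro when mmap is usable. The names source_t, source_load, and source_release are hypothetical and are not prism's actual API.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_MMAP /* assumed to come from a generated config.h or -DHAVE_MMAP */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical source buffer; `mapped` records how to release it. */
typedef struct {
    const char *data;
    size_t size;
    int mapped;
} source_t;

/* Load a file, preferring mmap when HAVE_MMAP is defined. Returns 0 on success. */
int source_load(source_t *src, const char *path) {
#ifdef HAVE_MMAP
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        struct stat st;
        void *p = MAP_FAILED;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        }
        close(fd); /* the mapping stays valid after closing the descriptor */
        if (p != MAP_FAILED) {
            src->data = p;
            src->size = (size_t) st.st_size;
            src->mapped = 1;
            return 0;
        }
    }
    /* Empty files cannot be mapped, so fall through to the portable path. */
#endif
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) { fclose(f); return -1; }
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) { /* grow the buffer geometrically */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) { free(buf); fclose(f); return -1; }
            buf = grown;
            cap *= 2;
        }
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) { free(buf); return -1; }

    src->data = buf;
    src->size = len;
    src->mapped = 0;
    return 0;
}

/* Release a buffer obtained from source_load. */
void source_release(source_t *src) {
#ifdef HAVE_MMAP
    if (src->mapped) {
        munmap((void *) src->data, src->size);
        return;
    }
#endif
    free((void *) src->data);
}
```

Building with -DHAVE_MMAP exercises the mmap path; without it, the same call falls back to stdio, which is what a non-POSIX build would get without having to define anything.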

All of this is to say, I think we should bring back a configure script. It doesn't have to be autotools, but bringing autotools back would probably be a relatively low lift. Either way, it would be nice to properly support non-POSIX and 32-bit environments without forcing them to define a bunch of stuff.

I'm not planning on doing this soon because I don't want to mess with the build system before JRuby's and TruffleRuby's next releases. But I would like this to be considered going forward.

Thoughts? @enebo @eregon

Jan 16 '24 18:01 kddnewton

I think we should first clarify what exactly we need to check, and verify that it cannot be done via the preprocessor or with some sort of check directly in the source code.

For some esoteric architectures or operating systems, I think it might be OK to require passing some options explicitly. E.g. if some OS follows POSIX but doesn't implement some part of it, it might be reasonable to require opting out of that functionality explicitly (e.g. with some -DNO_FOO). Platforms that are too esoteric might not be worth supporting, or might be too difficult to support, because 1) it's hard to test, 2) compilers there might be pretty buggy, and 3) it's hard to reproduce and debug issues.
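
To make the preprocessor-first idea concrete, here is a minimal sketch that derives HAVE_MMAP without any compile checks. It assumes POSIX systems advertise mmap through the _POSIX_MAPPED_FILES option macro in <unistd.h>, and uses a hypothetical -DPRISM_NO_MMAP opt-out (not an existing prism flag) in the spirit of -DNO_FOO above; sketch_have_mmap is likewise a hypothetical name.

```c
/* Hypothetical opt-out: build with -DPRISM_NO_MMAP on a POSIX-like
   platform that lacks mmap. */
#if defined(PRISM_NO_MMAP)
/* Explicitly opted out: leave HAVE_MMAP undefined. */
#elif defined(_WIN32)
/* Windows has no POSIX mmap (it uses CreateFileMapping instead). */
#elif defined(__unix__) || (defined(__APPLE__) && defined(__MACH__))
#include <unistd.h> /* declares the POSIX option macros */
#if defined(_POSIX_MAPPED_FILES) && _POSIX_MAPPED_FILES > 0
#define HAVE_MMAP 1
#endif
#endif

/* Report the decision the preprocessor made for this build. */
int sketch_have_mmap(void) {
#ifdef HAVE_MMAP
    return 1;
#else
    return 0;
#endif
}
```

The default is decided entirely by macros the platform already provides, and the opt-out only matters on the rare system that claims POSIX but doesn't deliver this part of it.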

Regarding SIMD, could this be checked via #ifdef, and maybe function attributes or some runtime call? Is it typical to check for SIMD support via a ./configure-style "try to compile and see if it fails" test? (I don't know.) I recall, for example, this, which sounds like some kind of feature checking within C.

One idea for SIMD (I'm not sure if it's a common approach but maybe grpc uses that?) is to include code for various variants and select the right one at runtime because Prism might be compiled on one machine and used on another; this is notably the case when Prism is part of TruffleRuby. That argument actually reveals a flaw of ./configure in general: the features checked at build time might not be available at runtime on a different machine. If the build machine has fewer features, the result is "just" suboptimal; if it has more, the binary likely won't even run on the other machine.
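
The "compile several variants, pick one at runtime" idea could look roughly like the following minimal sketch. It assumes GCC or Clang on x86, where the target("avx2") function attribute and __builtin_cpu_supports are available; the non-ASCII scan and all function names are hypothetical examples, not code from Prism or gRPC.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar baseline: index of the first byte >= 0x80, or len if all ASCII. */
static size_t first_non_ascii_scalar(const uint8_t *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (s[i] & 0x80) return i;
    }
    return len;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>

/* AVX2 variant: only this function is compiled for AVX2, so the rest of the
   binary still runs on CPUs without it. Called only after a runtime check. */
__attribute__((target("avx2")))
static size_t first_non_ascii_avx2(const uint8_t *s, size_t len) {
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *) (s + i));
        /* movemask gathers the top bit of each byte, i.e. the non-ASCII flags. */
        unsigned mask = (unsigned) _mm256_movemask_epi8(chunk);
        if (mask != 0) return i + (size_t) __builtin_ctz(mask);
    }
    return i + first_non_ascii_scalar(s + i, len - i); /* leftover tail */
}
#define HAVE_AVX2_VARIANT 1
#endif

/* Dispatch based on what the CPU running this code supports. */
size_t first_non_ascii(const uint8_t *s, size_t len) {
#ifdef HAVE_AVX2_VARIANT
    if (__builtin_cpu_supports("avx2")) return first_non_ascii_avx2(s, len);
#endif
    return first_non_ascii_scalar(s, len);
}
```

Here the check runs on every call for simplicity; a real implementation would more likely resolve a function pointer once at startup. Either way, the decision reflects the machine the code runs on, not the one it was built on.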

In my experience, autotools in Prism was very heavy (e.g. many files to do very little, and another level of templating) and made the build system much more complex; the simplification in https://github.com/ruby/prism/pull/1224/files#diff-d08667e429de34a404a1386c623776ff3d4cd98ee7f69250a364c61d02f20da1 was significant. In fact, my impression was that with autotools nobody fully understood the build system anymore, and it was very hard to approach. It was also significantly harder to integrate Prism into other software. Committing the generated configure script also felt pretty messy because it's huge, unreadable generated code (but there was no choice, since autoconf on macOS had various issues IIRC, e.g. autoheader not being available there by default).

The heart of ./configure is just checking whether some piece of C code compiles or not. We could do that ourselves with a Bash script (or maybe some Ruby code) and create a config.h file based on what's available. It does feel a bit like reinventing the wheel, but I think it would be worth it compared to the autotools behemoth.
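
As a rough sketch of that "does this snippet compile?" check (written in C here rather than Bash or Ruby, purely for illustration), assuming a POSIX shell and a compiler that accepts -x c and reads source from stdin via "-", as GCC and Clang do. probe_compiles and emit_define are hypothetical helpers; the output mimics the #define and #undef lines of an autoconf-style config.h.

```c
#define _POSIX_C_SOURCE 200809L /* for popen/pclose */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Return 1 if `snippet` compiles with $CC (default: cc), 0 otherwise.
   The snippet is piped to the compiler on stdin; /bin/sh expands ${CC:-cc}. */
int probe_compiles(const char *snippet) {
    /* A compiler that exits early would otherwise kill us with SIGPIPE. */
    void (*previous)(int) = signal(SIGPIPE, SIG_IGN);
    int ok = 0;
    FILE *cc = popen("${CC:-cc} -x c -c -o /dev/null - 2>/dev/null", "w");
    if (cc != NULL) {
        fputs(snippet, cc);
        int status = pclose(cc); /* waits for the compiler to finish */
        ok = status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }
    if (previous != SIG_ERR) signal(SIGPIPE, previous);
    return ok;
}

/* Write one config.h line for `macro`, depending on whether `snippet` compiles. */
void emit_define(FILE *out, const char *macro, const char *snippet) {
    if (probe_compiles(snippet)) {
        fprintf(out, "#define %s 1\n", macro);
    } else {
        fprintf(out, "/* #undef %s */\n", macro);
    }
}
```

For example, probing HAVE_MMAP would pass a snippet that includes <sys/mman.h> and calls mmap; since the check only compiles the snippet, nothing is ever executed.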

Jan 16 '24 20:01 eregon

I don't have a strong opinion on this, but we don't immediately plan on supporting running prism on microprocessors like Cortex. I do need to compile a minimal "forgiving ... it works everywhere" build so that not all JRuby installs will be required to compile something to use JRuby.

So there is a possibility that we MIGHT need overrides to disable features that we can detect but that are not present on all machines of that architecture (although precompiled binaries will probably only exist for Linux, Windows, and macOS, so I am not sure this is a real concern). For exotic platforms we will fall back on the legacy parser, and we will try to see if we can make a nice universal option (like WebAssembly) in the future so we can remove the legacy parser altogether.

Prism is a gem, and by default it compiles the native code behind its Ruby library. This will compile on all systems unless prism decides to ship precompiled binaries for the bigger platforms. Our internal gem that wraps our use of the parser's serialized output will also be a gem, but I am not sure whether it will also support compilation out of the gate (vs. precompiled binaries for the big three, with fallback to the legacy parser).

How many C libraries/apps do not eventually need configure? It feels like people use it for a reason, and that reason is inconsistency across esoteric platforms. MRI does run on a lot of platforms, even if not many people use them. It really seems like there will be a point (and that might be now) where this will happen anyway.

When I see talk of rolling our own scripts and C snippets to not use what everyone else uses it really feels like it will lead to some sunk cost nih thing in the future. Most C people do use configure right? I would not say I am in the loop with C practices but I cannot remember the last time (other than prism) where I did not have to run configure to build something.

Jan 16 '24 20:01 enebo

@tenderlove do you have some thoughts on this?

Jan 19 '24 13:01 kddnewton

> When I see talk of rolling our own scripts and C snippets to not use what everyone else uses it really feels like it will lead to some sunk cost nih thing in the future. Most C people do use configure right?

+1 on this. The reason I added autotools / configure was because we were reinventing the wheel. That said, I don't particularly care. If wheel reinvention becomes too onerous, we could just move to autotools.

> One idea for SIMD (I'm not sure if it's a common approach but maybe grpc uses that?) is to include code for various variants and select the right one at runtime because Prism might be compiled on one machine and used on another

Usually in these cases you know what platform you're targeting though, right?

For SIMD stuff I don't think all processors implement all instructions (for example the different flavors of ARM), so if you're planning to compile on one machine but run on another I think you'd have to just give up on those optimizations (how would you detect a processor supports a particular instruction at runtime?)

Jan 20 '24 00:01 tenderlove

Let's figure out the embedding requirements in https://github.com/ruby/prism/discussions/2217. So far, it seems we can do with just macros and don't need configure-like detection.

> how would you detect a processor supports a particular instruction at runtime?

Based on CPU flags, which can be seen e.g. via cat /proc/cpuinfo on Linux. This is how GraalVM finds out which SIMD instructions are available on AMD64, for instance. There might be other, more direct ways as well (e.g. the CPUID instruction on x86).
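
A minimal sketch of that check on Linux, assuming the /proc/cpuinfo layout where x86 lists CPU features on a "flags" line and arm64 on a "Features" line. cpuinfo_text_has_flag and cpu_has_flag are hypothetical names; parsing is kept separate from reading the file so it can be tried on sample text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word on a "flags" (x86) or
   "Features" (arm64) line of /proc/cpuinfo-formatted `text`. */
int cpuinfo_text_has_flag(const char *text, const char *flag) {
    size_t flag_len = strlen(flag);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t line_len = end != NULL ? (size_t) (end - line) : strlen(line);
        int is_flags = (line_len >= 5 && strncmp(line, "flags", 5) == 0) ||
                       (line_len >= 8 && strncmp(line, "Features", 8) == 0);
        const char *colon = is_flags ? memchr(line, ':', line_len) : NULL;
        if (colon != NULL) {
            const char *p = colon + 1, *stop = line + line_len;
            while (p < stop) {
                while (p < stop && (*p == ' ' || *p == '\t')) p++; /* skip separators */
                const char *word = p;
                while (p < stop && *p != ' ' && *p != '\t') p++;   /* scan one flag */
                if ((size_t) (p - word) == flag_len && strncmp(word, flag, flag_len) == 0) {
                    return 1;
                }
            }
        }
        line = end != NULL ? end + 1 : NULL;
    }
    return 0;
}

/* Read /proc/cpuinfo and look for `flag`; returns 0 if the file is unreadable. */
int cpu_has_flag(const char *flag) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) return 0; /* not Linux, or /proc unavailable */
    size_t cap = 8192, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) { fclose(f); return 0; }
    /* /proc files report a size of 0, so read until EOF, growing as needed. */
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) { free(buf); fclose(f); return 0; }
            buf = grown;
            cap *= 2;
        }
    }
    fclose(f);
    buf[len] = '\0';
    int found = cpuinfo_text_has_flag(buf, flag);
    free(buf);
    return found;
}
```

On systems without /proc this simply reports 0, so it only works as a Linux-specific source of truth; other platforms would need their own mechanism.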

Looking at https://github.com/grpc/grpc/blob/master/third_party/utf8_range/range2-sse.c and https://stackoverflow.com/questions/28939652/how-to-detect-sse-sse2-avx-avx2-avx-512-avx-128-fma-kcvi-availability-at-compile, they do seem to use __SSE4_1__ and __AVX2__, which don't need a ./configure script because the compiler predefines them whenever the corresponding instruction set is enabled for the build target (e.g. via -msse4.1, -mavx2, or -march=...). For TruffleRuby we'd need to enable only the minimum SIMD features and no more (e.g. GraalVM currently requires a minimum of SSE2 on AMD64); we can figure out the details when we get there.
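
A small sketch of that compile-time approach, assuming GCC/Clang-style predefined macros (MSVC's _M_X64 is included only because x86-64 always has SSE2). compiled_simd_level is a hypothetical name. The result describes the flags the file was compiled with, not the machine it later runs on, which is why building with only the minimum target flags matters for a case like TruffleRuby.

```c
/* Report which SIMD level this translation unit was compiled for.
   These macros follow compiler flags (-msse4.1, -mavx2, -march=...),
   not the CPU the binary eventually runs on. */
const char *compiled_simd_level(void) {
#if defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_1__)
    return "sse4.1";
#elif defined(__SSE2__) || defined(_M_X64)
    return "sse2";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}
```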

Jan 22 '24 10:01 eregon

I'm going to close this until it becomes a problem. I'm hoping that at some point it does, because that will mean we have new things we want to support. Until then, it stays closed.

May 21 '24 14:05 kddnewton
