The development experience around the generate-and-export pattern has shortcomings
Background
Raku lets the programmer get involved at compile time in various ways. One of those is being able to write modules that are given arguments, which are then received by `sub EXPORT`. It can then use the arguments it is given to dynamically produce things to export. Effectively, then, we have parametric modules.
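For readers less familiar with the pattern, a minimal sketch might look something like this (the module name, the argument handling, and the generated sub are all invented for illustration):

```raku
# lib/Doubler.rakumod - a hypothetical parametric module
sub EXPORT(*@names) {
    # Generate one sub per requested name, and return a Map of the
    # symbols to install into the user's lexical scope.
    my %symbols = @names.map(-> $name {
        '&' ~ $name => sub (Int $n) { $n * 2 }
    });
    %symbols.Map
}
```

A consumer then writes `use Doubler <double-it>;` and can call `double-it(21)` as if it were an ordinary sub.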
One module doing this is ASN::META, which takes an ASN specification file and uses the MOP to produce types to export. At $dayjob we're also working on a module that lets you take a SQL file with comments that define a name and signature; the module parses that and exports a sub for each SQL query, so you can then call it just like a normal Raku sub.
In both cases, we can thus - in theory at least - get the usual range of Rakudo compile-time checks (fairly limited today, though hopefully more extensive in the future). Further, IDEs doing introspection to offer things like auto-complete, parameter hints, etc. from code in modules should also largely "just work" with such generated things.
A different case, but not entirely unrelated, is Cro templates, which are also compiled into Raku code. I've been working on template modules recently, so we can provide a bunch of Cro template subs/macros that make it more convenient to use, for example, Bootstrap. Those also largely use the normal module export mechanism, although a bit less directly. (Currently it's done by producing code that is then `EVAL`'d, but in the future, when we have a Raku AST (along the lines of QTrees in the 007/Alma research work), it'll be able to produce synthetic QTrees instead.)
There's lots of opportunity here, but a couple of small things are missing that would help.
Dependency issues
If you do something like:

```raku
use MagicSqlThingy %?RESOURCES<sql/foo.sql>;
```
Then the module containing this is precompiled, and gets the exported subs. (Similar for ASN::META, except it's types that are exported.) In an installation situation, there's no problem, because installed things are immutable. But at development time, I might change sql/foo.sql without changing the module that uses it. Then the changes will go undetected. The options are:
- Add `no precompilation` (not ideal in larger projects, where precomp really helps development times)
- Make a dummy change in the module with the `use` (too easy to forget)
- Nuke `lib/.precomp` (the worst of both worlds... why am I even suggesting this?)
No way to identify the source
It'd be good if there was a standard means to say "I'm a Raku sub/method, but I was actually generated from this file/line". That way, tooling that supports navigating the codebase is able to take the developer directly to the ASN spec, or SQL query, etc. Further, it would be generic: if somebody else was to produce a module doing similar things, tooling would be able to automatically offer this kind of support.
Constraints
I'd be fine with a solution that requires that the sources in question are resources. I don't think that's too much of a limitation, given one'd need to do that anyway for the code to be installable using standard tooling.
Sounds like we want to be able to use resource files as dependencies for precomp files in some cases. Off the top of my head I can't think of anything that prevents us from adding that. The whole precomp machinery is pretty agnostic to file contents, except to generate SHA hashes of course. Key points are `CompUnit::DependencySpecification` and ignoring non-module dependencies in `CompUnit::PrecompilationRepository::Default!load-dependencies`.
@niner Thanks, glad that it seems in the realm of feasibility. :-)
Any thoughts on how that relationship between resource files and precomp files could be established?
I guess after a reasonably small bit of refactoring (splitting of `emit-dependency` from `try-load`) we could arrive at something like:

```raku
my $resource = %?RESOURCES<foo.txt>;
$resource.repo.precomp-repository.emit-dependency(
    CompUnit::PrecompilationDependency::File.new(
        :id($resource.precomp-id),
        :src($resource.relative),
        :checksum(nqp::sha1($resource.slurp(:enc<iso-8859-1>))),
        :spec(CompUnit::DependencySpecification.new(:short-name($resource.relative))), # or something like that
    )
);
```

which we of course can simplify a bit by adding relevant parts to `Distribution::Resource` and `CompUnit::PrecompilationRepository::Default`.
Then we'd have to flag this as a non-module dependency, so we will not try to load it in `!load-dependencies`.
I'm reminded of a related problem I've seen in the past with `is native` traits. There, though, it's not %?RESOURCES people want to use but %*ENV. I could see someone trying %*ENV here too. If the environment changes since compile time, your `use` will also stop working as expected, and probably in a really headache-inducing way.

I'm not sure %?RESOURCES is the only likely case here, is all I want to point out. I haven't really thought about how you might address that, though.
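To make that concrete, here is a hypothetical variant of the earlier example (SQL_FILE is an invented environment variable, and MagicSqlThingy is the module from the original post):

```raku
# The environment is consulted exactly once, when this module is
# (pre)compiled, and the generated subs are baked into the precomp output.
use MagicSqlThingy %*ENV<SQL_FILE>;

# If SQL_FILE later points at a different file, or the file it names is
# edited, the stale precompilation is silently reused.
```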
A broad way of re-stating this issue is: compile-time dependencies outside the codebase proper, but which affect the compilation product. An analogous (note I said analogous, not equivalent) problem/solution situation is of course the evolution of the C language compiler toolchain. As originally developed, the compiler itself knew nothing about pre-processor directives. The C macro pre-processor substantially modifies the code seen by the compiler compared to the source file the programmer prepared. One of the headaches of being a C programmer was maintaining an idea of what the pre-processor would do to the code visually in front of you, and then debugging the pre-processor output when it didn't do what you expected. Headaches, yes.

When you invoked the C compiler, it was really a driver program such as cc or gcc, which in turn organized the CLI calls to the pre-processor, compiler phases, linker, and so on. Among other things, the purpose of the driver program was to understand some things outside the scope of the compiler phases, to keep them focused and easy to maintain. Later C compilers would change to single pass, and some might have folded the pre-processor function in. Beyond the scope of the driver program, the MAKE tool evolved to deal with arbitrary compile-time dependencies that could be encoded as file timestamp relationships.
From my Perl 4/5 days, one could influence Perl compile phases with BEGIN and INIT blocks, etc., and I used those features. I don't know the details of the Raku implementation yet. My suggestion here is that there should be some thought as to driver scope versus compiler scope versus compiler phase scope, and an analysis of how much complexity you want in a given scope and whether there are some dependency situations you shouldn't attempt to handle in the compiler driver function, but should instead handle in a way that isolates that complexity. Just because you can, possibly simply, take care of a particular type of dependency mechanism in the compiler doesn't mean you should. With inattention to scope, the compiler becomes crufted up with out-of-scope-because-we-never-discussed-scope non-compiler stuff that was convenient to stuff in there over the years, and now it's rigid, bloated, arthritic and falling over and has to be refactored.
In the above original issue post, and then the example by zostay, we have two very different dependency situations, both of which can break because of a key thing you need for any compilation-decision short-cut: is there something that acts as a persistent memory of the previous compilation pass against which I can compare the current situation for a decision?
A) Non-code filesystem object as a compile-time input. The original issue raised is a compile-time dependency on a filesystem object outside the instant Raku codebase, meaning the object does not contain Raku code and thus is rationally outside the scope of a code compiler function. In that example, the filesystem object has a timestamp that could reasonably be understood to represent the last time the object contents were statically modified, so a timestamp encoding of the dependency could work. But suppose that filesystem object is a named pipe or some other object that could be a compile-time input source to a Raku module, but whose timestamp is wholly independent of the age of the content supplied by the object? A timestamp-comparison scheme doesn't work then. Niner's proposed solution idea above then needs associated documentation describing for which types of non-code filesystem objects it works, that it (maybe?) uses timestamp comparison, and that it doesn't work with any other object types. You might question: why would a Raku programmer use a named pipe for compile-time inputs? Silly. Famously, programmers in the wild will always try stuff the developers never think about, but nonetheless Raku's name is on the failures they see.
B) %*ENV as compile-time input. Assume a specific subset of %*ENV, not everything. Well, unless the controlled subset in both development and production includes an enforced, configuration-controlled timestamp or version or counter or other unambiguous indicator of quantized code change that the production user can't bollix up, then there is nothing guaranteed in %*ENV; using it as compile-time input is risky, and it leaves nothing for data-protection backups of the code base to capture. There is also no memory mechanism for %*ENV such as files have with persistent timestamps. Maybe not a good idea to use %*ENV. But if you must, then the closure model could be useful. Compile time becomes closure-time when referenced (K,V) tuples from %*ENV get pseudo-injected as code or otherwise captured into the compilation result for later visibility (and debugger visibility). In this sense the compilation result in production uses the at-closure-time %*ENV rather than the production-time %*ENV. Ideally this is accomplished by a use'd module rather than compiler modification.
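A minimal Raku sketch of that closure-time capture (the variable and environment key are purely illustrative):

```raku
# Read the environment once, at (pre)compilation time; the value is then
# frozen into the compilation product rather than consulted at run time.
constant $db-host = %*ENV<DB_HOST> // 'localhost';

say "built against DB host: $db-host";   # reports the captured value
```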
Regarding the original scenario (A): if the existing Raku precomp function does use a timestamp comparison of source module to precomp product to decide on re-compilation, then I suggest a simpler solution not requiring a Raku mod is to use a pass of MAKE or similar to detect that foo.sql is more recent than the precompiled MagicSqlThingy.raku, then touch (update) the timestamp of MagicSqlThingy.raku to be more recent. No actual change to the byte content of MagicSqlThingy.raku is needed if the touch functionality is available. You are not using MAKE to drive a complex compilation (with the associated complex makefile maintenance headache), just using it to update a few file timestamps to get the dependency behavior you want out of Raku as-is. I suggest this is perfectly acceptable in a development environment. I suspect that what you really are doing is using sql/foo.sql as a single master for input both to Raku compilation AND as input to a SQL interpreter for testing. So another tactic is to keep the master SQL content in either a .raku file structured as Raku code or a .sql file, then use a little script invoked by, yes, MAKE, to spit out the dependent side (a .raku or a .sql), since you are running MAKE anyway. If you don't like that, try master_sql.txt and a script that spits out both .raku and .sql. This kind of stuff has had a rich solution set since the release of Seventh Edition Unix, without trying to re-invent it by increasing the complexity of the Raku compiler.
However, that was only part of what you wanted. You also wanted information embedded in the compilation products that could be used by an IDE to lead outside the codebase. I suggest that is an entirely different dialog, where the Raku compiler is being asked to, in the most general case, bundle in user-computed arbitrary metadata developed at compile time for consumption later by arbitrary debuggers/IDEs. Not only do you need a way to insert computed metadata into the debugging data (the Raku side as a dumb bundler-messenger), but you also need debuggers/IDEs that can use it for the IDE feature you want (the IDE developers' side).
Regards.
@BloomingAzaleas rakudo's precompilation repository does not use timestamps at all. Instead it uses SHA1 checksums of file contents. This is good because it would actually cover the named pipe use case, but it also means that a simple touch would not suffice to trigger a re-compile.
I've come across a use-case where we have a dependency on a file but it's not a bundled resource, and it's actually a quite obvious one: `use Foo:from<Perl5>;`. Since this will import symbols and create wrapper classes, changing the Perl 5 module really should trigger a recompile of the using Raku module. Luckily `CompUnit::PrecompilationDependency::File` can be used for this as-is, but we can't tie the mechanism to resources (though of course resources may get some sugar, as they will be the most common case).
As for code navigation, the Raku way would probably be a trait `is generated-from` that can be applied to any kind of symbol. While a source can be just about anything (think URL, database, environment), the most common will probably be some file, or rather some location within a file. The `trait_mod` could record the dependency information during precompilation.
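A rough sketch of how such a trait might look; the role, the trait's argument format, and the example sub are all assumptions rather than an existing API:

```raku
# Hypothetical: record where a generated symbol came from, so tooling can
# navigate from the generated sub back to its origin.
my role GeneratedFrom {
    has Str $.origin;                      # e.g. "sql/foo.sql:12"
}

multi sub trait_mod:<is>(Routine $r, Str :$generated-from!) {
    # Mix the origin into the routine object; precompilation could also
    # record it as a dependency at this point.
    $r does GeneratedFrom($generated-from);
}

# A generating module would apply the trait to the things it exports:
sub select-active-users(|c) is generated-from('sql/foo.sql:12') { ... }

say &select-active-users.origin            # sql/foo.sql:12
    if &select-active-users ~~ GeneratedFrom;
```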
Ah. I have wall-clock performance-burden concerns about the checksum idea, but since I do not know what criteria trigger a pre-comp dependency check (every Raku app launch? only by option flag?) and how recursive it might be, I will set that aside in favor of a more general proposal addressing my key concern of adding edge-case complexity to the Raku compiler.
Re-conceive the pre-comp functionality as a handler framework provided by the Raku core, including a toolset library. The framework takes handler plugins. The framework must export interfaces to the plugins and vice versa. Standard stuff. The Raku distro ships with a framework plugin to handle the simplest case, covering 99.999% of all Raku code in the world: a wholly static, flat-text-filesystem-object source code dependency tree, and possibly another for the `use Foo:from<Perl5>` use case, given the presumptive expected high usage of the vast CPAN archive. Raku app developers in the field then provide plugins for anything else, with a mechanism to substitute for the distro standard plugins. This concept extracts the Raku core team from providing functionality for pre-comp dynamic-code edge cases and assigns it where it belongs - to those Raku app developers who know their edge cases best.
Very broadly, the conops (concept of operations) for a pre-comp check pass would be:
A) Develop a list of pre-comp candidates (whatever this means). Somehow this is done today using info in the pre-comp repository, plus other info for developing candidates with no existing pre-comp object. Possibly plugins export an optional call-in interface for adding candidates to the list, and all such plugins are called by the framework.
B) For each candidate, in some way identify a handler plugin. This could include an "is this yours?" query into all plugins optionally offering said query interface. Note that at this point the authoritative source for a candidate may not exist as a flat-text filesystem object; perhaps a flat-file placeholder stub exists containing sufficient info to identify a pre-comp handler.
To exaggerate for illustration, consider the above MagicSqlThingy jnthn 7 Feb example. Suppose, for configuration management purposes, organization practice required the authoritative version of the snippets in sql/foo.sql to be kept as text fields, along with metadata, as records in a database or some other not-final-code-flat-text-file storage object that requires use of an access-method API. Even flat text files in a source code control system are not "final" - today's "final" needs to be re-created via an API. In this scenario, at best the sql/foo.sql flat text file might be allowed as a build-process temporary, but it is not otherwise authoritative, and is itself to be rebuilt every pre-comp pass from the not-a-flat-text-file authoritative source if the developer-supplied plugin probing the authoritative source via its API determines the flat file is stale.
C) Call into each plugin with an arg list containing all candidates for which the plugin is the identified handler.
C1) Handler responsibilities:
C1a) LOOP: Per candidate object in the arg list, locate the authoritative source code fragment(s), possibly in multiple storage objects. For MagicSqlThingy this could include a database query and records retrieval. Code for this is provided by the app developer's pre-comp plugin.
C1b) Compute an opaque (to Raku) code change quantization token (CCQT) over the authoritative source fragment(s), as if they were a single, flat text file ready for compiler ingest. Note that a CCQT could be pre-computed by whatever functionality manages the external authoritative source repo. The toolset would provide tools for this. The CCQT abstraction means that any computable datum that changes when the authoritative source changes can serve, be it timestamps or checksums (or a combination to get "touch" functionality!) or whatever. If it is kept as an opaque token in the pre-comp repo, then it can be specialized by a plugin for pre-comp-time performance or accuracy or other tradeoffs.
C1c) If a pre-comp object already exists, retrieve its CCQT and compare it to the authoritative CCQT. If they are the same, update the pre-comp object in the repo with a "last checked" timestamp by "plugin ID version x.x" for reporting purposes; LOOP.
C1d) If the pre-comp repo object's CCQT is not the same, or the repo object does not exist, use toolset functions to create a pre-comp object for pre-comp repo insertion, validate the new object's CCQT against expectation, and request repo insertion or replacement of the pre-comp object; LOOP.
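To make the handler-plugin idea a bit more concrete, here is a rough Raku sketch of what such an interface could look like; every name in it (PrecompDependencyHandler, SqlFileHandler, handles-candidate, ccqt) is invented for illustration and is not a proposal for an actual core API:

```raku
role PrecompDependencyHandler {
    # Is this plugin the right handler for the given candidate?
    method handles-candidate(IO::Path $candidate --> Bool) { ... }
    # Compute the opaque code-change quantization token (CCQT).
    method ccqt(IO::Path $candidate --> Str) { ... }
}

class SqlFileHandler does PrecompDependencyHandler {
    method handles-candidate(IO::Path $candidate --> Bool) {
        $candidate.extension eq 'sql'
    }
    method ccqt(IO::Path $candidate --> Str) {
        # A content-based token, mirroring how Rakudo checksums sources today.
        use nqp;
        nqp::sha1($candidate.slurp(:enc<iso-8859-1>))
    }
}

# The framework would then ask each registered plugin, e.g.:
#     SqlFileHandler.ccqt('sql/foo.sql'.IO)
```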
The toolset and plugins then become public contribution targets as well.
TIMTOWTDI of course.
Regards.