
Implement module imports

Open masak opened this issue 8 years ago • 26 comments

Why modules and module imports? Because eventually we'll want to play around with macros that affect the parsing context that imported them.

I have no reasonable basis for choosing either (Perl) use or (Python) import, so let's go with use because it's short. (But I will keep referring to them as "imports".)

There are two sorts of import. The form that takes an identifier loads something from 007 itself.

use Runtime;      # and 007 knows what this is and knows how to provide it

The form that takes a string literal loads something from a path relative to the loading script or module.

use "./Foo";      # loads Foo.007 in the script's directory
use "Bar/Baz";    # loads Baz.007 in the Bar/ subdirectory in the script's directory

In either case, a symbol gets installed in the lexical scope corresponding to the loaded module. (It's a compile-time error for the file name sans .007 extension to not be a valid identifier.)

Importing the module causes its 007 file to run. (Though in the case of internal 007 modules, this may be faked.) The symbol that gets installed is an object, but let's give it the object type Module. Its properties correspond to the variables defined in the topmost scope at the end of running the module.

A use counts as a variable declaration. Therefore it's a compile-time error to refer to an import before importing it, or to refer to an outer variable x and then import x on top of it. Aside from this, use statements can occur anywhere. The import logic happens at BEGIN time.

It's fine for a module to import other modules. Paths keep being relative to the thing that does the importing. At any given time, we've pushed a number of compunits on a conceptual stack, waiting for the thing they imported to finish loading. It's an immediate compile-time error to try to import something that's already on that stack.
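The loading stack described above can be sketched in Python roughly as follows. This is only an illustrative model, not 007's implementation: the `Loader` class, the `CircularImportError` name, and the in-memory `sources` table (mapping a module name to the names it imports) are all assumptions made for the example.

```python
class CircularImportError(Exception):
    """Raised when a module is imported while it is still being loaded."""


class Loader:
    def __init__(self, sources):
        # Hypothetical stand-in for the file system: module name -> names it imports.
        self.sources = sources
        self.stack = []      # compunits waiting for their imports to finish
        self.loaded = []     # modules that finished loading, in completion order

    def load(self, name):
        if name in self.stack:
            chain = " -> ".join(self.stack + [name])
            raise CircularImportError(f"circular import: {chain}")
        if name in self.loaded:
            return           # already loaded; nothing more to do
        self.stack.append(name)
        try:
            for dependency in self.sources[name]:
                self.load(dependency)
        finally:
            self.stack.pop()
        self.loaded.append(name)


loader = Loader({"main": ["a", "b"], "a": ["b"], "b": []})
loader.load("main")
print(loader.loaded)         # ['b', 'a', 'main']

cyclic = Loader({"foo": ["bar"], "bar": ["foo"]})
try:
    cyclic.load("foo")
except CircularImportError as error:
    print(error)             # circular import: foo -> bar -> foo
```

Note that importing `b` twice (once from `main`, once from `a`) is fine in this model; only a module that is still on the stack triggers the error.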

In the fullness of time, a module being loaded is meant to be able to influence its loading context more than just installing a single symbol into it. The exact mechanism for this I leave unspecified for now — but it could be something as simple as there always being a loader object available in a loaded module. Similarly, the parser of the loaded module could perhaps be accessed through a parser object.

masak avatar Oct 22 '15 14:10 masak

Just a quick thought: at the point when we have module imports, and the Foo { ... } instantiation syntax, it will also make sense to do mymodule.Foo { ... }. (Later note: now new Foo { ... } and new mymodule.Foo { ... }. Everything else still applies.)

I just realized that whereas an identifier Foo can be post-declared in that case (and we just need to verify that when it's declared, it actually is a class), mymodule.Foo can't.

  • mymodule has to be already loaded
  • Foo has to be known to be a class

There's no leeway here. Why? Because we've already left that compilation unit, and we won't learn anything more about it.

As far as I can see, Foo and mymodule.Foo — that is, simple identifiers and identifiers separated by lookup dot (.) — are the only two forms that make sense for identifying a class to create a new object. Anything else should be a parse error.

masak avatar Oct 28 '15 14:10 masak

In the fullness of time, a module being loaded is meant to be able to influence its loading context more than just installing a single symbol into it. The exact mechanism for this I leave unspecified for now — but it could be something as simple as there always being a loader object available in a loaded module. Similarly, the parser of the loaded module could perhaps be accessed through a parser object.

I don't think I realized it at the time, but... if we have the concept of "internal modules" anyway, the cleanest way by far to provide loader and parser is through that same mechanism:

use loader;
use parser;

Though perhaps the names should be re-thought to be more suggestive:

use loadingCompUnit;
# ...
loadingCompUnit.parser;

Or something.

masak avatar Jun 27 '16 12:06 masak

In either case, a symbol gets installed in the lexical scope corresponding to the loaded module.

This won't be enough.

Take the use case of loading ranges as a module.

use range;

The range.007 module looks something like this:

class Range {
    has min;
    has max;
    has excludes_min = 0;
    has excludes_max = 0;
    # more stuff omitted
}

sub infix:<..>(min, max) is looser(infix:<+>) {
    return new Range { min, max };
}
# ditto infix:<^..>, infix:<..^>, infix:<^..^>

Perhaps we can all survive having to write range.Range all day, but even there it'd be kinda nice to have the class be auto-imported into the importing scope.

But with the operators, it's a non-starter to have them not import, because not importing them means not installing them as operators in the current parser. And then what good are they?

(Same deal with macros we'd want to export, of all three kinds.)

I can see the problem here quite clearly, but I don't yet have a good solution to it. There's plenty of prior art, and maybe we can lean on some of that. For example, Python (and Haskell) have this from ... import ... thing:

from foo.bar import baz    # foo.bar.baz imported and bound as baz

It's nice in that it's very explicit (the importing compunit itself contains enough information for its parser to know which symbols will be imported), but I have a feeling that'd also be annoying. In particular, I want all four of those range infix operators, always; I don't want to have to go and update the import list when I use a new one. Ditto the Range type itself, probably.

Since there's nothing that gets imported via the range identifier itself, I'm not even sure I want one to be created. This seems to be true for all "sufficiently language-modding" modules, whose purpose it is to extend the language, the parser, the available operators (in the importing scope), etc.

Maybe we should do what lizmat++ suggested once, and have two separate keywords? One use that loads everything into a variable with absolutely no effects outside of that, and one abide (or whatever) that allows the current compunit's parser to have things injected into it.

(Hm, pragma? Maybe abide is too cute-today...)

I can certainly argue this one both ways. On the one hand, it's cool that pragma would let us see immediately that we're doing something invisible with the local scope, so please check that out, kthx. (While avoiding having to list all the things we do, instead leaving it up to the imported pragma.) On the other hand, did we just split our module importing mechanism in two? Why? Was there really a need for that?

On the module end of things, I was toying with the idea of an export { ... } block. I like this better than having to mark stuff up as is export like in Perl 6. An export { ... } block collects everything in one place, and inside the block you'd use exactly the same syntax as when you normally declare constants, subs, macros, operators...

So if we don't go with the use/pragma split, we can simply decide that if a module has that export { ... } block, it's a pragma and rather than installing things under the range variable in the importing scope, it will install a lot of other things directly into the importing scope.

Still pondering all the forces involved.

masak avatar Aug 08 '16 18:08 masak

I added the "needs-more-design" label to this issue. In my view, the next step is to identify the main use cases for modules (such as the Range case above) and to figure out straightforward ways to support them with syntax and semantics.

masak avatar Jan 03 '17 14:01 masak

For example, Python (and Haskell) have this from ... import ... thing

Coming back to this issue, I think this is where we should start. Possibly we can have a from ... import * form which saves the trouble of specifying everything that needs importing.

I think we should avoid having an explicit exporting mechanism, at least until not having one proves untenable. Instead, a module implicitly exports everything in its compunit top level. (If we ever decide to go with an explicit exporting mechanism, an @export trait should do the trick.)

We need to remember to handle the case of colliding names through imports. Though strictly, this is just a special case of colliding declarations. The * form would make it not evident from the importing/"caller" code itself that there's a conflict; instead, we'd need to parse and collect identifiers on the importee/"callee" side. But we need to do that anyway, according to the semantics of the importing mechanism.

A module should consider its top-level names to be its "published API", and a change to that might affect its importing modules adversely. This problem is what versioning would normally solve; I don't see that we need to do that, because 007 is not meant for actual real-world production use, just to explore things around macros.

masak avatar Mar 23 '18 10:03 masak

The symbol that gets installed is an object, but let's give it the object type Module. Its properties correspond to the variables defined in the topmost scope at the end of running the module.

There's a contradiction here that I can't quite pinpoint. I think these three wishes are mutually incompatible in a 2-out-of-3 sense:

  1. The object resulting from an import is of type Module, that is, type(imported_thing) == Module.
  2. The type object determines the layout of the instance object: what properties are in there, and in what order.
  3. Different modules have (duh) different properties.

Though it strikes me as I write this that fixing this contradiction might be as simple as loosening up the first requirement to imported_thing ~~ Module, allowing subtypes of the Module type. Each module import would have its own Module subtype, individually specifying the property layout.

Come to think of it, I'm guessing a similar thing would need to happen with enums.

masak avatar Mar 29 '18 08:03 masak

Aye; coming back to this issue, it now seems clear that each imported module needs to have as a type its own anonymous Module subtype.
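A minimal Python sketch of the "own anonymous Module subtype per import" idea, under the assumption that a subtype's slot list plays the role of the property layout. The `Module` base class and the `make_module` helper are hypothetical names for this illustration only.

```python
class Module:
    """Base type shared by every imported module object."""
    __slots__ = ()


def make_module(exports):
    # Build a fresh subtype per import; its slots fix the property layout and order.
    subtype = type("Module", (Module,), {"__slots__": tuple(exports)})
    instance = subtype()
    for name, value in exports.items():
        setattr(instance, name, value)
    return instance


range_module = make_module({"Range": "<class Range>", "infix_dotdot": "<sub>"})
math_module = make_module({"pi": 3.14159})

print(isinstance(range_module, Module))            # True
print(type(range_module) is Module)                # False
print(type(range_module) is type(math_module))     # False
print(type(math_module).__slots__)                 # ('pi',)
```

The first check corresponds to imported_thing ~~ Module holding, while the second shows that the stricter type(imported_thing) == Module does not.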

The pattern here is a little bit similar to anonymous subclasses in Java.

masak avatar Apr 18 '18 08:04 masak

I think I've arrived at a good design that covers all the bases.

First off, we're going to make the keyword import, not use, for a number of small reasons. For one, Perl is in a minority in preferring use. It's not quite as opaque as grep for filter, but it does lack some of the self-explanatory power of import. (The latter feels more "visceral" to me.) Lastly, we can use both Java and JavaScript as a tiebreaker in favor of import.

Actual syntax/design in the next comment.

masak avatar May 27 '18 08:05 masak

Ok, so. First off, in this design, exports are explicit. I would do it through an is export trait, which will later evolve to an @export annotation. If people want to export everything, we can provide an exports block as a language extension.

There are three import statement forms:

  1. import * from "m"; Imports all of m's exports into the current lexical scope. Therefore it's in effect N declarations of those exports in this scope. This one addresses the range use case above.

  2. import { foo, bar } from "m"; Selectively imports some of m's exports into the current lexical scope. There's also a foo as mFoo variant that imports the foo under a different name in order to avoid name clashes.

  3. import m from "m"; Imports the module object m into the current lexical scope. Its properties are m's exports. Can be abbreviated import m; in which case the path string is derived from the module specifier.
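To make the difference between the three forms concrete, here is a small Python model of which names each form declares in the importing scope. Scopes and export lists are plain dictionaries, and `declare`, `import_all`, `import_some`, `import_module`, and `NameClash` are hypothetical helper names; the real design would of course work on Qnodes and lexical pads rather than dicts.

```python
class NameClash(Exception):
    pass


def declare(scope, name, value):
    if name in scope:
        raise NameClash(f"'{name}' is already declared in this scope")
    scope[name] = value


def import_all(scope, exports):
    # Form 1: import * from "m";
    for name, value in exports.items():
        declare(scope, name, value)


def import_some(scope, exports, names):
    # Form 2: import { foo as mFoo, bar } from "m";
    # `names` maps each exported name to the local name it is bound under.
    for exported, local in names.items():
        if exported not in exports:
            raise KeyError(f"module does not export '{exported}'")
        declare(scope, local, exports[exported])


def import_module(scope, exports, module_name):
    # Form 3: import m from "m";
    declare(scope, module_name, dict(exports))


m_exports = {"foo": 1, "bar": 2}

scope = {}
import_all(scope, m_exports)
print(scope)       # {'foo': 1, 'bar': 2}

scope = {"foo": "local foo"}
import_some(scope, m_exports, {"foo": "mFoo", "bar": "bar"})
print(scope)       # {'foo': 'local foo', 'mFoo': 1, 'bar': 2}

scope = {}
import_module(scope, m_exports, "m")
print(scope)       # {'m': {'foo': 1, 'bar': 2}}
```

The second call shows the renaming variant avoiding a clash with an existing local foo; without the rename, `declare` would raise `NameClash`.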

masak avatar May 27 '18 08:05 masak

Immediate afterthought: since import m; is a short form of the third form import m from "m";, I think it makes sense to always allow the final string (occurring in all unabbreviated forms) to be an x.y.z "identifier path". (Term coined by me, right now. IdentiPath™, recommended by four out of five dentists.)

That is, import m; can also be seen as being short for import m from m;. And the first two forms can also be written as import * from m; and import { foo, bar } from m;. Actually, that feels oddly convenient right away, and mirrors the way Java, Python, and Perl — basically everyone except JavaScript — do it.

Just to be clear, the string form will still be allowed, and might be useful in some rare cases when the module is not a valid identifier. I suggest the idiomatic way to write it be with the identipath, but we drop down to the string when necessary.

Notice that the identipath exists in another "namespace", so for example this

my example = 42;
import x from example;

resolves the name example among imports as usual, with absolutely no conflict or confusion from the my example. The example identifier on the second line is not found in the lexical scope. On the other hand, if the second line had been import example; or import example from anything_really;, then there would have been a naming clash because two things in the same scope tried to declare the same name example.

masak avatar May 27 '18 11:05 masak

I initially meant to include a fourth form in the design:

import from "m";
import from m;

Note that this form is the only one that contains nothing between the import and the from. The semantics of this form would be to import a module only for its global side effects, not for its inclusion as a name in the scope.

At the last moment as I was writing down the other forms, I changed my mind. For three reasons:

  • The fourth form would be very similar to the first and third forms, and I suspect people would have a hard time keeping them apart. (JavaScript has too many forms due in the end to merging together two separate module conventions, and I've seen firsthand that it's problematic for people to keep all the forms apart.) I have a feeling the fourth form would make it hard even for experienced developers to take in at a glance what the import statement does. (Especially given the third form's abbreviation, which I like and am not willing to sacrifice.)

  • You can easily emulate the fourth form using the third form. Yes, you'd get an extra identifier in the scope representing the fact that you made an import, but that seems rather harmless to me.

  • Though I can think of use cases for causing global effects when imported, I can't think of any good use cases that I'd want to encourage. So this is also somewhat pushing people towards the right patterns, I guess.

Much, much later edit: And of course, you could always write it using the second form:

import {} from m;

Which, to be honest, also has clarity going for it: it explicitly says "do the import of the module, but bind no names", from which the obvious conclusion needs to be that we're importing for some other reason than names.

masak avatar May 27 '18 11:05 masak

Oh, and for a brief while before I rejected the fourth form, I made up the rule that forms 1..3 would require there to be at least one export from the module, whereas form 4 would require there to be none. But then the above arguments won out, and I dropped the fourth form.

masak avatar May 27 '18 11:05 masak

One more thought for now: let's call the thing I realized in this comment "early module binding".

Early module binding is necessary with the first form of import: we need to know right away what names we're importing into the current lexical scope, so that subsequent code can know which names are taken already.

However, with the third form, we're only really declaring a module m; its contents need to be available at runtime, of course, but there's nothing technically impossible with waiting until then. Let's call that option "late module binding".

With a late-bound module we might write m.foo, and the check for whether .foo actually exists in m might have to be deferred until we've actually loaded m — still before runtime, but not right away either.

When might this happen? When we have mutual imports:

# foo.007
import bar;

# bar.007
import foo;

I said in the OP that this should be impossible by design (and I still think it should be). But I'd just like to point out that it seems like it could be made to work, and this would in theory be a way for modules to cyclically depend on each other.

The whole thing reminds me of this p6l thread (Edit: link updated/rescued out of link rot), which feels like it's from a completely different geological era.

Here's TimToady's take on it in that thread:

But I also think that type recursion is likelier to indicate a design error than function recursion, so I'm not sure how far down this road we want to go.

Yep. I think type recursion should be possible in some rare cases, but nowadays I would recommend breaking things apart using a pattern like this:

  1. Types A and B need to depend on each other somehow.
  2. Figure out which one is the more important type. Let's say it's A.
  3. Take B and make it implement an interface.
  4. Let both A and B import that new interface. B declares it and A consumes it.

Boom, circularity gone.
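As a rough illustration of the four steps, here is the pattern in Python, with the three "modules" collapsed into one file for brevity. The names `Notifier`, `Order`, and `EmailLog` are invented for the example; `Order` plays the role of A and `EmailLog` the role of B.

```python
from typing import Protocol


# --- "interface" module: imported by both of the others ---
class Notifier(Protocol):
    def notify(self, message: str) -> None: ...


# --- module A: the more important type; depends only on the interface ---
class Order:
    def __init__(self, notifier: Notifier):
        self.notifier = notifier

    def ship(self) -> None:
        self.notifier.notify("order shipped")


# --- module B: implements the interface; no longer needs to import A ---
class EmailLog:
    def __init__(self):
        self.sent = []

    def notify(self, message: str) -> None:
        self.sent.append(message)


log = EmailLog()
Order(log).ship()
print(log.sent)    # ['order shipped']
```

Both A and B now point "down" at the interface, so neither has to import the other.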

I guess I think it's important enough for modules to know what's "up" and what's "down" to not want to mess with import circularity.

masak avatar May 27 '18 12:05 masak

Leaving circularity aside, it's perfectly admissible for the same modules to be imported several times during the same compilation process, as we traverse the compunits and the import relations between them.

I guess this is the same as saying that the import tree is actually a DAG. But it's actually some kind of multi-DAG, since the same compunit can even import the same module multiple times, in different lexical scopes. (Note to self: write a test for that.)

In the end, import actually does three things:

  1. Find the module
  2. Compile the module
  3. Import the specified identifiers into the current lexical scope

Step 2 only needs to be done once per module and compilation process. If we have some kind of __pycache__-like place with already-compiled bytecode files, we can even skip step 2 entirely. Without going into details, we can use judicious caching to avoid having to do double work when a module gets imported many times in a compilation process.
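A small Python sketch of the find/compile/import split with a per-process cache for step 2. Everything here is a stand-in: `ModuleCache` is a hypothetical name, "finding" is a dictionary lookup, and "compiling" just parses lines of the form `name = value`.

```python
class ModuleCache:
    def __init__(self, sources):
        self.sources = sources      # hypothetical: module name -> source text
        self.compiled = {}          # module name -> compiled form
        self.compile_count = 0

    def find(self, name):
        # Step 1: locate the module (here, a dictionary lookup).
        return self.sources[name]

    def compile(self, name):
        # Step 2: compile at most once per module and compilation process.
        if name not in self.compiled:
            self.compile_count += 1
            source = self.find(name)
            # Stand-in "compilation": each non-blank line `name = value` is an export.
            pairs = (line.split("=") for line in source.splitlines() if line.strip())
            self.compiled[name] = {key.strip(): int(value) for key, value in pairs}
        return self.compiled[name]

    def import_names(self, scope, name, wanted):
        # Step 3: bind the requested identifiers in the given lexical scope.
        exports = self.compile(name)
        for identifier in wanted:
            scope[identifier] = exports[identifier]


cache = ModuleCache({"m": "foo = 1\nbar = 2"})
outer_scope, inner_scope = {}, {}
cache.import_names(outer_scope, "m", ["foo"])
cache.import_names(inner_scope, "m", ["foo", "bar"])
print(outer_scope, inner_scope)   # {'foo': 1} {'foo': 1, 'bar': 2}
print(cache.compile_count)        # 1
```

The two calls model the same module being imported in two different lexical scopes: step 3 runs twice, step 2 only once.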

masak avatar May 27 '18 12:05 masak

Does it make sense to compile but not run a module that's imported? I'm asking because I just realized that something like

if False {
    @export
    sub foo() {
    }
}

would be problematic, because the if False block wouldn't ever run, and so foo would only have a function value from its static block. In general, we can only guarantee that things we export have a static version.

So, yes, "export the static things" seems a way forward. And maybe even "compile the imported module, but do not run it".

How do the other languages do this? Quick investigation:

  • Perl 5: definitely runs during a use. Famously, needs to return something truthy in the end to signal importing success.
  • Perl 6: runs during a use.
  • Python: runs during an import. That's because Python in the end does not really have a "static" phase.
  • ES6: runs during an import.

I also tried putting an export inside an if statement in ES6, and got "'import' and 'export' may only appear at the top level". Perforce this holds in Python too, since everything at the top level scope gets exported.

So I think the way to do it is this:

  • Run modules when we import them. (Note to self: would still be useful to have a construct like Python's if __name__ == "__main__" to discover if we're being run directly or imported as a module.)
  • Only allow exports on the "top level", indentation level 0, of the module file.
  • (Imports are still allowed anywhere, even in deeper scopes. Imports are always static, and happen at compile-time, regardless of whether the scope will later be entered or not.)
  • The things that are eventually exported (from the top level) close over the run-time frame that we got from running the module.
  • Because a module is guaranteed to be parsed/run only once even when it's imported multiple times from different points in the code base, a module and all its exports are effectively singletons in the code base.
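The last two points (exports closing over the module's run-time frame, and modules being effective singletons) can be modelled in Python like this. The function `run_counter_module` stands in for running a module's top level, and `import_once` is a hypothetical helper; neither is part of any actual 007 API.

```python
run_log = []


def run_counter_module():
    # Stand-in for running a module's top level; `count` lives in its run-time frame.
    run_log.append("counter module ran")
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    return {"increment": increment, "current": current}


_instances = {}


def import_once(name, run):
    # Run the module only on first import; later imports reuse the same exports.
    if name not in _instances:
        _instances[name] = run()
    return _instances[name]


first = import_once("counter", run_counter_module)
second = import_once("counter", run_counter_module)

first["increment"]()
second["increment"]()
print(second["current"]())   # 2
print(first is second)       # True
print(run_log)               # ['counter module ran']
```

Both importers see the same `count`, because the exported functions close over the single frame created when the module ran.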

masak avatar May 28 '18 09:05 masak

Pawel Murias points out that we can arrange things so that all modules are compiled first, and only later do we run all the compunits. This is a good point, and I think it aligns better with users' expectations.

masak avatar May 28 '18 11:05 masak

One thing I realized is that all the forms of import, even the first form, will need to clearly specify which names they declare in the lexical scope. In the second and third forms, this is "local" data which we can gain just by parsing out the statement itself.

In the case of the first form, the answer is over in the loaded module, and so we need to parse it first before we finish building the Qnode for that import statement.

Also, in an IDE setting, the question "where was this declared?" on an imported name can mean two different things — either the import statement, or the actual declaration (the one with an @export annotation) in the loaded module.

masak avatar May 28 '18 11:05 masak

Also, in an IDE setting, the question "where was this declared?" on an imported name can mean two different things — either the import statement, or the actual declaration (the one with an @export annotation) in the loaded module.

I think a "where was this declared" depends: in the import declaration, it must jump to the source file. Otherwise, it must jump to the import declaration. (That's how it works in WebStorm with ES6 modules at least, and that's the behavior I know I expected.)

vendethiel avatar May 28 '18 11:05 vendethiel

@vendethiel Yes, something like that is what I'd expect too.

Note that the first form (import * from m;) precludes jumping to an actual declaration in the loaded module. Guess we could do either of these things:

  • Jump from the name to the import, then lose state and just jump to the top of the imported module.
  • Give two choices when jumping from the name: either to the import, or directly to the declaration in the loaded module.

masak avatar May 28 '18 18:05 masak

Exports should be immutable bindings, like in ES6. That post is great, I need to read it again. If I'm reading it right, the fact that exports are immutable bindings will even make cyclic imports feasible. (I'm fine with that, but I prefer to be conservative in the short run and think about allowing cycles when we're comfortable with the base functionality.)

masak avatar May 29 '18 11:05 masak

One thing (it seems to me) we will lose if we allow cyclic bindings is the ability to tell at parse-time what type object is in a name like x.y.z.MyType. (And that's exactly what TimToady said in that p6l email all those years ago!) Given that 007 likes to know things about types at parse-time, maybe that's a show-stopper for cyclic dependencies.

masak avatar May 29 '18 11:05 masak

I was reading a TC39 proposal today, and it seems to have changed my mind about whether export should be a statement.

In the above discussion, I have @export as an annotation. My rationale for this is that (I feel) many statement-starting keywords are secretly annotations, accidentally promoted above their station. (Examples from Java: public, abstract, final, static, transient.) Maybe things would've been different in Java if annotations had been in the language from the start. @export felt like a clear example of this: the only way it changes a declaration-y statement is by making it available in a module's exports list.

But then I read that "export default from" proposal, and I remembered/realized that export has another function in JS, and one that I think is worth borrowing into 007: the export from forms.

From what I can see, we'd have five forms of export statement:

  1. export <declaration-statement> where (currently) the declaration statements are my, constant, sub func, macro, class.

  2. export { x, y, z }; where x, y, z are names in scope.

The remaining three mirror what import does:

  3. export * from m;
  4. export { foo, bar } from m;
  5. export m from m;

Whereas the three import forms import things from m's exports list, the three corresponding export forms import things from m's exports list into this module's exports list, without needing to introduce a lexical name as an intermediary.

I think export m; might work as a short version of export m from m; just as import m; is short for import m from m;. I'm a wee bit worried people will see it and think it means export { m };. Guess only actual use will tell.

masak avatar Jun 01 '18 20:06 masak

Both export form 2 and export form 4 support the same type of foo as bar renaming syntax as does import form 2.
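One way to picture how the export-from forms feed a module's exports list without creating lexical names is the following Python sketch. The entry format (`"own"`, `"all_from"`, `"some_from"`, `"module_from"`) and the `exports_of` function are made up for this illustration, and a module object is modelled as a plain dictionary.

```python
def exports_of(name, modules, seen=()):
    # `modules` is a hypothetical table: module name -> list of export entries.
    # ("own", local_name, value)         models: export my local_name = value;
    # ("all_from", other)                models: export * from other;
    # ("some_from", other, {orig: new})  models: export { orig as new } from other;
    # ("module_from", other, as_name)    models: export as_name from other;
    if name in seen:
        raise ValueError(f"circular re-export through '{name}'")
    result = {}
    for entry in modules[name]:
        kind = entry[0]
        if kind == "own":
            result[entry[1]] = entry[2]
        else:
            theirs = exports_of(entry[1], modules, seen + (name,))
            if kind == "all_from":
                result.update(theirs)
            elif kind == "some_from":
                for original, renamed in entry[2].items():
                    result[renamed] = theirs[original]
            elif kind == "module_from":
                result[entry[2]] = theirs
    return result


modules = {
    "m": [("own", "foo", 1), ("own", "bar", 2)],
    "facade": [
        ("own", "baz", 3),
        ("some_from", "m", {"foo": "mFoo"}),
        ("module_from", "m", "m"),
    ],
}
print(exports_of("facade", modules))
# {'baz': 3, 'mFoo': 1, 'm': {'foo': 1, 'bar': 2}}
```

Nothing from m is ever bound as a name inside facade here; the values go straight from one exports list to the other.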

masak avatar Jun 01 '18 21:06 masak

Just throwing this blog post about Python imports into this issue, as it seems to have relevant information about a lot of things that we might want to consider when implementing 007 modules/imports.

masak avatar Aug 22 '18 08:08 masak

Just doing some drive-by commenting here, pointing out that macro hygiene will have a cross-cutting impact on modules and imports. Namely, if module A calls a macro in module B, which generates code containing a name that B imported from module C, then (invoking hygiene) module A just gained a "hidden import" of module C. (Conceptually, a "macro-expanded AST" for module A might contain an explicit import; the point is that it's "hidden", as in, it doesn't need to be explicitly imported in A.)

Since Alma is a language primarily with macros and only secondarily with modules, this aspect of modules is pretty front-and-center.

The paper Extending the Scope of Syntactic Abstraction addresses this quite foundationally, by defining modules on top of Scheme macros. My fingers are itching a little bit to try defining the things in that paper in a small prototype Scheme. (Edit: This slide deck by Flatt references the paper, calling the technique "splicing scope".)

masak avatar Jan 30 '23 07:01 masak

Pawel Murias points out that we can arrange things so that all modules are compiled first, and only later do we run all the compunits. This is a good point, and I think it aligns better with users' expectations.

This Mozilla blog post contains a good run-down of this process in the case of ES modules, explaining how there are three phases:

  1. Construction — find, download, and parse all of the files into module records.
  2. Instantiation —find boxes in memory to place all of the exported values in (but don’t fill them in with values yet). Then make both exports and imports point to those boxes in memory. This is called linking.
  3. Evaluation —run the code to fill in the boxes with the variables’ actual values.

It also explains the "live bindings" thing, and how the instantiation phase wires things up without evaluating them, which allows for reference/dependency cycles.
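The three phases and the "boxes" can be mimicked in a few lines of Python. This is a toy model under stated assumptions, not how any ES engine or 007 is implemented: the `Box` class, the record format, and the two mutually importing modules `foo` and `bar` are all invented for the example.

```python
class Box:
    """A memory cell for one exported binding; empty until evaluation."""
    def __init__(self):
        self.filled = False
        self.value = None


# Phase 1, construction: module records list their exports and imports.
# The record format is hypothetical: imports map an imported name to its source module.
records = {
    "foo": {"exports": ["foo_value"], "imports": {"bar_value": "bar"}},
    "bar": {"exports": ["bar_value"], "imports": {"foo_value": "foo"}},
}

# Phase 2, instantiation: create one box per export, then point imports at them.
export_boxes = {
    name: {export: Box() for export in record["exports"]}
    for name, record in records.items()
}
environments = {}
for name, record in records.items():
    env = dict(export_boxes[name])
    for imported, source in record["imports"].items():
        env[imported] = export_boxes[source][imported]   # same Box object
    environments[name] = env


# Phase 3, evaluation: run each body to fill in its own boxes.
def fill(env, name, value):
    env[name].value = value
    env[name].filled = True


fill(environments["bar"], "bar_value", 10)
fill(environments["foo"], "foo_value", environments["foo"]["bar_value"].value + 1)

print(environments["bar"]["foo_value"].value)   # 11
print(environments["foo"]["bar_value"] is export_boxes["bar"]["bar_value"])   # True
```

Linking succeeds even though foo and bar import each other, because phase 2 only wires up boxes and evaluates nothing; the cycle only matters if a body reads a box before it has been filled.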

masak avatar May 24 '23 03:05 masak