
Lazy imports

Open thufschmitt opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.

Imports are currently strict in Nickel. This means that the sheer size of the codebase will matter regardless of what we evaluate in it (going against the general laziness principle of the language). Besides, the absence of dynamic imports means that every potentially needed file has to be systematically imported.

Describe the solution you'd like

Somehow make the imports lazier. IIRC, they are strict because typechecking is, but maybe there's a way around this? (making the imports strict only in a strictly typed context?)

Describe alternatives you've considered

Having dynamic imports would somewhat reduce the need for that (since they allow encoding a lazy import function), but I guess they suffer from the same issues.

Additional context

Came in the context of https://github.com/thufschmitt/nickel-schemastore : I wanted to do a brutal export of https://schemastore.org with https://github.com/nickel-lang/json-schema-to-nickel, but the generated schemas are rather big, and just running nickel eval <<<'let foo = import "main.ncl" in 123' from that repo takes several minutes. Since each schema is in its own file, having lazy imports would make this drastically faster.

EDIT: Repro for the benchmark:

git clone https://github.com/thufschmitt/nickel-schemastore
cd nickel-schemastore
# Takes around 2mins on my machine
nix shell github:tweag/nickel/fe66a02bded97bb804521a79b2272ad1591f807c --command time -f %E nickel eval <<<'let foo = import "main.ncl" in 123'

The total lines count of the generated schemas is ~1.7M loc.

Oct 20 '23 11:10 thufschmitt

Came in the context of https://github.com/thufschmitt/nickel-schemastore : I wanted to do a brutal export of https://schemastore.org/ with https://github.com/nickel-lang/json-schema-to-nickel, but the generated schemas are rather big, and just running nickel eval <<<'let foo = import "main.ncl" in 123' from that repo takes several minutes. Since each schema is in its own file, having lazy imports would make this drastically faster.

Would you have a repro to share? I think taking several minutes, even for huge files, is a red flag. Even if we don't have the ambition to be as efficient as a state-of-the-art SIMD-enabled JSON parser that can eat gigabytes each second, parsing and typechecking files without statically typed blocks should ideally be pretty fast, even on very big files.

For dynamic imports, I think there's a tension. On the one hand, the laziness of the language's evaluation model says that something, contracts included, is only ever evaluated if it's needed. On the other hand, the promise of static typing is that errors are caught ahead of time, even before something is evaluated, and even if it's not needed. That holds within a single file, whether we have dynamic imports or not. Fulfilling this promise forces us to do imports ahead of time, because even if you don't import a file from a statically typed context, there might be statically typed blocks within this file. With lazy imports, by contrast, you could import code that doesn't parse or doesn't typecheck, as long as you never use it.

That being said, the problem is philosophical: right now, typechecking is purposefully kept modular, so we could very much decide to import stuff lazily and perform typechecking on the fly, the first time we see a file. We don't need previous typing information (except when an import is used in a statically typed block, in which case we currently need at least to parse the file, but even that could be solved by having a notion of interface à la OCaml).

So, I imagine the big question is: what makes the most sense as the default behavior? Is the use case of having a lot of huge files that are slow to import common enough so that paying the price of delaying some error reporting makes sense? Or is it not that common, and we should keep import eager by default, but still provide an escape hatch import lazy "foo.ncl"? Or is it just a performance issue with the parser and the typechecker, which is the actual root of the problem? (I don't have the answer right now, and I'm not claiming that the current design is the right one, but I just wanted to give some perspective)
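To make the trade-off concrete, here is a minimal Python model of eager versus lazy imports. It is only a sketch and not Nickel code, and it says nothing about how the interpreter is actually implemented: `load`, `LazyImport`, `eval_eager`, `eval_lazy` and the two file names are all hypothetical names made up for illustration, with `load` standing in for "parse, typecheck and evaluate one file".

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for files on disk: one that loads fine and one
# that fails to typecheck. Nothing here is Nickel's real API.
VALUES: Dict[str, int] = {"good.ncl": 2}
BROKEN = {"bad.ncl"}
loaded: List[str] = []  # records every file that was actually processed


def load(path: str) -> int:
    """Model of 'parse + typecheck + evaluate' for a single file."""
    loaded.append(path)
    if path in BROKEN:
        raise TypeError(f"{path}: static type error")
    return VALUES[path]


class LazyImport:
    """Defers `load` until the value is first requested, then caches it."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._cache: List[int] = []

    def force(self) -> int:
        # The file is processed at most once, on first use.
        if not self._cache:
            self._cache.append(load(self._path))
        return self._cache[0]


def eval_eager(paths: List[str], body: Callable[[Dict[str, int]], int]) -> int:
    # Every import is processed up front, so errors surface even if unused.
    values = {path: load(path) for path in paths}
    return body(values)


def eval_lazy(paths: List[str], body: Callable[[Dict[str, LazyImport]], int]) -> int:
    # Imports are only wrapped; a file is processed when `force` is called.
    imports = {path: LazyImport(path) for path in paths}
    return body(imports)
```

In this model, `eval_lazy(["bad.ncl"], lambda imports: 123)` returns 123 without ever processing the broken file, while `eval_eager` on the same input raises the type error. That is the delayed error reporting the question above is about.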

Oct 20 '23 14:10 yannham

Thanks for the detailed answer :)

Would you have a repro to share?

Yes, I've just updated the post to add it.

So, I imagine the big question is: what makes the most sense as the default behavior? Is the use case of having a lot of huge files that are slow to import common enough so that paying the price of delaying some error reporting makes sense? Or is it not that common, and we should keep import eager by default, but still provide an escape hatch import lazy "foo.ncl"? Or is it just a performance issue with the parser and the typechecker, which is the actual root of the problem? (I don't have the answer right now, and I'm not claiming that the current design is the right one, but I just wanted to give some perspective)

My intuition is that if we ever want a Nixpkgs-like thing in Nickel, then no amount of optimizing will be enough. I don't know how relevant it is as a benchmark, but just trying to read every file in Nixpkgs with find . -type f -exec cat \{\} \; > /dev/null takes slightly less than 3 mins on my machine. Doesn't mean that an opt-in lazy import wouldn't be a good fit though. Also, with https://github.com/NixOS/rfcs/blob/master/rfcs/0140-simple-package-paths.md being implemented, Nixpkgs is slowly starting to rely more and more on dynamic imports (very roughly doing something like mapAttrs import (builtins.readDir ./pkgs/by-name)). Maybe there could be a pattern where you want lazy imports at the same place where you want dynamic imports.
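As a rough illustration of that last pattern (again a Python model rather than Nix or Nickel, with `lazy_import_dir` and its `load` parameter being hypothetical names), a dynamic import over a directory can hand back one deferred loader per entry, so that listing the directory is cheap and a file is only read when its entry is used:

```python
import os
from typing import Callable, Dict


def lazy_import_dir(
    directory: str, load: Callable[[str], object]
) -> Dict[str, Callable[[], object]]:
    """Map each entry name in `directory` to a loader that reads it on demand."""
    table: Dict[str, Callable[[], object]] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        cache: Dict[str, object] = {}

        # Default arguments bind this iteration's path and cache to the loader.
        def loader(path: str = path, cache: Dict[str, object] = cache) -> object:
            if "value" not in cache:
                cache["value"] = load(path)  # first use: actually load the file
            return cache["value"]

        table[name] = loader
    return table
```

Calling `lazy_import_dir(some_dir, load)` only lists the directory; `load` runs the first time a given entry's loader is called and its result is reused afterwards.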

Oct 23 '23 08:10 thufschmitt

My intuition is that if we ever want a Nixpkgs-like thing in Nickel, then no amount of optimizing will be enough. I don't know how relevant it is as a benchmark, but just trying to read every file in Nixpkgs with find . -type f -exec cat \{\} \; > /dev/null takes slightly less than 3 mins on my machine.

Ah, this is a good datapoint.

This topic has been discussed in the weekly meeting. It seems that what people would ideally want is not having users (or lib contributors) have to think about putting lazy imports everywhere (or the converse, if lazy imports were made the default and you want the eager version).

We also discussed separately the current lack of configuration for the Nickel interpreter. One idea that came up is to have a nickel_rc.ncl file (or whatever you call it) to set various options. That file could be project-specific (maybe not all options qualify here; there are still some things to clarify). The laziness of import statements could then be configured there, per project. If I import something from nickelpkgs, the latter would set this option, making all imports from there lazy. After all, with a proper CI, libraries are also way less likely to have static type errors, and the performance implications also depend on the nature and the architecture of the library, so it would make sense that libraries can control those parameters themselves.

Nov 27 '23 10:11 yannham