links icon indicating copy to clipboard operation
links copied to clipboard

Illeity

Open dhil opened this issue 6 years ago • 11 comments

I have been working towards an implementation for #571. While trying to understand what the behaviour of the current chaser/loader is, and in particular how it deals with some edge cases, I discovered this gem.

It is possible for a top-level module to clone itself (as in code being emitted multiple times), consider the following program

# foo.links
open Foo;

fun main() {
  println("Hello World")
}

Foo.main()

The file foo.links requests the module Foo to be loaded, and invokes the main function from that module. Exactly what the symbolic name Foo refers to depends on the runtime namespace (or path, which is the current term of arts in Links). By running ./links -m foo.links I would expect to see an unbound module error, since I haven't provided a path. However, current behaviour suggests that in the absence of a path . is used, so at the very least I would expect a cyclic dependency error. Neither happens, instead Links generates code for foo.links twice:

# omitted the IR for prelude.links for brevity
([(Ir.Fun
     ((318, (() ~> (), "Foo.main", Var.Scope.Global)),
      ([a::Row], [],
       ([],
        (Ir.Apply ((Ir.TApp ((Ir.Variable 240), [{ |_ }])),
           [(Ir.Constant (CommonTypes.Constant.String "Hello World"))])))),
      None, CommonTypes.Location.Unknown));
   (Ir.Fun
      ((319, (() ~> (), "main", Var.Scope.Global)),
       ([a::Row], [],
        ([],
         (Ir.Apply ((Ir.TApp ((Ir.Variable 240), [{ |_ }])),
            [(Ir.Constant (CommonTypes.Constant.String "Hello World"))])))),
       None, CommonTypes.Location.Unknown))
   ],
 (Ir.Apply ((Ir.TApp ((Ir.Variable 318), [{  }])), [])))

As you can tell, the generated code contains two copies of the bindings of foo.links. Curiously, the tail computation Foo.main() is only emitted once. I suspect this is because the chaser/loader inlines bindings from dependencies, which is incorrect.

Consider the scenario, where we have foo.links and examples/foo.links, where the latter may be

# examples/foo.links
open Foo;

fun main() {
  println("Hello Examples")
}

Foo.main()

Then running ./links -m --path=examples examples/foo.links generates the following code

([(Ir.Fun
     ((318, (() ~> (), "Foo.main", Var.Scope.Global)),
      ([a::Row], [],
       ([],
        (Ir.Apply ((Ir.TApp ((Ir.Variable 240), [{ |_ }])),
           [(Ir.Constant (CommonTypes.Constant.String "Hello World"))])))),
      None, CommonTypes.Location.Unknown));
   (Ir.Fun
      ((319, (() ~> (), "main", Var.Scope.Global)),
       ([a::Row], [],
        ([],
         (Ir.Apply ((Ir.TApp ((Ir.Variable 240), [{ |_ }])),
            [(Ir.Constant (CommonTypes.Constant.String "Hello Examples"))])))),
       None, CommonTypes.Location.Unknown))
   ],
 (Ir.Apply ((Ir.TApp ((Ir.Variable 318), [{  }])), [])))

That is, Links has resolved Foo to mean foo.links rather than examples/foo.links, which I found to be surprising. If you delete foo.links and rerun the above command, then we get another instance of the first problem.

Having import in addition to open as suggested in #571 does not solve this issue on its own.

import Examples.Foo;
fun main() {
  println("Hello World")
}
main()

Then what is the intended behaviour of ./links --path=examples examples/foo.links? I suppose Examples.Foo should resolve to examples/foo.links, but how can I reliably tell that this is in fact the same file as I am currently processing? It is not reliable to deduce from the input filename alone that the two are the same as I could have given the equivalent name $(pwd)/examples/foo.links or ../links/examples/foo.links or millions of other possible filenames. In essence, this seems to be an aliasing issue. And I have not even mentioned symbolic links...

I suppose one solution might be to canonicalise filenames, both the ones given as input, and the ones that have been resolved, but I am not sure whether that is reliable either. And it seems like a hack to me. If this problem is analogous to pointer aliasing, then I think we should revisit our design decisions.

I see this is as problem that could easily manifest in our daily work (maybe it already does), as many of us have configs with parameters like

modules=true
path="examples:examples/webserver:examples/games:examples/directory"

dhil avatar May 05 '19 11:05 dhil

Ha. Yes, this cyclic import seems like it should be an error.

I think that if we move to having explicit import, with dot-notation aligned with directory structure, then we also probably want to move to a model where the path specifies only root path(s) of source trees from which importing is to happen. That is we would just have path=examples in the config file and to refer to examples/webserver/foo.links we would replace open Foo with open import Webserver.Foo. Right now we actually have no way to import modules from other directories short of adding the other directories to the path, but that means there would be no way of referring to both webserve/foo.links and games/foo.links. So even without cyclic imports the current approach seems slightly broken anyway.

OTOH I'm not a super-huge fan of identifying module/namespace dot with subdirectory structure, as the above makes it seem like Webserver is a module with component Foo, whereas importing Webserver on its own is meaningless. It also means that there is a lot of random noise needed if the directory structure of a project changes. I think this is one of the things SML-NJ's old compilation manager (CM) did well - i.e. you define a project using a small DSL that relates source language files and groups/components.

This is exactly the kind of thing that makes me wish we had a clean design to start from, though - I think what I'm suggesting above is basically just "do what Haskell/Java does" which is not totally satisfying.

For now, supporting only import Foo where Foo refers to the first foo.links found in the path (i.e. similar to what open currently does) seems like a conservative workaround, we then just need to check for import cycles among actually-imported files and could consider proposals for richer import behavior later.

jamescheney avatar May 05 '19 12:05 jamescheney

This is exactly the kind of thing that makes me wish we had a clean design to start from, though - I think what I'm suggesting above is basically just "do what Haskell/Java does" which is not totally satisfying.

I have been against automatic resolution of dependencies from the beginning. I guess having to work with your nemesis means you to get to know it well. I am now convinced that it is a bad design, as it is inherently anti-modular and it requires global knowledge. I suppose it is at odds with separate compilation too.

Automatic resolution of dependencies seems to have many of the same problems as implicits (as a programming language feature). It simply goes against my mantra that "programmers ought to reason locally, not globally". It seems to me that automatic dependency resolution is not the job of the compiler / programming language, but rather the programmer's job, possibly with help from a tool.

dhil avatar May 05 '19 17:05 dhil

I think I have come up with a reliable approach to detect a cyclic load, when the import program lives in one of the paths in --path. At least it detects the cases outlined above.

Unfortunately, I had to disable loading via symlinks for now, as I seem to remember there were problems with using realpath or readlink on MacOS, hence I don't have a reliable way to resolve those. We can revisit this later.

I am hopeful that I can finish the patch tomorrow.

dhil avatar May 06 '19 23:05 dhil

Cool. Yes, realpath and readlink are not portable across Linux vs. BSD (MacOSX), or at least, the command-line tools that provide access to these are not portable. If you can call them from within OCaml then we should check whether these versions are portable.

While we are on the subject, the following related questions occurred to me:

  • is import (or open) allowed anywhere (e.g. inside a module, inside a function) or just at the toplevel?
  • if yes, does the effect of the import or open follow lexical scope or what? (for open, it seems straightforward to do the right thing, but how do we undo an import?)

For comparison, #use in OCaml/F# is only allowed at the toplevel and in interpreted programs (scripts) rather than compiled files. Thus, when you switch from interactively developing to building a compiled program you need to redo this. This difference has always really bugged me. On the other hand Scala allows import just about anywhere.

jamescheney avatar May 07 '19 16:05 jamescheney

My intention is to only allow import at the toplevel. As for the semantics, I intend for import to bring a synthetic module into scope, whose name is that of the compilation unit, e.g.

# foo.links
var x = "Foo";

# bar.links
import Foo;
println(Foo.x)

The synthetic module may shadow a user-defined module, if the import is placed after its definition. As discussed previously, the the synthetic module/imported compilation unit does not automatically get opened.

dhil avatar May 07 '19 16:05 dhil

Patch #599 introduces the syntax for import, but not its semantics. I got another branch for the semantics, it is a larger change, that requires replacing the loader. I will try to break it into to smaller meaningful patches.

dhil avatar May 07 '19 17:05 dhil

Sounds good. I think we definitely want to avoid having imports be hidden away in non-toplevel declarations, so that the imports of a file are statically defined.

jamescheney avatar May 07 '19 17:05 jamescheney

Snoozing until post-0.9.0

jamescheney avatar Jul 23 '19 13:07 jamescheney

Last year we deferred resolving this to after release 0.9.0. As of now, the first two examples described by Daniel (IMO correctly) yield an "unbound module" error, while the third one where we import Examples.Foo does not parse (presumably, because we are not supporting Java-style directory <-> package punning).

It seems like this issue no longer describes a defect in the current version of Links. Is there still an underlying issue here to be resolved, and if so what is it?

jamescheney avatar Jun 12 '20 14:06 jamescheney

As far as I am aware this issue is still present in master. I have a fix ready but it is currently blocked by another patch which is blocked by the on-going types refactor.

dhil avatar Jun 17 '20 14:06 dhil

OK, we should revisit this after the types refactor is complete.

jamescheney avatar Jun 17 '20 14:06 jamescheney