External package resolution
I'd like to avoid giving the compiler special knowledge of the standard library. stdlib should be just another package. How do I make source code available to the compiler?
When you import a module in ipso, the compiler currently uses the module name to search the importing module's directory for the matching source file. What needs to change so that several places can be searched for a particular module?
Environment Variable
Set an environment variable (say, IPSO_PKGS) to a colon-separated list of directories. In addition to searching the importing module's directory, search these directories in order.
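A minimal sketch of that lookup, written in Python rather than in ipso's implementation. It assumes a module named a lives in a file called a.ipso, and that IPSO_PKGS is split on os.pathsep (":" on Unix, matching the example below). The function names are hypothetical.

```python
import os
from pathlib import Path


def search_path(importing_file, env):
    """Directories to search, in order: the importing module's own directory,
    then every non-empty entry of IPSO_PKGS."""
    dirs = [Path(importing_file).parent]
    dirs += [Path(d) for d in env.get("IPSO_PKGS", "").split(os.pathsep) if d]
    return dirs


def resolve_import(name, importing_file, env=os.environ):
    """Return the first `<name>.ipso` found on the search path, or None."""
    for directory in search_path(importing_file, env):
        candidate = directory / f"{name}.ipso"
        if candidate.is_file():
            return candidate
    return None
```

Note that the result depends only on the name and the search order; nothing records which package a given a.ipso was meant for. The problems below follow from that.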
Problems
Transitive Dependencies
$ tree
.
|--- a
|    |--- a.ipso
|--- b
|    |--- b.ipso
|--- c
     |--- main.ipso
$ cat a/a.ipso
value : Int
value = 99
$ cat b/b.ipso
import a
value : Int
value = a.value + 1
$ cd c
$ cat main.ipso
import b
main : IO ()
main = print b.value
$ IPSO_PKGS=$PWD/../a:$PWD/../b ipso main.ipso
100
As the author of b, I developed and tested b.ipso against the a module listed here (the one with value = 99). I want to make sure that users of b get that same a module, so that b.value is 100.
As a user of b, I don't understand why I have to supply b's dependencies. How do I know that the versions I supply are the ones the author intended? Furthermore, if I depended on a single package that itself had many dependencies, satisfying them all by hand would be too cumbersome.
A Diamond-ish Problem
$ cat one/a.ipso
value : Int
value = 0
$ cat two/a.ipso
value : Int
value = 100
$ cat three/b.ipso
import a # intended to be one/a.ipso
value : Int
value = a.value + 1 # intended to be 1
$ cat four/c.ipso
import a # intended to be two/a.ipso
import b
main : IO ()
main = print (a.value + b.value) # intended to be 101 in total
$ IPSO_PKGS=./one:./two:./three ipso four/c.ipso
???
With the simple IPSO_PKGS solution, it's hard to tell what this program will print.
If imports search IPSO_PKGS from left to right, the program prints 0 + (0 + 1) = 1 instead of the intended 101. This is because every import of the a module, including the one inside three/b.ipso, is satisfied by one/a.ipso, the first matching entry in the environment variable.
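To double-check that arithmetic, here is a toy evaluator in Python. It is not how ipso evaluates programs: FILES is a hypothetical in-memory stand-in for the four files, each module body is written as a Python function of its imported modules, and resolution follows the "importer's directory, then IPSO_PKGS left to right" rule.

```python
# Hypothetical in-memory stand-in for the files above: each path maps to
# (names it imports, body), where body takes the imported modules' values.
FILES = {
    "one/a.ipso": ([], lambda m: {"value": 0}),
    "two/a.ipso": ([], lambda m: {"value": 100}),
    "three/b.ipso": (["a"], lambda m: {"value": m["a"]["value"] + 1}),
    "four/c.ipso": (["a", "b"], lambda m: {"main": m["a"]["value"] + m["b"]["value"]}),
}


def resolve(name, importer, ipso_pkgs):
    """Search the importer's directory, then the IPSO_PKGS entries left to right."""
    importer_dir = importer.rsplit("/", 1)[0]
    for directory in [importer_dir, *ipso_pkgs]:
        path = f"{directory}/{name}.ipso"
        if path in FILES:
            return path
    raise LookupError(f"module {name!r} not found (imported by {importer})")


def evaluate(path, ipso_pkgs):
    imports, body = FILES[path]
    deps = {name: evaluate(resolve(name, path, ipso_pkgs), ipso_pkgs) for name in imports}
    return body(deps)


print(evaluate("four/c.ipso", ["one", "two", "three"])["main"])  # 1
```

Both imports of a resolve to the same file, so the total is always x + (x + 1) for whichever a.ipso appears first on the path: 1 with one before two, 201 with two before one. No ordering of IPSO_PKGS produces 101.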
Direct ways to avoid this outcome:
- Import absolute filepaths instead of abstract names
  - This makes it harder to write reusable code, because absolute paths tie a module to one filesystem layout and prevent portability
- Disallow modules with the same name in different locations on IPSO_PKGS
  - This is non-compositional, because it requires package authors to coordinate on module names before users can use both packages at the same time.
Programming Language Theory?
Let's take a step back and first understand what packages, modules, and imports accomplish from a programming language perspective. Having unpacked that, it might be easier to construct a coherent general solution.
Imports define closures
We can think of source files (modules) as expressions in the programming language. In many languages, modules are (dependent) products. Other languages, such as Nix, naturally allow source files to represent any sort of value.
In a module, import a adds some value to the module's closure under the name a. But which value? I think the default interpretation implies that there exists a context/heap, and importing selects the entry named a from the heap and adds it to the closure under the name a. That also fits with import a as b, which selects the heap entry a and adds it to the module's closure under the name b.
The purpose of a 'package', from the perspective of conventional package managers, is to define a heap from which its modules can reference values.
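A toy model of this "imports select heap entries" reading, in Python. The heap is assumed to be a plain name-to-value mapping and build_closure is a hypothetical helper; in this model a package is simply whatever decides the contents of heap.

```python
def build_closure(imports, heap):
    """Bind a module's imports under the "imports select heap entries" reading.

    imports: (heap_name, local_name) pairs; `import a` is ("a", "a") and
             `import a as b` is ("a", "b").
    heap:    the context that imports select from, as a name -> value mapping.
    """
    closure = {}
    for heap_name, local_name in imports:
        if heap_name not in heap:
            raise LookupError(f"import {heap_name}: no such entry in the heap")
        closure[local_name] = heap[heap_name]
    return closure


heap = {"a": {"value": 99}}
closure = build_closure([("a", "a"), ("a", "b")], heap)
print(closure["b"]["value"])  # 99
```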
Imports are arguments
Another way to look at it is that imports declare arguments to the module.
This module
$ cat main.ipso
import a
main : IO ()
main = print a.value
has a module-shaped hole named a. Before we can check and execute it, we have to fill the hole.
$ cat <<EOF > a.ipso
value : Int
value = 0
EOF
$ ipso --module a=./a.ipso main.ipso
0
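The same idea as a Python sketch: the import becomes a function parameter, and running the program means supplying every parameter first. main_module and instantiate are hypothetical; main = print a.value is modeled as the value main would print, and the keyword arguments play the role of the proposed --module flags.

```python
import inspect


def main_module(a):
    """main.ipso with its import turned into a parameter (a module-shaped hole)."""
    return {"main_output": a["value"]}


def instantiate(module, **provided):
    """Fill a module's holes from `provided`, the way `--module name=path` would.
    Provided modules that the module doesn't import are ignored."""
    holes = inspect.signature(module).parameters
    missing = [name for name in holes if name not in provided]
    if missing:
        raise TypeError(f"unfilled module holes: {missing}")
    return module(**{name: provided[name] for name in holes})


a_module = {"value": 0}  # a.ipso: value = 0
print(instantiate(main_module, a=a_module)["main_output"])  # 0
```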
A combination of the two
import a
expresses a requirement named a; when the module is instantiated, whatever satisfies that requirement is added to the module's closure under the name a.
A package is a module tree paired with information about how to satisfy its dependencies. There's no need for a package/module distinction like in Haskell.
Q: is there a need for a shorthand to locate a package from within the source file?
stdlib sketch and usage
$ tree stdlib
stdlib
+--- pkg.ipso
+--- functor.ipso
+--- applicative.ipso
+--- monad.ipso
I removed the following from the language reference, because the reference should reflect what the language currently does.
Packages
A package is a directory that contains .ipso files.
my_package
├── a.ipso
├── b.ipso
├── b
│   ├── c.ipso
│   └── d.ipso
└── e.ipso
Using Packages
Packages are made available to ipso via the IPSO_PACKAGES environment variable.
Example:
$ IPSO_PACKAGES=/path/to/a:/path/to/b ipso my_file.ipso
Making modules available via the command line seems like a good first step. I like the idea of ipso --module a=/path/to/file.ipso test.ipso providing a module called a to test.ipso.
Currently a file can import from its siblings:
./test.ipso
import a
main : IO ()
main = println <| debug a.value
./a.ipso
value : Int
value = 2
Using --module arguments, you could get the same behaviour with ipso --module a=./a.ipso test.ipso.
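A sketch of how --module could be parsed, using Python's argparse purely for illustration. ipso has no such flag today; the NAME=PATH syntax comes from the proposal above, while the rule that binding the same name twice is an error is my assumption.

```python
import argparse


def module_binding(text):
    """Parse one hypothetical `--module NAME=PATH` value into (NAME, PATH)."""
    name, sep, path = text.partition("=")
    if not (sep and name and path):
        raise argparse.ArgumentTypeError(f"expected NAME=PATH, got {text!r}")
    return name, path


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="ipso")
    parser.add_argument("--module", action="append", default=[], type=module_binding,
                        metavar="NAME=PATH", help="provide a module under NAME")
    parser.add_argument("file")
    args = parser.parse_args(argv)
    modules = {}
    for name, path in args.module:
        if name in modules:  # assumption: binding a name twice is an error
            parser.error(f"module {name!r} provided more than once")
        modules[name] = path
    return args.file, modules


# The sibling-import example above, expressed with --module.
print(parse_args(["--module", "a=./a.ipso", "test.ipso"]))
# ('test.ipso', {'a': './a.ipso'})
```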
I wonder how this might work with "hierarchical modules" (#73). Would I be able to say --module a.b=./x.ipso to provide an a.b module? What would happen if I passed --module a=./x.ipso --module a.b=./y.ipso when x.ipso contained a definition named b? a.b would be ambiguous, referring to both the b definition in x.ipso and the entirety of the y.ipso module.
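One possible answer is to reject such combinations up front. The Python sketch below assumes the compiler knows the top-level names each provided file defines; module_name_clashes is a hypothetical helper, and treating a clash as an error is a design assumption, not settled behaviour.

```python
def module_name_clashes(provided, definitions):
    """Find dotted module names that collide with a definition in a prefix module.

    provided:    hypothetical --module bindings, module name -> source path
    definitions: source path -> names defined at the top level of that file
    Returns (module name, prefix module, clashing definition) triples.
    """
    clashes = []
    for name in provided:
        parts = name.split(".")
        for i in range(1, len(parts)):
            prefix, member = ".".join(parts[:i]), parts[i]
            if prefix in provided and member in definitions.get(provided[prefix], set()):
                clashes.append((name, prefix, member))
    return clashes


provided = {"a": "./x.ipso", "a.b": "./y.ipso"}
definitions = {"./x.ipso": {"b"}, "./y.ipso": {"value"}}
print(module_name_clashes(provided, definitions))  # [('a.b', 'a', 'b')]
```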
How could this work with the "diamond-ish problem" from earlier?
$ cat one/a.ipso
value : Int
value = 0
$ cat two/a.ipso
value : Int
value = 100
$ cat three/b.ipso
import a # intended to be one/a.ipso
value : Int
value = a.value + 1 # intended to be 1
$ cat four/c.ipso
import a # intended to be two/a.ipso
import b
main : IO ()
main = print (a.value + b.value) # intended to be 101 in total
$ IPSO_PKGS=./one:./two:./three ipso four/c.ipso
If I set up modules a and b correctly (ipso --module a=two/a.ipso --module b=three/b.ipso four/c.ipso), I'm still left with the question of how to satisfy three/b.ipso's dependency on a module called a.
One option: ipso --module a=two/a.ipso --module b=three/b.ipso --module b:a=one/a.ipso four/c.ipso. Here --module b:a=one/a.ipso says "within the scope of module b, the a module is one/a.ipso".
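A Python sketch of scoped bindings, reusing a hypothetical in-memory FILES stand-in for the example files. The scope:name=path syntax follows the option above; looking a name up in enclosing scopes when the inner scope doesn't bind it is my assumption.

```python
# Hypothetical in-memory stand-in for the example files: path ->
# (names it imports, body taking the imported modules' values).
FILES = {
    "one/a.ipso": ([], lambda m: {"value": 0}),
    "two/a.ipso": ([], lambda m: {"value": 100}),
    "three/b.ipso": (["a"], lambda m: {"value": m["a"]["value"] + 1}),
    "four/c.ipso": (["a", "b"], lambda m: {"main": m["a"]["value"] + m["b"]["value"]}),
}


def parse_scoped(flags):
    """Turn 'name=path' and 'scope:name=path' into {scope: {name: path}}.
    The empty scope "" holds the top-level bindings."""
    scopes = {}
    for flag in flags:
        lhs, _, path = flag.partition("=")
        scope, _, name = lhs.rpartition(":")
        scopes.setdefault(scope, {})[name] = path
    return scopes


def resolve(scope, name, scopes):
    """Look `name` up in `scope`, then in each enclosing scope: "b:c" -> "b" -> ""."""
    while True:
        if name in scopes.get(scope, {}):
            return scopes[scope][name]
        if scope == "":
            raise LookupError(f"no module named {name!r} in scope")
        scope = scope.rpartition(":")[0]


def evaluate(path, scope, scopes):
    imports, body = FILES[path]
    deps = {}
    for name in imports:
        # A dependency's own imports are resolved inside the dependency's scope.
        inner = f"{scope}:{name}" if scope else name
        deps[name] = evaluate(resolve(scope, name, scopes), inner, scopes)
    return body(deps)


flags = ["a=two/a.ipso", "b=three/b.ipso", "b:a=one/a.ipso"]
print(evaluate("four/c.ipso", "", parse_scoped(flags))["main"])  # 101
```

The fallback is convenient for dependencies that everyone should share (such as stdlib), but it has a cost: if b:a is omitted, b silently inherits the top-level a and the program prints 201 rather than failing. A stricter variant would drop the fallback and require every scope to bind its imports explicitly.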
Some thoughts that are coming up at this point:
- Why think about all these options when we could do some kind of package-manager-y system with a package manifest? Maybe that would make things easier?
- As a regular user, I wouldn't want to pass any modules via the command line
- What if a program had a dependency tree that was too big to pass via the command line?
I think that starting with modules-via-command-line-args is a good reference point for all the other features that would make dependency management easier.
For example, if I ended up with some sort of "package manifest", I could talk about the structure of the dependency tree in terms of --module arguments, even if I didn't use them directly in the package system.
The following package structure would be equivalent to this --module set: --module a=/path/to/two/a.ipso --module b:a=/path/to/one/a.ipso --module b=/path/to/three/self.ipso.
/path/to/one/a.ipso
value : Int
value = 0
/path/to/two/a.ipso
value : Int
value = 100
/path/to/three/pkg.ipso
dependencies = [
{ name = "a", src = "file:///path/to/one/a.ipso" },
]
/path/to/three/self.ipso
import a # intended to be one/a.ipso
value : Int
value = a.value + 1 # intended to be 1
/path/to/four/pkg.ipso
dependencies = [
{ name = "a", src = "file:///path/to/two/a.ipso" },
{ name = "b", src = "file:///path/to/three" }
]
/path/to/four/self.ipso
import a # intended to be two/a.ipso
import b
main : IO ()
main = print (a.value + b.value) # intended to be 101 in total
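A Python sketch of that translation from manifests to --module bindings. MANIFESTS is a hypothetical stand-in for the two pkg.ipso files (nothing here parses ipso), and it assumes, as in the example, that a file:// source ending in .ipso is a single module while any other source is a package directory whose entry module is self.ipso.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the two pkg.ipso manifests above:
# package directory -> its `dependencies` list.
MANIFESTS = {
    "/path/to/three": [{"name": "a", "src": "file:///path/to/one/a.ipso"}],
    "/path/to/four": [
        {"name": "a", "src": "file:///path/to/two/a.ipso"},
        {"name": "b", "src": "file:///path/to/three"},
    ],
}


def module_bindings(package_dir, scope=""):
    """Translate a package's manifest (and its dependencies' manifests) into
    (scoped name, path) pairs, i.e. the equivalent --module arguments."""
    bindings = []
    for dep in MANIFESTS.get(package_dir, []):
        path = urlparse(dep["src"]).path  # file:///x/y.ipso -> /x/y.ipso
        name = f"{scope}:{dep['name']}" if scope else dep["name"]
        if path.endswith(".ipso"):
            bindings.append((name, path))  # a single-file dependency
        else:
            # A package dependency: assume its entry module is self.ipso, and
            # translate its own manifest inside the dependency's scope.
            bindings.append((name, f"{path}/self.ipso"))
            bindings += module_bindings(path, name)
    return bindings


print(" ".join(f"--module {n}={p}" for n, p in module_bindings("/path/to/four")))
# --module a=/path/to/two/a.ipso --module b=/path/to/three/self.ipso --module b:a=/path/to/one/a.ipso
```

The output is the same set of bindings as the --module set discussed above, only in a different order, which is the sense in which a manifest can be described entirely in terms of --module arguments.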