Stakes in the ground

Open masak opened this issue 1 year ago • 16 comments

It seems I write these issues these days instead of blog posts. Very well.

This issue is a self-closing one about the new spec I'm writing. Oh right, I'm writing a new spec, check it out! (At the time of writing, I'm up to about chapter 6 or so.)

Anyway, I realized that there were some "soft" topics I wanted to write about, that are more about design sensibilities and "taste" — stakes in the ground — than about the objective things that go in the spec itself. Subjective stuff, basically. But somehow rooted in reasoning, or at least I'd like to think so.

I'll stub out the subsequent sections as individual comments, and then fill them in. After that, I'll close this issue.

masak avatar Nov 12 '24 14:11 masak

Variable scope

Alma uses the my keyword to declare variables. Besides being the variable-declaring keyword of choice for both Raku and Perl, I also quietly dig its suitably down-to-earth directness. Whose variable? My variable! G'doy!

Seriously, though, I could've gone with let. Definitely my second favorite.

Languages like C, C++, Java, and C# use the type (like int) in lieu of a dedicated variable-declaring keyword. That's fine, I guess, but it commits your language to being one of those "static" straitjacket languages, which Perl, Raku, and Alma are not. Rather than Raku's "type second" approach (my Int $n), Alma goes with TypeScript's "type after colon" approach (my n: Int).

Python and Ruby do not deign to have either keyword or type. They just assign to the new variable, and this assignment also causes it to be declared. I was going to say something dramatic, like "that is Wrong". But Python and Ruby are rather popular, and so clearly a language can reach a respectable level of popularity without explicitly declared variables. All I can really say with certainty is that either I'm wrong, or the unwashed masses are.

Ancient-enough versions of C only allowed you to declare local variables at the top of functions. This was, as I understand it, a kind of torturing the programmer to benefit the compiler writer: a function gets a stack frame of a certain size, a size which grows with each new variable declaration. By putting all the declarations at the top, one does not have to recalculate that size halfway through the function.

Which brings us to the travesty that is JavaScript's var declaration.
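For readers who haven't been bitten by it, here is a minimal sketch of two var behaviors that usually draw this kind of complaint: the declaration is scoped to the whole function rather than to the enclosing block, and the variable can be read (as undefined) before its declaration line has run. The function name varDemo and its contents are made up for illustration, and this is not necessarily the argument this comment will end up making.

```javascript
function varDemo() {
    const seen = [];

    // The declaration of `x` is function-scoped, so the name already
    // exists here; only the assignment happens further down.
    seen.push(x);           // reads undefined instead of throwing

    if (true) {
        var x = "inner";    // declared inside a block...
    }

    seen.push(x);           // ...but still visible after the block ends

    return seen;
}

console.log(varDemo());
```

Swapping var for let in the same function makes the first read throw a ReferenceError, and the second read would not see x at all.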

tbd

masak avatar Nov 12 '24 14:11 masak

Hoisting

The classical "folk explanation" of hoisting is pretty dumb. Not because it's super-incorrect — it actually tends to be helpful — but because it's one of those explanations that makes a big deal out of a simple thing, and then fails to explain stuff when it gets complicated.

Example of when it succeeds: you can call a function that is "not declared yet".

f();    // this call works

function f() {
    console.log("whoa!");
}

The reason this works, says the dumb explanation, is that the function f declaration gets "hoisted" (by its own petard, one presumes) and ends up physically at the top of its surrounding scope, so that the code effectively looks like this:

function f() {
    console.log("whoa!");
}

f();

Even though that's not what you wrote.

Example of when it fails: two functions can call each other.

function isEven(n) {
    return n === 0 || isOdd(n - 1);
}

function isOdd(n) {
    // Base case: 0 is not odd. Without it, the recursion never terminates.
    return n !== 0 && isEven(n - 1);
}

There's no way to use the "hoisting" imagery in this case to show how the code gets rewritten, because you can't physically place each of two functions above the other.
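One way to see what happens instead (a sketch of the usual scope-based account, not necessarily where this comment is headed): both names are bound in the enclosing scope before any statement in that scope runs, so neither function has to come "first". The snippet calls both functions before either declaration appears in the text. Here isOdd has its own base case so that the recursion terminates for non-negative n; the results array is just a name chosen for the example.

```javascript
// Called before either declaration appears in the source text.
const results = [isEven(4), isEven(7), isOdd(3)];

function isEven(n) {
    return n === 0 || isOdd(n - 1);
}

function isOdd(n) {
    // Base case so the mutual recursion terminates for n >= 0.
    return n !== 0 && isEven(n - 1);
}

console.log(results);
```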

tbd

masak avatar Nov 12 '24 14:11 masak

Initialization

tbd

masak avatar Nov 12 '24 14:11 masak

Thunking

tbd

masak avatar Nov 12 '24 14:11 masak

Type conversions

tbd

masak avatar Nov 12 '24 14:11 masak

Object model

tbd

xxx Every language has an object model that looks slightly different: C++, Perl 5/Moose, Python, JavaScript, C#. Frustrating! Is there a "truth" about objects? A way to model them without falling prey to xkcd's "there are 15 competing standards"?

xxx Narrowly deciding to use has as a field declarator keyword instead of my; mainly because a different set of annotations is used on fields. (Also because has ends up having a different scope than my; this, I believe, was a major reason in Raku as well.)

xxx Also, yes, having a declarator keyword at all feels better than just letting the identifier be the declarator (as in JS/TS and Python); same argument leads to case as a declarator for enums

masak avatar Nov 12 '24 14:11 masak

Generic functions

Functions are pretty powerful before we even get to generic functions. Here are the modular (but on-by-default) features you get in ordinary functions:

  • Optional parameters. A parameter is normally required, but one declared optional lets the call succeed without the corresponding argument being passed. In that case, the parameter will have the value none.
  • Default expressions. Instead of binding to none, an optional parameter can be equipped with a default expression. If the corresponding argument is not passed, the default expression is evaluated, and the parameter is bound to the result.
  • Rest parameter. In the case of excess arguments being passed, these can be collected in a dedicated rest parameter. This parameter will always bind to an array, but the array may or may not be empty, and it doesn't have an element type by default.
  • Named parameters. Besides the usual ("positional") parameters, it's also possible to declare parameters named, meaning that they are passed as named arguments when the function is called. Named arguments do not have to be passed in the same order as the corresponding named parameters were declared. Named parameters can orthogonally be declared optional, be given default expressions, or be declared as a named rest parameter (binding to a Dict). They also interact orthogonally with the below features.
  • Call-by-name arguments. Discussed more in the comment below. The corresponding operand (which is an expression) is passed unevaluated. Accessing the parameter causes the expression to be evaluated (in the environment of the caller). The resulting value is not memoized; each new variable access causes a new, separate evaluation.
  • Lazy thunk arguments. The operand is passed unevaluated. The resulting value is memoized; after a result has been computed, it's saved and used as the result of subsequent variable accesses.
  • In/out/inout modifiers. Marking a parameter as @in (the default) means that the parameter passing is about the rvalue, and that assignments to the parameter are disallowed (and will fail at runtime if attempted). Marking a parameter as @out means that the parameter passing is about the lvalue; assignments to the parameter are allowed, and "write through" to the location referenced in the caller; variable accesses are not allowed. Marking a parameter as @inout means that it's about both the rvalue and the lvalue; assignments write through to the location, and variable accesses give the rvalue. In the case of @out and @inout, passing an argument which is not an lvalue results in an error.
  • Parameter types. xxx
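The difference between the call-by-name and lazy-thunk bullets above can be sketched in JavaScript, which has neither feature built in, by passing zero-argument closures explicitly. The helpers byName and lazy and the function useTwice are hypothetical names for this illustration. They are not Alma syntax or API, and in Alma the wrapping would presumably follow from the parameter declaration rather than being written out by the caller.

```javascript
// Call-by-name: every access re-evaluates the caller's expression.
function byName(expr) {
    return { get value() { return expr(); } };
}

// Lazy thunk: the first access evaluates; later accesses reuse the result.
function lazy(expr) {
    let done = false;
    let cached;
    return {
        get value() {
            if (!done) {
                cached = expr();
                done = true;
            }
            return cached;
        },
    };
}

// A callee that reads its parameter twice.
function useTwice(param) {
    return param.value + param.value;
}

let evaluations = 0;
const operand = () => { evaluations += 1; return 21; };

const byNameResult = useTwice(byName(operand));
const byNameEvaluations = evaluations;   // operand ran once per access

evaluations = 0;
const lazyResult = useTwice(lazy(operand));
const lazyEvaluations = evaluations;     // operand ran only on first access

console.log({ byNameResult, byNameEvaluations, lazyResult, lazyEvaluations });
```

Both calls produce the same sum; only the evaluation count differs, which is exactly the part that becomes observable when the operand has side effects.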

tbd

  • Advice. xxx
  • Multifunctions/generic functions. xxx
  • Multimethods. xxx

To incorporate: this musing, which calls out the need for @cbn parameters in a language — and maybe a step beyond that is to have something like macros or operatives. HN discussion.

masak avatar Nov 12 '24 14:11 masak

Call-by-value vs call-by-name

In a pure, effect-free world, there's no observable difference between CBV and CBN. This is something that Paul Blain Levy points out in his dissertation. CBV and CBN arise as two different evaluation strategies.

But I recently found a paper where the CBV/CBN distinction is explained quite vividly: Grokking the sequent calculus.
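Not taken from either reference, but here is a small sketch of how the two strategies become distinguishable once effects are in the picture. JavaScript itself is call-by-value, so call-by-name is emulated with explicit closures; firstCbv, firstCbn, and boom are names invented for this example. The effect is a thrown exception in an argument that the callee never uses.

```javascript
function boom() {
    throw new Error("boom");
}

// Call-by-value: both arguments are evaluated before the call happens.
function firstCbv(a, b) {
    return a;
}

// Call-by-name, emulated: arguments arrive as closures and are only
// evaluated when the callee actually uses them.
function firstCbn(a, b) {
    return a();
}

let cbvOutcome;
try {
    cbvOutcome = firstCbv(1, boom());
} catch (e) {
    cbvOutcome = "threw: " + e.message;
}

// The second closure is never invoked, so boom() never runs.
const cbnOutcome = firstCbn(() => 1, () => boom());

console.log(cbvOutcome);
console.log(cbnOutcome);
```

Replace boom with a function that simply returns a value, and the two outcomes coincide again.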

xxx

masak avatar Nov 17 '24 14:11 masak

Macro expansion

tbd

masak avatar Dec 10 '24 14:12 masak

The billion dollar mistake

tbd (well, I mean — not planning to (re-)do the mistake itself; planning to write this out later)

masak avatar Jan 10 '25 06:01 masak

Lexing and parsing

tbd

xxx lexer-parser separation (vs scannerless), especially in the face of both lexer and parser being extensible

xxx Alma's general trend towards a "pure" parse (away from Perl/Raku (!) and towards JavaScript, Dylan, Scheme) in which side effects are not fired when parsing e.g. a class

xxx: in particular, mention the problematic parsing of < when it's overloaded both for the less-than comparison operator, and for the start of a generic type argument
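A classic instance of this ambiguity, written as plain JavaScript, where there are no generics and < can only be a comparison. The names f, a, b, c, and d are placeholders. In a language that also allows generic call syntax, the same token sequence could be read as a single argument, the call a<b, c>(d), being passed to f.

```javascript
function f(...args) {
    return args;
}

const a = 1, b = 2, c = 3, d = 4;

// JavaScript reads this as two comparisons: (a < b) and (c > (d)).
const parsed = f(a < b, c > (d));

console.log(parsed);
```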

xxx the need for "abstract parsing" in quasis, and how that ties into LR parser generation

masak avatar Mar 06 '25 09:03 masak

Types (and static vs dynamic)

xxx Alma being at heart a dynamic language, but (more so than Raku) friendly to the static point of view

xxx gradual typing (?)

xxx pluggable types; never requiring types in a valid program (as do, for example, type classes)

xxx one expression language vs two (one for terms and one for types); see https://matklad.github.io/2025/08/09/zigs-lovely-syntax.html#Everything-Is-an-Expression

xxx https://gbracha.blogspot.com/2018/10/reified-generics-search-for-cure.html shows how to think about generic types from an "optional type system" perspective

masak avatar Mar 07 '25 03:03 masak

Exceptions, next/last/redo, and effect handlers

xxx

masak avatar Mar 29 '25 09:03 masak

Foreign Function Interface (FFI)

xxx https://verdagon.dev/blog/fearless-ffi

masak avatar May 08 '25 08:05 masak

Whether everything is (or should be) an expression

xxx (no)

xxx https://craftinginterpreters.com/the-lox-language.html#design-note

masak avatar May 15 '25 01:05 masak

Enums

xxx https://graydon2.dreamwidth.org/253769.html is a good post about sum types/discriminated unions -- takeaway, in my opinion, is that you win from pushing the idea of "can't access the wrong variant" into both the type system and the runtime

xxx TypeScript is the closest to what I imagine I'd want for Alma -- specifically the so-called "flow typing", which allows conditional branching to refine the types of variables in blocks

xxx more exactly, I'd like for Alma to be both (untyped) JavaScript, in that you can go ahead and write any code you like and try to run it, and TypeScript, in that you can start adding type annotations and get (IDE/tooling) errors and completion

xxx there's a representation decision I simply have not made yet: whether enums should be based on (symbol-like) tags, or whether they should be more like subclasses in a closed hierarchy

masak avatar Jul 23 '25 09:07 masak