klapaucius icon indicating copy to clipboard operation
klapaucius copied to clipboard

Routing and types

Open Vaguery opened this issue 6 years ago • 1 comments

Type handling in Klapaucius

(a reconciliation; also a draft)

All items that appear in Push code (including Push programs and code blocks) are strictly typed. This affects the interpreter's behavior in a number of crucial ways, and also informs the design of new types. However, the handling of type checking and safety is (to say the least) unusual in the Push interpreter.

Items, including Push code, are always stored in stacks.

The interpreter stores the program being run in a special stack called :exec.

In each step of the interpreter, it pops the top item from the :exec stack, and acts according to that item's type:

  • If it is a Clojure seq (a code block), then all of its contents are "unwrapped" and put back (in the original order) onto :exec.
  • If it is a Clojure keyword, either of two things will happen:
    • If the keyword is a recognized stack or binding name, then the top item on that stack (if any) is duplicated onto the :exec stack, without being deleted from the original location. Think of this as "peek". Note: under certain runtime circumstances, the interpreter may intentionally fail to "recognize" stack and binding names in this step. If so, the item passes through the logic below.
    • If the keyword is a recognized instruction name, then that instruction's code is executed immediately.
    • If the keyword is not recognized, it is sent to the :ref stack.
  • Otherwise, the item is assumed to be a Push literal of some kind, and is sent to the core router table for handling.

The core router table is an ordered collection of rules. Each rule includes a predicate (which recognizes a type of item, typically), a destination (a stack name, typically), and a preprocessor (a function applied to the item being routed before it is sent). Items are preprocessed and sent to a stack (or binding) by the first rule whose predicate is triggered by that item.

Finally, each stack (or binding) has a collection of associated filters. These typically include a stack size filter and a type filter, but may also include other constraint-handling functions. When a stack's size filter detects an item received would make the stack oversized if added, then the item is discarded and an :error item is produced, and sent back to the core router. If a stack's type filter (or some other constraint filter) detects the item received is a bad match, then the item is sent back to the core router.

As you can imagine, the graph of routes in the core router, plus the routes associated with all the stack and binding filters, should form a directed acyclic graph to avoid infinite route loops.

Instructions, as a rule, do not check the type of items they consume or produce. Rather, they specify locations (stacks and bindings) from which arguments are taken. Responsibility for type correctness in instructions is therefore distributed between the core router and the filters associated with every stack and binding. It is not, in general, handled by instruction code itself.

This makes Push unusual in several ways, but it can aid those trying to add new domain-specific types and instructions to the Push language. To add a new type,

  1. build a core routing rule, which must include a type-specific recognizer predicate, a destination stack (which may be an existing stack), and any preprocessor that might be required;
  2. insert the routing rule in a carefully-chosen location in the core router table;
  3. add filter rules to the type's destination stack to handle type and size checking;
  4. write all new instructions only using the Push DSL, taking their arguments from the appropriate stacks, and sending them to the most specific stacks possible.

For example, if you wanted to add a :counting-number type to Klapaucius, you might construct a predicate to recognize numbers that are integers and strictly positive. You could use that predicate to produce a routing rule which sent your values to the :counting-number stack. You would need to insert that rule into the core router before the one that recognizes :integer items, because it is more specific.

If you were to write new :counting-number instructions, for example one that subtracted two :counting-number items, you would still send the result to the :counting-number stack, despite the fact that in some cases results like (- 3 9) will no longer be valid :counting-number items. The filter on the :counting-number stack will reject a zero or negative value, and send it to the core router for processing. There, it will be seen as an :integer, and sent to the stack associated with that predicate instead.

(Note though that the choice of source and destination stacks for the arguments and results of instructions should always communicate as closely as possible the intention of the function entailed, so that automated systems elsewhere in the Push interpreter can infer the type signatures of instructions. It may be tempting to send all results of instructions to the :exec stack, so that the core router can send them to the "correct" destination, but this obfuscates the process of "instruction composition" and pattern discovery. So try to be as specific as possible, using re-routing as a failsafe. That said, it may be the case (for instance when dividing numbers) that a result might be an :error rather than a numeric value. In these cases, the error can be sent to the "expected" result stack, where it will be rerouted correctly).

Core router table

Whenever any item is routed by the core router, the first predicate to fire in the list below (in this order) will send the item to that address.

A seq (code block) is routed to :exec immediately, and is pushed there as received. The process of "unwrapping" code blocks is an active one, not part of routing.

Similarly, a Clojure keyword will be routed to :exec, regardless of whether it represents a :ref, a :type, a :stackname or an :instruction. Discriminating those items from one another is the job of the interpreter itself, not the routing process. A keyword item can end up being sent to a stack by an instruction's code, but it is not by default routed to one.

However, a QuotedCode item will be preprocessed (unquoted) before being pushed to the :code stack. (There is no ":quoted" stack.)

All items sent to an address might end up being re-routed by the receiving address, too!

type           predicate           address          preprocessor

:seq           seq?                :exec
:keyword       keyword?            :exec
:error         error?              :error
:integer       integer?            :scalar
:float         float?              :scalar
:rational      rational?           :scalar
:number        number?             :scalar
[NaN]          nan?                :scalar
:set           set?                :set
:snapshot      snapshot?           :snapshot
:complex       complex?            :complex
:boolean       boolean?            :boolean
:quoted        quoted-code?        :code            :value
:char          char?               :char
:string        string?             :string
:tagspace      tagspace?           :tagspace
[vectorized]   vector-of-type?     [:plural]
:generator     generator?          :generator
:interval      interval?           :interval
:vector        vector?             :vector
;; catch-alls:
anything       true                :code

Adding new types

New types must inserted a predicate and address into the core router table, unless they want to be sorted as :code. They should also attach a filter to their stacks (if any).

For example, a :vector-tree type, which contained items and :vector collections, would need to have a predicate to differentiate it from a generic :vector, and would have to appear before the :vector line.

An :even-integer type would have to appear in the router list before the more general :integer router.

The :matrix type (to be added shortly) may end up being a kind of :vector, but depending on the final libraries included, it might not. It's unclear whether a :scalars or :complexes vector will therefore end up being a type of :matrix as well. If every :matrix is "only" a :vector (of vectors, with size and type constraints), then its recognizer line would only have to appear before :vector. If on the other hand each :scalars or :complexes vector should be understood to be a row (or column) matrix, then the :matrix recognizer should appear before those lines as well.

Most new type literals should end up being defined as record items, with type-specific predicates added as needed. Be careful with structural typing predicates, since they may accidentally capture unwanted items if they're out of order in the core router, or too general in scope.

Promiscuous routing

Certain circumstances may arise at runtime in which an item is routed to every triggered address.

For instance, a copy of [false true] would be routed in these circumstances to each of :booleans, :vector, and :code.

Note that this might have unexpected consequences: The value 9 triggers integer? but also rational?, number?, and complex?. And also will end up on :code.

Some examples:

  • a seq -> :exec, where it is pushed as received (not unwrapped)
  • the keyword :boolean -> :exec
  • the keyword :boolean-and -> :exec
  • a QuotedCode item -> :code, but only its :value is pushed, not the QuotedCode item itself
  • a vector that has only integers in it -> :scalars
  • a vector with some boolean and some nested vector items -> :vector
  • a Complex record -> :complex
  • a Color record (when no :color type or router has been defined) -> :code

Filtering:

Addresses labeled N/A have no type filters, but do still have size filters, and may have other constraint filters.

stack        filter

:boolean     boolean?
:char        char?
:code        N/A
:complex     complex/complex?
:error       N/A
:exec        N/A
:generator   generator?
:interval    interval?
:log         N/A
:print       N/A
:ref         keyword?
:scalar      number?
:set         set?
:snapshot    snapshot?
:string      string?
:tagspace    tagspace?
:unknown     N/A
:vector      vector?
[:plural]    vector-of-type?

In the case where the :code stack—which is the overall "sink" for unrecognized items---gets too large, it will emit :error items like any other oversized item.

Interpreter behavior when the :error stack gets oversized is undefined.

Changes to make in Klapaucius

  • redefine :string as the vectorized form of :char, not as a unique type
  • remove :unknown stack
  • make sure :buildable items are handled correctly
  • modify the interpreter loop's keyword-handler to manage
    • bound :ref values, including stack names: looked up with peek, unless in quoting mode
    • bound instruction names: executed immediately
    • unrecognized keyword sent to :ref
  • permit stack names to be used as :ref items
  • PushType items (including type signatures) as explicit items
  • permit get/set by keyword in buildable items
  • the routing graph needs to be checked for loops

Vaguery avatar Mar 06 '18 13:03 Vaguery

This may also provide a simplifying mechanism for combinator instructions. That is, instead of a million :x-dup and :x-flipstack instructions, a single case of each could accept a :ref argument. If the referred-to stack is marked as movable (the way types are now), then it can be restructured by any of these instructions. If it is not movable (like :error or :log), then it will simply return an unchanged stack.

Vaguery avatar Mar 06 '18 17:03 Vaguery