klapaucius
klapaucius copied to clipboard
Routing and types
Type handling in Klapaucius
(a reconciliation; also a draft)
All items that appear in Push code (including Push programs and code blocks) are strictly typed. This affects the interpreter's behavior in a number of crucial ways, and also informs the design of new types. However, the handling of type checking and safety is (to say the least) unusual in the Push interpreter.
Items, including Push code, are always stored in stacks.
The interpreter stores the program being run in a special stack called :exec
.
In each step of the interpreter, it pops the top item from the :exec
stack, and acts according to that item's type:
- If it is a Clojure
seq
(a code block), then all of its contents are "unwrapped" and put back (in the original order) onto:exec
. - If it is a Clojure
keyword
, either of two things will happen:- If the keyword is a recognized stack or binding name, then the top item on that stack (if any) is duplicated onto the
:exec
stack, without being deleted from the original location. Think of this as "peek". Note: under certain runtime circumstances, the interpreter may intentionally fail to "recognize" stack and binding names in this step. If so, the item passes through the logic below. - If the keyword is a recognized instruction name, then that instruction's code is executed immediately.
- If the keyword is not recognized, it is sent to the
:ref
stack.
- If the keyword is a recognized stack or binding name, then the top item on that stack (if any) is duplicated onto the
- Otherwise, the item is assumed to be a Push literal of some kind, and is sent to the core router table for handling.
The core router table is an ordered collection of rules. Each rule includes a predicate (which recognizes a type of item, typically), a destination (a stack name, typically), and a preprocessor (a function applied to the item being routed before it is sent). Items are preprocessed and sent to a stack (or binding) by the first rule whose predicate is triggered by that item.
Finally, each stack (or binding) has a collection of associated filters. These typically include a stack size filter and a type filter, but may also include other constraint-handling functions. When a stack's size filter detects an item received would make the stack oversized if added, then the item is discarded and an :error
item is produced, and sent back to the core router. If a stack's type filter (or some other constraint filter) detects the item received is a bad match, then the item is sent back to the core router.
As you can imagine, the graph of routes in the core router, plus the routes associated with all the stack and binding filters, should form a directed acyclic graph to avoid infinite route loops.
Instructions, as a rule, do not check the type of items they consume or produce. Rather, they specify locations (stacks and bindings) from which arguments are taken. Responsibility for type correctness in instructions is therefore distributed between the core router and the filters associated with every stack and binding. It is not, in general, handled by instruction code itself.
This makes Push unusual in several ways, but it can aid those trying to add new domain-specific types and instructions to the Push language. To add a new type,
- build a core routing rule, which must include a type-specific recognizer predicate, a destination stack (which may be an existing stack), and any preprocessor that might be required;
- insert the routing rule in a carefully-chosen location in the core router table;
- add filter rules to the type's destination stack to handle type and size checking;
- write all new instructions only using the Push DSL, taking their arguments from the appropriate stacks, and sending them to the most specific stacks possible.
For example, if you wanted to add a :counting-number
type to Klapaucius, you might construct a predicate to recognize numbers that are integers and strictly positive. You could use that predicate to produce a routing rule which sent your values to the :counting-number
stack. You would need to insert that rule into the core router before the one that recognizes :integer
items, because it is more specific.
If you were to write new :counting-number
instructions, for example one that subtracted two :counting-number
items, you would still send the result to the :counting-number
stack, despite the fact that in some cases results like (- 3 9)
will no longer be valid :counting-number
items. The filter on the :counting-number
stack will reject a zero or negative value, and send it to the core router for processing. There, it will be seen as an :integer
, and sent to the stack associated with that predicate instead.
(Note though that the choice of source and destination stacks for the arguments and results of instructions should always communicate as closely as possible the intention of the function entailed, so that automated systems elsewhere in the Push interpreter can infer the type signatures of instructions. It may be tempting to send all results of instructions to the :exec
stack, so that the core router can send them to the "correct" destination, but this obfuscates the process of "instruction composition" and pattern discovery. So try to be as specific as possible, using re-routing as a failsafe. That said, it may be the case (for instance when dividing numbers) that a result might be an :error
rather than a numeric value. In these cases, the error can be sent to the "expected" result stack, where it will be rerouted correctly).
Core router table
Whenever any item is routed by the core router, the first predicate to fire in the list below (in this order) will send the item to that address.
A seq
(code block) is routed to :exec
immediately, and is pushed there as received. The process of "unwrapping" code blocks is an active one, not part of routing.
Similarly, a Clojure keyword
will be routed to :exec
, regardless of whether it represents a :ref
, a :type
, a :stackname
or an :instruction
. Discriminating those items from one another is the job of the interpreter itself, not the routing process. A keyword
item can end up being sent to a stack by an instruction's code, but it is not by default routed to one.
However, a QuotedCode
item will be preprocessed (unquoted) before being pushed to the :code
stack. (There is no ":quoted
" stack.)
All items sent to an address might end up being re-routed by the receiving address, too!
type predicate address preprocessor
:seq seq? :exec
:keyword keyword? :exec
:error error? :error
:integer integer? :scalar
:float float? :scalar
:rational rational? :scalar
:number number? :scalar
[NaN] nan? :scalar
:set set? :set
:snapshot snapshot? :snapshot
:complex complex? :complex
:boolean boolean? :boolean
:quoted quoted-code? :code :value
:char char? :char
:string string? :string
:tagspace tagspace? :tagspace
[vectorized] vector-of-type? [:plural]
:generator generator? :generator
:interval interval? :interval
:vector vector? :vector
;; catch-alls:
anything true :code
Adding new types
New types must inserted a predicate and address into the core router table, unless they want to be sorted as :code
. They should also attach a filter to their stacks (if any).
For example, a :vector-tree
type, which contained items and :vector
collections, would need to have a predicate to differentiate it from a generic :vector
, and would have to appear before the :vector
line.
An :even-integer
type would have to appear in the router list before the more general :integer
router.
The :matrix
type (to be added shortly) may end up being a kind of :vector
, but depending on the final libraries included, it might not. It's unclear whether a :scalars
or :complexes
vector will therefore end up being a type of :matrix
as well. If every :matrix
is "only" a :vector
(of vectors, with size and type constraints), then its recognizer line would only have to appear before :vector
. If on the other hand each :scalars
or :complexes
vector should be understood to be a row (or column) matrix, then the :matrix
recognizer should appear before those lines as well.
Most new type literals should end up being defined as record
items, with type-specific predicates added as needed. Be careful with structural typing predicates, since they may accidentally capture unwanted items if they're out of order in the core router, or too general in scope.
Promiscuous routing
Certain circumstances may arise at runtime in which an item is routed to every triggered address.
For instance, a copy of [false true]
would be routed in these circumstances to each of :booleans
, :vector
, and :code
.
Note that this might have unexpected consequences: The value 9
triggers integer?
but also rational?
, number?
, and complex?
. And also will end up on :code
.
Some examples:
- a
seq
-> :exec, where it is pushed as received (not unwrapped) - the keyword
:boolean
->:exec
- the keyword
:boolean-and
->:exec
- a
QuotedCode
item -> :code, but only its:value
is pushed, not theQuotedCode
item itself - a
vector
that has only integers in it -> :scalars - a
vector
with someboolean
and some nestedvector
items ->:vector
- a
Complex
record ->:complex
- a
Color
record (when no:color
type or router has been defined) ->:code
Filtering:
Addresses labeled N/A
have no type filters, but do still have size filters, and may have other constraint filters.
stack filter
:boolean boolean?
:char char?
:code N/A
:complex complex/complex?
:error N/A
:exec N/A
:generator generator?
:interval interval?
:log N/A
:print N/A
:ref keyword?
:scalar number?
:set set?
:snapshot snapshot?
:string string?
:tagspace tagspace?
:unknown N/A
:vector vector?
[:plural] vector-of-type?
In the case where the :code
stack—which is the overall "sink" for unrecognized items---gets too large, it will emit :error
items like any other oversized item.
Interpreter behavior when the :error
stack gets oversized is undefined.
Changes to make in Klapaucius
- redefine
:string
as the vectorized form of:char
, not as a unique type - remove
:unknown
stack - make sure
:buildable
items are handled correctly - modify the interpreter loop's keyword-handler to manage
- bound
:ref
values, including stack names: looked up withpeek
, unless in quoting mode - bound instruction names: executed immediately
- unrecognized keyword sent to
:ref
- bound
- permit stack names to be used as
:ref
items - PushType items (including type signatures) as explicit items
- permit get/set by keyword in
buildable
items - the routing graph needs to be checked for loops
This may also provide a simplifying mechanism for combinator instructions. That is, instead of a million :x-dup
and :x-flipstack
instructions, a single case of each could accept a :ref
argument. If the referred-to stack is marked as movable
(the way types are now), then it can be restructured by any of these instructions. If it is not movable
(like :error
or :log
), then it will simply return an unchanged stack.