breadboard icon indicating copy to clipboard operation
breadboard copied to clipboard

[FR] Prevent wiring of incompatible types

Open paullewis opened this issue 1 year ago • 18 comments

Per an earlier in-person discussion, we should limit your ability to connect – say – number ports to string ports. FWIW I think we should incorporate behaviors and formats here, too.

So, for example, object to object wouldn't work if it were object -> LLM Content -> Any to object -> [no behavior]. But things could be interesting if you did object -> LLM Content -> image-webcam to object -> LLM Content -> Any, because the latter is a superset of the former.

Same probably goes for array, where the item type, behavior(s) and formats should probably match.

If we do this at the API levels I can do a follow-up on the UI side to prevent folks from accidentally trying to wire things that are incompatible.

paullewis avatar May 09 '24 16:05 paullewis

I am going to rely on @aomarks' work in @breadboard-ai/build to guide me.

dglazkov avatar May 10 '24 23:05 dglazkov

Should we do automatic type coercion, like allow "number" -> "string"?

dglazkov avatar May 10 '24 23:05 dglazkov

I am thinking something like adding an InspectablePortType with a method that takes another InspectablePortType as argument and returns true if it can connect to it or false if it can not. This method will be directional: port a -> port b may be ok, but port b -> port not.

dglazkov avatar May 10 '24 23:05 dglazkov

Should we do automatic type coercion, like allow "number" -> "string"?

My feeling is no, at least not initially. I could see it leading to a class of bugs that are hard to debug.

paullewis avatar May 11 '24 12:05 paullewis

Can we use JSON schema validation for all runtime type checking?

And agree, no automatic coercion. I think nodes that want to accept many types and do their own conversion can (e.g. template). We could also make a coerce node, or a library of them, to let users coerce.

aomarks avatar May 11 '24 17:05 aomarks

JSON schema validation

Yes, as much as possible.

I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".

There's a spectrum here, since the LLM Content can be broader (accept all types) and narrower (accept only some).

Wires can go from narrower to broader, but not the other way around.

Would love your thoughts on that.

dglazkov avatar May 11 '24 17:05 dglazkov

I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".

Yep. I think JSON schema handles this out of the box, and it's compatible with how TypeScript does validation. A narrow value will validate with a compatible broader schema (e.g. 123 will validate against "string" | "number").

Our "unknown" is expanded out to "null" | "boolean" | "object" | "array" | "number" | "integer" | "string"; so any JSON serializable value.

So, I think there shouldn't be anything extra we need to handle. Standard JSON schema validation + good schemas should do all the work. I suspect if there's a case where some value should be able to flow to some port, but JSON schema says it can't, then that probably just means that the receiving port's schema is too strict.

aomarks avatar May 11 '24 17:05 aomarks

The only quirk we have is that we deviated from JSON schema with behavior tags.

dglazkov avatar May 11 '24 18:05 dglazkov

The only quirk we have is that we deviated from JSON schema with behavior tags.

Mm, right. Do any behaviors affect validation?

If not, the only problem might be the JSON validator being upset about an unexpected behavior keyword, in which case we can just pre-process the schemas to remove them before passing to validation.

If they do affect validation in some cases, we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?

aomarks avatar May 11 '24 18:05 aomarks

we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?

That's probably the path forward, yup.

Oh also! This bug is about design-time validation. We do need a bug for runtime validation, too.

dglazkov avatar May 11 '24 18:05 dglazkov

There's now anInspectablePort.type property that returns an InspectablePortType that allows you to check if you can connect a node to another type with canConnect method.

https://breadboard-ai.github.io/breadboard/docs/inspector/graph/#ports

See if this works, and please let me know. I want to integrate it into the editor API as well, so that canAddEdge or canChangeEdge error out.

dglazkov avatar May 14 '24 22:05 dglazkov

I focused on ad-hoc wires today, but I'm hoping to look at this tomorrow

paullewis avatar May 15 '24 14:05 paullewis

Okay, so I've had a look at the data coming through to the graph and all the InspectablePorts I get through have undefined .type properties. Is there something I need to call to get the data populated?

paullewis avatar May 16 '24 12:05 paullewis

A quick copy-paste of what DevTools emits for one of our ports:

{
    "name": "schema",
    "title": "schema",
    "configured": true,
    "value": {
        "type": "object",
        "properties": {
            "context": {
                "type": "string",
                "title": "out"
            }
        }
    },
    "star": false,
    "edges": [],
    "status": "connected",
    "schema": {
        "type": "object",
        "behavior": [
            "json-schema",
            "ports-spec",
            "config"
        ]
    }
}

paullewis avatar May 16 '24 12:05 paullewis

Oh. That's because I haven't connected it 🤦🏻

dglazkov avatar May 17 '24 03:05 dglazkov

#1838 should make type shine.

dglazkov avatar May 17 '24 04:05 dglazkov

I was going to land the first pass of this, but then I realized that it breaks for the specialists. That seems to be because the context in is an array of LLM Content...

{
    "name": "in",
    "title": "Context in",
    "configured": false,
    "star": false,
    "type": {
        "schema": {
            "title": "Context in",
            "description": "The source material for the worker",
            "type": "array",
            "items": {
                "type": "object",
                "behavior": [
                    "llm-content"
                ]
            },
            "examples": [
            ]
        }
    }
}

But the context out is presenting as a string:

{
    "name": "out",
    "title": "Context out",
    "configured": false,
    "star": false,
    "type": {
        "schema": {
            "title": "Context out",
            "type": "string"
        }
    }
}

I didn't want to land #1844 with that as the case because I figured, while correct, it would be frustrating.

paullewis avatar May 17 '24 13:05 paullewis

Plan here is:

  1. https://github.com/breadboard-ai/breadboard/issues/2298
  2. Add a setting (per-board?) that prevents you from wiring incompatible schemas, defaulting off.
  3. Change the default to on after all our boards are looking good.

aomarks avatar Jun 25 '24 20:06 aomarks