
Add a proof-of-concept experiment to add types for recursive Codecs

Open dillonkearns opened this issue 2 years ago • 3 comments

I'm not sure if this functionality would be a good idea or not because there are tradeoffs involved, but I'm creating this PR as a starting point for the discussion to consider those tradeoffs.

Right now, TsJson.Codec.recursive produces untyped data (a generic JsonValue TypeScript type).

The code in this PR shows how you could add type information for TsJson.Codec.recursive through some clever tricks. However, it still comes with tradeoffs in the overall design.

Tradeoffs

It is possible to describe recursive types in TypeScript, as described in the TypeScript 3.7 release notes: https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-7.html#more-recursive-type-aliases. The challenge is that a recursive type can only be defined if it has a name, but elm-ts-json generates anonymous types, such as string[] or string | null.
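To make the naming constraint concrete, here is a minimal TypeScript sketch. The alias `NestedNumbers` and the helper `total` are hypothetical names chosen for illustration; they are not part of elm-ts-json. The self-reference on the right-hand side only works because the alias has a name to refer back to, which an anonymous inline type does not.

```typescript
// Hypothetical named alias: it refers to itself by name, which is what
// the recursive type alias support in TypeScript 3.7 allows.
type NestedNumbers = number | NestedNumbers[];

// Hypothetical helper that walks the structure; every level stays typed.
function total(value: NestedNumbers): number {
  return typeof value === "number"
    ? value
    : value.reduce((acc: number, item) => acc + total(item), 0);
}

const nested: NestedNumbers = [1, [2, [3, 4]], 5];
const nestedTotal = total(nested);
```

There is no way to write the same type inline without a name, because nothing exists for the inner position to point at.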

Take this recursive TsJson Codec, for example:

TsCodec.recursive
    (\c ->
        TsCodec.custom Nothing
            (\fempty fcons value ->
                case value of
                    [] ->
                        fempty

                    x :: xs ->
                        fcons x xs
            )
            |> TsCodec.variant0 "[]" []
            |> TsCodec.positionalVariant2 "(::)" (::) TsCodec.int c
            |> TsCodec.buildCustom
    )

It currently produces this TypeScript type: { args : [ number, JsonValue ]; tag : "(::)" } | { tag : "[]" }.
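The following sketch illustrates what the untyped tail means in practice. The `JsonValue` definition is an assumption about the general shape of such a type, not a copy of the generated one, and the alias names `CurrentList` and `TypedList` are hypothetical.

```typescript
// Assumed shape of a generic JSON type; the real generated definition may differ.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// The type quoted above: the tail of the list is an untyped JsonValue.
type CurrentList =
  | { args: [number, JsonValue]; tag: "(::)" }
  | { tag: "[]" };

// This type-checks even though "not a list" is not a valid tail.
const accepted: CurrentList = { tag: "(::)", args: [1, "not a list"] };

// Hypothetical named alternative that a typed `recursive` could produce.
type TypedList =
  | { args: [number, TypedList]; tag: "(::)" }
  | { tag: "[]" };

// Uncommenting the next line would be a compile error under TypedList:
// const rejected: TypedList = { tag: "(::)", args: [1, "not a list"] };
const valid: TypedList = { tag: "(::)", args: [1, { tag: "[]" }] };
```

With the current output, mistakes in the nested position are only caught at runtime by the Elm decoder; with a named recursive type, the TypeScript compiler could catch them.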

So in order to get type information for recursive definitions like this, any generated code would need to go through any recursive type definitions (either top-level references to these kinds of types, or nested references), and then choose a name for that type and include that type declaration in the TypeScript file.

Right now, the generated code uses this function: https://package.elm-lang.org/packages/dillonkearns/elm-ts-json/latest/TsJson-Type#toTypeScript

toTypeScript : TsJson.Type.Type -> String

With recursive TypeScript types added, the signature could no longer be TsJson.Type.Type -> String. Instead, you would need something like:

TsJson.Type.Type -> { recursiveTypeDeclarations : Set ( String, String ), output : String }

So for example, for the recursive example code above, it could return:

{ recursiveTypeDeclarations =
    Set.fromList
        [ ( "RecursiveTypeABC123"
          , """{ args : [ number, RecursiveTypeABC123 ]; tag : "(::)" } | { tag : "[]" }"""
          )
        ]
, output = "RecursiveTypeABC123"
}

Then the generated code would need to create those top-level recursive type declarations if there are any. That is possible, but it's a significant change, and it adds some complexity to the idea of a type since it is no longer self-contained but may also depend on some additional type declarations.
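As a rough illustration of that two-part result, here is a sketch written in TypeScript rather than Elm. Everything in it is hypothetical: `TsType` is a simplified stand-in for TsJson.Type.Type, and `render` is not an existing function in elm-ts-json. It only shows how rendering could collect named declarations on the side while returning the name as the output.

```typescript
// Hypothetical, simplified stand-in for TsJson.Type.Type.
type TsType =
  | { kind: "number" }
  | { kind: "literal"; value: string }
  | { kind: "tuple"; items: TsType[] }
  | { kind: "object"; fields: [string, TsType][] }
  | { kind: "union"; members: TsType[] }
  | { kind: "recursive"; name: string; body: TsType }
  | { kind: "ref"; name: string };

interface Rendered {
  recursiveTypeDeclarations: Map<string, string>;
  output: string;
}

// Renders a type and collects a declaration for each recursive type found.
function render(type: TsType): Rendered {
  const decls = new Map<string, string>();
  const go = (t: TsType): string => {
    switch (t.kind) {
      case "number":
        return "number";
      case "literal":
        return JSON.stringify(t.value);
      case "tuple":
        return `[ ${t.items.map(go).join(", ")} ]`;
      case "object":
        return `{ ${t.fields.map(([k, v]) => `${k} : ${go(v)}`).join("; ")} }`;
      case "union":
        return t.members.map(go).join(" | ");
      case "ref":
        return t.name;
      case "recursive":
        if (!decls.has(t.name)) {
          decls.set(t.name, ""); // reserve the name before descending
          decls.set(t.name, go(t.body));
        }
        return t.name; // the use site only mentions the name
    }
  };
  const output = go(type);
  return { recursiveTypeDeclarations: decls, output };
}

// The list example from above, with the generated name used in the thread.
const intList: TsType = {
  kind: "recursive",
  name: "RecursiveTypeABC123",
  body: {
    kind: "union",
    members: [
      {
        kind: "object",
        fields: [
          ["args", { kind: "tuple", items: [{ kind: "number" }, { kind: "ref", name: "RecursiveTypeABC123" }] }],
          ["tag", { kind: "literal", value: "(::)" }],
        ],
      },
      { kind: "object", fields: [["tag", { kind: "literal", value: "[]" }]] },
    ],
  },
};

const rendered = render(intList);
```

The caller then has two jobs instead of one: emit each collected declaration at the top level, and use `rendered.output` wherever the type is referenced. That split is the added complexity discussed above.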

dillonkearns • Dec 06 '22 16:12

[...] it adds some complexity to the idea of a type since it is no longer self-contained but may also depend on some additional type declarations.

I'm sidetracking a bit here, but I believe we would actually benefit from having all sub-types declared in TS.

For instance, I have some Result types in my ToElm, and some of my TS function declarations end up looking like this:

async function getReadResult(path: string): Promise<{ tag: "Ok"; args: [string] } | { tag: "Err"; args: [string] }> {

It would be more readable and less error-prone to have:

async function getReadResult(path: string): Promise<Result<string,string>> {

If your Result encoder were to evolve, you would get more explicit and precise error messages from TS. Overall, having sub-types declared in TS would probably make for cleaner TypeScript code down the line.
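A minimal sketch of what such a declared sub-type could look like follows. The `Result` alias here is hypothetical and hand-written; elm-ts-json does not currently generate it. The parameter order (error first, then value) is an assumption that mirrors Elm's `Result error value`, and `summarize` is an illustrative helper.

```typescript
// Hypothetical named alias for the shape that is currently spelled out inline.
type Result<E, V> = { tag: "Ok"; args: [V] } | { tag: "Err"; args: [E] };

// Illustrative consumer: the signature stays short, and narrowing on `tag`
// still works exactly as with the inline union.
function summarize(result: Result<string, string>): string {
  return result.tag === "Ok"
    ? `read: ${result.args[0]}`
    : `failed: ${result.args[0]}`;
}

const okSummary = summarize({ tag: "Ok", args: ["file contents"] });
const errSummary = summarize({ tag: "Err", args: ["ENOENT"] });
```

If the encoded shape changed, only the alias would need updating, and compiler errors would mention `Result<string, string>` instead of the expanded union.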

staeter • Dec 06 '22 17:12

Hey @staeter, thanks for the discussion. So the idea of using named types brings up a lot of questions and could be quite large in scope.

For example, where do the names come from? Does the user define them with the elm-ts-json API? Does every type get a name? Should elm-ts-interop go and find anything that uses a given type and see if it is named to reference it by that name? Could that introduce any performance bottlenecks? What if you define multiple type names for the same type, should it pick one or define the same type more than once? Or does it only use that type name if you reference the same decoder/encoder? Would that result in an intuitive experience for users?

So while having named types is a great practice in TypeScript, in this context it could introduce a lot of complexity for both users and for elm-ts-interop.

dillonkearns • Dec 06 '22 17:12

I might be missing some edge case here, but in the same way that custom type variants are named with TsC.namedVariant1 "Variant" Variant, the name of the de/encoded type could be set with one more parameter to TsC.custom.

type MyDataType
   = VariantA Int
   | VariantB String

codec : TsC.Codec MyDataType
codec =
    TsC.custom "MyDataType" (Just "tag")
        (\varA varB value ->
            case value of
                VariantA int ->
                    varA int

                VariantB str ->
                    varB str
        )
        |> TsC.namedVariant1 "VariantA" VariantA ( "int", TsC.int )
        |> TsC.namedVariant1 "VariantB" VariantB ( "str", TsC.string )
        |> TsC.buildCustom

boolCodec : TsC.Codec Bool
boolCodec =
    TsC.stringUnion "Bool"
        [ ( "True", True )
        , ( "False", False )
        ]

This is even better with records/objects:

type alias Point =
    { x : Float, y : Float }

codec : TsC.Codec Point
codec =
    TsC.object "Point" Point
        |> TsC.field "x" .x TsC.float
        |> TsC.field "y" .y TsC.float
        |> TsC.buildObject

From there, if you have a ToElm like this:

type ToElm
    = MyData MyDataType
    | MyBool Bool
    | MyPoint Point

codec : TsC.Codec ToElm
codec =
    TsC.custom "ToElm" (Just "tag")
        (\myData myBool myPoint value ->
            case value of
                MyData data ->
                    myData data

                MyBool b ->
                    myBool b

                MyPoint p ->
                    myPoint p
        )
        |> TsC.namedVariant1 "MyData" MyData ( "data", MyDataType.codec )
        |> TsC.namedVariant1 "MyBool" MyBool ( "bool", boolCodec )
        |> TsC.namedVariant1 "MyPoint" MyPoint ( "point", Point.codec )
        |> TsC.buildCustom

It would give this in TS:

export type ToElm = { tag : "MyData"; data : MyDataType } | { tag : "MyBool"; bool : Bool } | { tag : "MyPoint"; point : Point };

export type MyDataType = { tag : "VariantA"; int : number } | { tag : "VariantB"; str : string };
export type Bool = "True" | "False";
export type Point = { x : number; y : number };

I believe the only error this design could create is giving the same name to two different types, but you can already hit a similar issue by giving the same string to two variants. @dillonkearns I think you're right: detecting and avoiding that kind of error would be quite complex.
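For the simplest version of that check, here is a sketch in TypeScript. It is purely illustrative: `registerNamedType` and the registry are hypothetical and do not exist in elm-ts-json or elm-ts-interop. It only covers the case where two rendered definitions are compared as strings, and says nothing about where names come from or how they are discovered.

```typescript
// Hypothetical registry: type name -> rendered TypeScript definition.
// Returns an error message on conflict, or null if the name was accepted.
function registerNamedType(
  registry: Map<string, string>,
  typeName: string,
  definition: string
): string | null {
  const existing = registry.get(typeName);
  if (existing !== undefined && existing !== definition) {
    return `Conflicting definitions for type "${typeName}"`;
  }
  registry.set(typeName, definition);
  return null;
}

const registry = new Map<string, string>();
const first = registerNamedType(registry, "Point", "{ x : number; y : number }");
// Registering the identical definition again is harmless.
const repeat = registerNamedType(registry, "Point", "{ x : number; y : number }");
// A different shape under the same name is reported.
const clash = registerNamedType(registry, "Point", "{ lat : number; lng : number }");
```

The hard part raised earlier in the thread sits outside this sketch: deciding which codecs contribute names and what to do when two names describe the same shape.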

staeter • Dec 06 '22 18:12