smithy4s Render a mapping of shapeID -> scala FQN into metadata

Suggested in https://github.com/disneystreaming/smithy4s/issues/1364#issuecomment-1893286008 - render a metadata value that contains a mapping of the shape's original ID to its generated class/trait.

This will make it feasible to implement additional code generation on top of (well, next to) what smithy4s generates.

Implementation notes: we can probably reuse the existing metadata

$version: "2.0"

metadata smithy4sGenerated = [
  {
    smithy4sVersion: "0.18.9",
    namespaces: ["alloy", "smithy.waiters", "alloy.common", "smithy.api"]
  }
]

by adding an optional (for backward compat) entry for the shape mapping:

$version: "2.0"

metadata smithy4sGenerated = [
  {
    smithy4sVersion: "0.18.9",
    namespaces: ["alloy", "smithy.waiters", "alloy.common", "smithy.api"],
    shapes: {
      "smithy.api#documentation": "smithy.api.Documentation",
      "smithy.api#http": "smithy.api.Http"
    }
  }
]

@Baccata does this work for you?

Mar 11 '24 21:03 kubukoz

Yup

Mar 12 '24 06:03 Baccata

I wonder how large this metadata entry is gonna get in practice. The nice thing is that it's in a file of its own, in the jar, that almost nobody looks at.

Mar 12 '24 20:03 daddykotex

I brainstormed this with @zainab-ali and @majk-p, we were wondering if this is really a necessity. It appears that it depends on what the usecases really are.

For example, if we only need a way to derive a scala type's name from a ShapeId, we could expose the relevant function from codegen and achieve this with very few changes, no additions to the generated Smithy files, quickly and efficiently.

However, if the goal is to have an actual set of shapeIds in there (e.g. as a possible future replacement of namespaces), or to generate a StaticSchemaIndex of sorts, then we need at least the "key" side of things in that smithy4sGenerated entry.

From my side, the former would suffice.

Another problem is: in order to see the shapes in metadata, one has to invoke the Smithy4s codegen first and then load the generated Smithy files (or even hand-pick the smithy4s-generated.smithy file, which isn't that bad either). This isn't true if we have a public function in the codegen module: whatever "additional codegen" happens, it can run independently from smithy4s's codegen, which I would say is a plus (although not a game changer).

To sum up, I suggest that we take a different approach and split this:

Provide a mapping function, e.g. def toScalaRef(shapeId: ShapeId): List[String]
Provide the set of shapeIds that were generated, in the metadata:

$version: "2.0"

metadata smithy4sGenerated = [
  {
    smithy4sVersion: "0.18.9",
    namespaces: ["alloy", "smithy.waiters", "alloy.common", "smithy.api"],
    shapes: [
      "smithy.api#documentation",
      "smithy.api#http"
    ]
  }
]

This will be more space / smithy-parsing-efficient (only the keys have to be part of the metadata), and can be done incrementally. In addition, the tools that don't need the set of shapeIds don't have to bother loading the model after code generation.

Thoughts?

Mar 13 '24 14:03 kubukoz

No thoughts, except maybe why do you want to store the shapes at all, in the metadata ? I think the assumption that everything was generated in a model is reasonably safe.

Anyway, I don't have much thoughts right now. Show me some code (even incomplete) and we'll talk.

Mar 13 '24 16:03 Baccata

why do you want to store the shapes at all, in the metadata

good point, if you see the metadata you likely see the rest of the model too.

Mar 13 '24 22:03 kubukoz