Multiple Instances of a type name

Open sramam opened this issue 4 years ago • 4 comments

Past issues indicate that this is a recurring problem for others (and for me!).

Any fix will have to be a bit intrusive and will involve some opinionated design choices. Hopefully this issue can serve as a place to discuss possible solutions.

Occurrence of "multiple instances" in a schema

Multiple instances occur in a schema when two or more distinct types share the same exported name. This can happen when the same TypeName is exported from two different files. I have encountered three cases of this in valid TypeScript programs:

Case 1: Chained inheritance:

// base.ts
export interface MyObject {
    a: string;
}
// intermediate.ts
import * as Base from "./base";
export interface MyObject extends Base.MyObject {
    b: string;
}
// main.ts
import * as Intermediate from "./intermediate";
export interface MyObject extends Intermediate.MyObject {
    c: string;
}

Case 2: Composition:

// componentA.ts
export interface MyObject {
    a: string;
}
// componentB.ts
export interface MyObject {
    b: string;
}
// main.ts
import * as A from "./componentA";
import * as B from "./componentB";

export interface MyObject {
    a: A.MyObject;
    b: B.MyObject;
}

Case 3:

The duplicates test case. This is in principle the same as case 2, but a different example, so worth testing for.

Root cause

These are all valid TypeScript programs that should have valid, generatable JSON schemas, but our favorite schema generator barfs. As best I can understand, this is because the generator identifies each Type only by its "name" within the file where it is defined, and loses the context of the file path. Within the TypeScript AST these are independent nodes, each bound to a sourceFile, which allows for disambiguation when necessary. Since the Type constructors do not store the node, we lose this ability at the point of generation.
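
To make the collision concrete, here is a minimal sketch. It does not use the generator's actual data structures: TypeEntry, byName, and byId are hypothetical names invented for this illustration, and the "file#name" key format is only an assumption.

```typescript
// Hypothetical stand-in for a type the generator has discovered.
interface TypeEntry {
    name: string;       // exported name, e.g. "MyObject"
    sourceFile: string; // file that declares it (assumed to be known)
}

const entries: TypeEntry[] = [
    { name: "MyObject", sourceFile: "componentA.ts" },
    { name: "MyObject", sourceFile: "componentB.ts" },
];

// Keyed by name only: the two declarations land on the same key.
const byName = new Map<string, TypeEntry>();
for (const e of entries) byName.set(e.name, e);

// Keyed by file + name: both declarations stay distinguishable.
const byId = new Map<string, TypeEntry>();
for (const e of entries) byId.set(`${e.sourceFile}#${e.name}`, e);
```

With a name-only key there is no way to tell the two MyObject declarations apart; once the source file is part of the key, the ambiguity goes away.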

Importantly, we only need the "fully qualified name" in case of a conflict. The "simple name" should suffice in the vast majority of cases.

Possible Solution:

(References the POC implementation)

  1. Use getId() instead of getName() to generate all references initially - DefinitionTypeFormatter & ReferenceTypeFormatter
  2. Build a schema using these, but also create an idNameMap, which maps each id to its unambiguousName
  3. The unambiguousName is identical to getName() when there is no conflict, and uses the smallest possible prefix computed from sourceFileName deltas between all collisions. RootTypes grab the getName().
  4. The schema is constructed as before, removing undefined and unreachable definitions. Once done, a resolveIdRefs recursive walk uses the idNameMap to fix the schema up.

(if this sounds complicated - a proof-of-concept PR is coming right behind the issue being filed)
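
As a rough illustration of steps 2 to 4 (this is not the POC code), the sketch below builds an idNameMap and then rewrites a schema that was first generated with ids. TypeInfo, buildIdNameMap, the prefix and isRoot fields, and the "id-1" style ids are all assumptions made for this example; only idNameMap and resolveIdRefs are names taken from the proposal.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical summary of one type; field names are assumptions.
interface TypeInfo {
    id: string;      // unique id (what getId() might return)
    name: string;    // simple name (what getName() might return)
    prefix: string;  // disambiguation segment derived from the file path
    isRoot: boolean; // root types always keep their simple name
}

function buildIdNameMap(types: TypeInfo[]): Map<string, string> {
    // Count how many types share each simple name.
    const counts = new Map<string, number>();
    for (const t of types) counts.set(t.name, (counts.get(t.name) ?? 0) + 1);

    const idNameMap = new Map<string, string>();
    for (const t of types) {
        const conflict = (counts.get(t.name) ?? 0) > 1;
        idNameMap.set(t.id, conflict && !t.isRoot ? `${t.prefix}-${t.name}` : t.name);
    }
    return idNameMap;
}

const REF_PREFIX = "#/definitions/";

// Recursive walk: rename keys under "definitions" and rewrite "$ref" targets.
function resolveIdRefs(node: Json, idNameMap: Map<string, string>): Json {
    if (Array.isArray(node)) return node.map((n) => resolveIdRefs(n, idNameMap));
    if (node === null || typeof node !== "object") return node;

    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
        if (key === "$ref" && typeof value === "string" && value.startsWith(REF_PREFIX)) {
            const id = value.slice(REF_PREFIX.length);
            out[key] = REF_PREFIX + (idNameMap.get(id) ?? id);
        } else if (key === "definitions" && value !== null && typeof value === "object" && !Array.isArray(value)) {
            const defs: { [key: string]: Json } = {};
            for (const [id, def] of Object.entries(value)) {
                defs[idNameMap.get(id) ?? id] = resolveIdRefs(def, idNameMap);
            }
            out[key] = defs;
        } else {
            out[key] = resolveIdRefs(value, idNameMap);
        }
    }
    return out;
}
```

Applied to the composition case, a root MyObject with id "id-1" would keep the name MyObject, while the two colliding non-root types would become componentA-MyObject and componentB-MyObject, both as definition keys and as $ref targets.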

Opinionated parts:

  1. Disambiguation segment: This should be the smallest possible string that allows for proper disambiguation and makes sense to the authors/users of the TypeScript code and schema. One option would be to use the import path that would be needed. However, this will often include a trailing index.ts, which is superfluous for our purpose. Given conflicting names, I'd like to propose removing the common prefixes and any trailing index.ts to arrive at the disambiguation string.
  2. Path separator: since JSON Schema and all related tooling are built around JSON Pointer, using a "/" will cause all kinds of downstream trouble when using these schemas. I'd like to propose using "-", which is URL-safe, easy on humans, and doesn't conflict with TypeScript variable naming conventions.
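
One way the two choices above could combine is sketched below. This is only an illustration, not the POC implementation: pathSegments and disambiguationSegments are hypothetical helper names, and keeping at least one segment per file (so no prefix ends up empty) is an extra assumption that the proposal does not spell out.

```typescript
// Split a source file path into segments, dropping the extension
// and a trailing "index" segment.
function pathSegments(file: string): string[] {
    const parts = file
        .replace(/\\/g, "/")
        .replace(/\.tsx?$/, "")
        .split("/")
        .filter((p) => p.length > 0 && p !== ".");
    if (parts.length > 1 && parts[parts.length - 1] === "index") parts.pop();
    return parts;
}

// Given the files of all types that collide on one name, return one
// disambiguation segment per file, in the same order.
function disambiguationSegments(files: string[]): string[] {
    if (files.length === 0) return [];
    const all = files.map(pathSegments);
    const shortest = Math.min(...all.map((s) => s.length));

    // Count leading segments shared by every file, but always leave
    // at least one segment so that no result is empty (assumption).
    let common = 0;
    while (common < shortest - 1 && all.every((s) => s[common] === all[0][common])) {
        common++;
    }
    // Join what remains with "-" rather than "/".
    return all.map((s) => s.slice(common).join("-"));
}
```

For example, src/componentA.ts and src/componentB.ts would reduce to componentA and componentB, which are then joined to the type name with "-" to give componentA-MyObject and componentB-MyObject.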

Examples (generated schemas for cases 1, 2, and 3, in that order):

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$ref": "#/definitions/MyObject",
    "definitions": {
        "MyObject": {
            "type": "object",
            "required": [
                "a",
                "b",
                "c"
            ],
            "properties": {
                "a": {
                    "type": "string"
                },
                "b": {
                    "type": "string"
                },
                "c": {
                    "type": "string"
                }
            },
            "additionalProperties": false
        }
    }
}
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$ref": "#/definitions/MyObject",
    "definitions": {
        "MyObject": {
            "type": "object",
            "required": [
                "a",
                "b"
            ],
            "properties": {
                "a": {
                    "$ref": "#/definitions/componentA-MyObject"
                },
                "b": {
                    "$ref": "#/definitions/componentB-MyObject"
                }
            },
            "additionalProperties": false
        },
        "componentA-MyObject": {
            "type": "object",
            "required": [
                "a"
            ],
            "properties": {
                "a": {
                    "type": "string"
                }
            },
            "additionalProperties": false
        },
        "componentB-MyObject": {
            "type": "object",
            "required": [
                "b"
            ],
            "properties": {
                "b": {
                    "type": "string"
                }
            },
            "additionalProperties": false
        }
    }
}
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$ref": "#/definitions/MyType",
    "definitions": {
        "MyType": {
            "anyOf": [
                {
                    "$ref": "#/definitions/import1-A"
                },
                {
                    "$ref": "#/definitions/import2-A"
                }
            ]
        },
        "import1-A": {
            "type": "number"
        },
        "import2-A": {
            "type": "string"
        }
    }
}

Cons

  • This likely has a slight performance hit, since we walk the schema one more time as a post-processing step. But that is no different from the walk performed by removeUnreachable.

Pros

  • We'll generate schemas for a larger subset of TypeScript programs.
  • Since we bind to the filename, reuse of definitions should work irrespective of how they are aliased at the point of use in the TypeScript files.

sramam • Feb 17 '21 11:02

Has there been any progress on something like this? The library is not usable for large projects because simple enums/aliases with common names like Data, Result, and other common strings are "taken".

Sometimes it's possible to consolidate, but not always.

What is required to make progress on this?

kaspar-p • Jun 17 '25 20:06

Saying it's unusable is not correct. I use it for Vega-Lite and mosaic. For the latter, the JSON schema is about 8 MB, so pretty huge.

But I agree that it would be nice to not choke on duplicates. Unfortunately, json schema has a flat namespace. So the only way I see would be to add some prefix/postfix to names when there are duplicates. I don't know exactly what that would entail and you'd need to dig into the code base yourself.

domoritz • Jun 17 '25 21:06

Yep, sorry for the hyperbole. It's unusable on my personal big repo :).

I will take a look at how I might add src information as a prefix.

kaspar-p • Jun 17 '25 21:06

Could be src, but could also just be a counter if you are okay with implementing some kind of duplicate detection. Prefixes of paths will work without that, but will make the schema ugly (which may be okay).

I actually thought we already had something like the former. Maybe I misremembered from another project or it was for anonymous types.

domoritz • Jun 17 '25 21:06