typebox icon indicating copy to clipboard operation
typebox copied to clipboard

fix: correctly score nested unions

Open DemonHa opened this issue 8 months ago • 8 comments

✨ Summary

This PR improves the type scoring system to properly handle nested union types, which were previously ignored by the scoring system

Problem

Currently, the scoring algorithm does not account for union types that are nested within other unions or types. This results in inaccurate or misleading scores when comparing complex type structures.

Solution

This PR introduces a recursive scoring mechanism that traverses and evaluates all nested union types. Each branch of the union is now scored independently and factored into the final score, improving the overall accuracy and reliability of the scoring system.

closes #1268

DemonHa avatar Jun 20 '25 16:06 DemonHa

@DemonHa Hi, this is a good PR, thank you.

Can you just add a couple more test cases for this? Ideally I want to see tests for structures like.

Case A

type A = TUnion<[
  TUnion<[
    TObject<...>,
    TObject<...>
  ],
  TUnion<[
    TObject<...>,
    TObject<...>
  ]>
]>

Case B

type B = TUnion<[
  TObject<...>,
  TObject<...>,
  TUnion<[
    TObject<...>,
    TObject<...>
  ]>
]>

Also, just put a heading above these tests

// ------------------------------------------------------------------------
// ref: https://github.com/sinclairzx81/typebox/pull/1273
// ------------------------------------------------------------------------
it('should correctly score nested union types 1', () => {
  // test here
})
it('should correctly score nested union types 2', () => {
  // test here
})
it('should correctly score nested union types 3', () => {
  // test here
})

This just helps me cross reference this logic back to the original PR.


This PR looks good but given the nature of the update, just want some additional tests around it to assert the expected behavior. Drop in as many tests as you can for as many structures as you can think of. Once their in, can merge this one through.

Thanks again! S

sinclairzx81 avatar Jun 22 '25 04:06 sinclairzx81

@sinclairzx81 Just noticed another issue with the rewarding system

Scoring system is rewarding properties matches 1/propertyCount. This would result on incorrect result (see example below)

it.only('should correctly reward, () => {
  const A = Type.Union([
    Type.Object({
      prop1: Type.String(),
      prop2: Type.String(),
      prop3: Type.String(),
    }),
    Type.Object({
      prop1: Type.String(),
      prop2: Type.String(),
      prop4: Type.String(),
      prop5: Type.String(),
      prop6: Type.String(),
      prop7: Type.String(),
      prop8: Type.String(),
      prop9: Type.String(),
      prop10: Type.String(),
    }),
  ])

  Assert.IsEqual(Value.Cast(A, {
    prop1: '',
    prop2: '',
    prop7: ''
  }), {
      prop1: '',
      prop2: '',
      prop4: '',
      prop5: '',
      prop6: '',
      prop7: '',
      prop8: '',
      prop9: '',
      prop10: '',
  });
})

Is this a design choice?

DemonHa avatar Jun 22 '25 05:06 DemonHa

Sorry, accidently closed this via another PR (re-opened)

sinclairzx81 avatar Jun 22 '25 05:06 sinclairzx81

@sinclairzx81 Just noticed another issue with the rewarding system

Scoring system is rewarding properties matches 1/propertyCount. This would result on incorrect result (see example below)

it.only('should correctly reward, () => {
  const A = Type.Union([
    Type.Object({
      prop1: Type.String(),
      prop2: Type.String(),
      prop3: Type.String(),
    }),
    Type.Object({
      prop1: Type.String(),
      prop2: Type.String(),
      prop4: Type.String(),
      prop5: Type.String(),
      prop6: Type.String(),
      prop7: Type.String(),
      prop8: Type.String(),
      prop9: Type.String(),
      prop10: Type.String(),
    }),
  ])

  Assert.IsEqual(Value.Cast(A, {
    prop1: '',
    prop2: '',
    prop7: ''
  }), {
      prop1: '',
      prop2: '',
      prop4: '',
      prop5: '',
      prop6: '',
      prop7: '',
      prop8: '',
      prop9: '',
      prop10: '',
  });
})

Is this a design choice?

Should I create another issue for this?

DemonHa avatar Jun 22 '25 05:06 DemonHa

@DemonHa

Scoring system is rewarding properties matches 1/propertyCount. This would result on incorrect result (see example below)

I don't think there is a perfectly ideal way to score schematics against values. The current algorithm scores matching discriminator fields as more likely to contribute to a match, but there is still potential for excessive property counts to counter weight discriminator scores.

The logic for scoring is actually really old (it was one of the first bits of logic written for the value submodule). I have been meaning to pull this logic out into a separate module, I wonder if it would be beneficial to pull the scoring logic out from Cast, and into it's own module.

import { MatchVariantFuzzy } from '@sinclair/typebox/value'

// where A, B, C and D are schemas / variants
// where index is 0-N on match, or -1 on no match
const selectedIndex = SelectVariantFuzzy([A, B, C, D], value) 

If you see potential to improve upon the algorithm, it might be easier to implement external to TypeBox first, then test it, then it could be integrated as a new value submodule. Possibly.

./src/value/match/exact.ts // matches union variant exactly (via check)
./src/value/match/fuzzy.ts // matches union variant using fuzzy scoring system

Having these separated out from Cast would let this logic be useful elsewhere.

Thoughts?

sinclairzx81 avatar Jun 22 '25 06:06 sinclairzx81

I don't think there is a perfectly ideal way to score schematics against values. The current algorithm scores matching discriminator fields as more likely to contribute to a match, but there is still potential for excessive property counts to counter weight discriminator scores.

@sinclairzx81 Can we move this discussion to another thread? I will open another issue on the scoring system

DemonHa avatar Jun 22 '25 06:06 DemonHa

Can we move this discussion to another thread? I will open another issue on the scoring system

Sure, but let's keep it as a discussion thread rather than an new issue. Given the nature of algorithm, a general discussion would be a better forum to formalize ways to improve upon the current approach.


As for this PR, it looks good (and thanks for the additional tests). I'll give the updates a more in depth review tonight and look at merging through either tonight or tomorrow.

Thanks again!

sinclairzx81 avatar Jun 22 '25 06:06 sinclairzx81

Sure, but let's keep it as a discussion thread rather than an new issue. Given the nature of algorithm, a general discussion would be a better forum to formalize ways to improve upon the current approach.

I'll try to improve on the current scoring system and open another PR for that. Thank you!

DemonHa avatar Jun 22 '25 06:06 DemonHa

@DemonHa This update looks good. Will merge :)

sinclairzx81 avatar Jun 24 '25 05:06 sinclairzx81

@DemonHa Updates published on 0.34.37

Thanks again for the PR :) S

sinclairzx81 avatar Jun 24 '25 05:06 sinclairzx81