sdk icon indicating copy to clipboard operation
sdk copied to clipboard

[DO NOT MERGE] Add validateValueType function to ensure data type integrity

Open i5o opened this issue 2 years ago • 7 comments

Added the validateValueType function, significantly enhancing the robustness of our data handling in the createIntegrationEntity process.

The function is designed to enforce strict type checking, ensuring that all data conforms to our predefined SUPPORTED_TYPES.

This only applies for createIntegrationEntity, we already have types for assign but IMO it's never too late to validate anyway

i5o avatar Dec 19 '23 14:12 i5o

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

jit-ci[bot] avatar Dec 19 '23 14:12 jit-ci[bot]

It might be worth adding a couple scale tests to this to ensure that running on tens or hundreds of thousands of values doesn't significantly impact performance. You don't have to include those on every ci run, but they're good to have to validate. We did that with normalization in persister and caught a couple of important bugs that would have caused significant slow downs.

xdumaine avatar Dec 19 '23 14:12 xdumaine

@xdumaine I've added tests for 500k of items, they run properly on my M1 Pro laptop.

 PASS  packages/integration-sdk-core/src/data/__tests__/validateValueType.test.ts
  ● Console

    console.log
      Execution time for string dataset: 62.04570806026459 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for number dataset: 66.66854095458984 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for boolean dataset: 66.96124994754791 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for array dataset: 278.7055000066757 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time for undefined dataset: 68.97012507915497 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:80:15)

    console.log
      Execution time until error: 0.13349997997283936 ms

      at Object.log (src/data/__tests__/validateValueType.test.ts:118:15)
      ````

i5o avatar Dec 19 '23 14:12 i5o

Added [DO NOT MERGE] to review after holidays

i5o avatar Dec 19 '23 16:12 i5o

I think this approach is both valid and a good incremental step forward. It solves a problem we have. I do think that it may be worth considering if using a JSON schema validator on the entities and properties would not provide a more battle-tested form of data validation. We already make use of some JSON Schema validation during development and with better integration I think it could be a powerful safeguard in our platform.

Additionally, JSON schemas would help with search and entity property documentation, a current and longstanding problem.

zemberdotnet avatar Jan 02 '24 16:01 zemberdotnet

Can we simplify this and avoid recursive calls with something like

validate(entity: Entity): void {
  for(const [key, value] of Object.entries(entity)) {
    const valid = Array.isArray(value) ? value.every(p => SUPPORTED_TYPES.includes(typeof p)) : SUPPORTED_TYPES.includes(typeof value);
    valid || throw new IntegrationError({
        code: 'UNSUPPORTED_TYPE',
        message: `Unsupported type found at "${key}": Nested arrays are not supported.`,
      });
  }
}

Nick-NCSU avatar Jan 02 '24 19:01 Nick-NCSU

@Nick-NCSU tried adding your code into my existing codebase but couldn't succeed, feel free to push to this branch!

i5o avatar Jan 02 '24 23:01 i5o