Make zod faster

Open PredokMiF opened this issue 2 years ago • 12 comments

I found a repo with validator benchmarks. Zod does not perform as well in it as I expected.

I wrote a simple test for my model and a custom validation function that fully corresponds to the Zod schema.

When running it, I saw a 15x (1500%) difference in execution speed.

My performance test code:

const z = require('zod')

const DATA = {
    id: 1,
    yo: 1,
    email: '[email protected]',
    username: 'admin',
    score: 0,
    roles: ['owner'],
    userProfiles: [
        { profileId: 1, profileName: 'Yo!1' },
        { profileId: 2, profileName: 'Yo!2' },
    ],
}
const TRY_COUNT = 100000

function validateZod(data, tryCount) {
    const userSchema = z.object({
        id: z.number(),
        email: z.string(),
        username: z.string(),
        score: z.number(),
        roles: z.array(
            z.enum(['admin', 'manager', 'owner', 'customer'])
        ),
        userProfiles: z.array(
            z.object({ profileId: z.number(), profileName: z.string() })
        ),
    })

    for(let i = 0; i < tryCount; i += 1) {
        userSchema.parse(data)
    }
}

function validateCustom(data, tryCount) {
    function parse(data) {
        if (!data || typeof data !== 'object') {
            throw new Error('data is not object')
        }

        const out = {}

        if (typeof data.id !== 'number') {
            throw new Error('id is not number')
        }
        out.id = data.id

        if (typeof data.email !== 'string') {
            throw new Error('email is not string')
        }
        out.email = data.email

        if (typeof data.username !== 'string') {
            throw new Error('username is not string')
        }
        out.username = data.username

        if (typeof data.score !== 'number') {
            throw new Error('score is not number')
        }
        out.score = data.score

        if (!Array.isArray(data.roles)) {
            throw new Error('roles is not array')
        }
        out.roles = data.roles.map((data, i) => {
            if (!data || typeof data !== 'string') {
                throw new Error(`roles[${i}] is not string`)
            }

            if (!['admin', 'manager', 'owner', 'customer'].includes(data)) {
                throw new Error(`roles[${i}] is not a valid role`)
            }

            return data
        })

        if (!Array.isArray(data.userProfiles)) {
            throw new Error('userProfiles is not array')
        }
        out.userProfiles = data.userProfiles.map((data, i) => {
            if (!data || typeof data !== 'object') {
                throw new Error(`userProfiles[${i}] is not object`)
            }

            const out = {}

            if (typeof data.profileId !== 'number') {
                throw new Error(`userProfiles[${i}].profileId is not number`)
            }
            out.profileId = data.profileId

            if (typeof data.profileName !== 'string') {
                throw new Error(`userProfiles[${i}].profileName is not string`)
            }
            out.profileName = data.profileName

            return out
        })

        return out
    }

    for(let i = 0; i < tryCount; i += 1) {
        parse(data)
    }
}

// Zod test

let startTs = Date.now()
let ticksStart = process.hrtime.bigint()

validateZod(DATA, TRY_COUNT)

let ticksEnd = process.hrtime.bigint()
let endTs = Date.now()

console.log(`ZOD: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)

// Custom validation test

startTs = Date.now()
ticksStart = process.hrtime.bigint()

validateCustom(DATA, TRY_COUNT)

ticksEnd = process.hrtime.bigint()
endTs = Date.now()

console.log(`CUSTOM: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)
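
The measurement above times each validator once, with Zod running first and no warm-up, so JIT compilation time is part of the result. Below is a minimal sketch of a harness that warms up first and reports the median of several runs. `bench` and its options are hypothetical names introduced here for illustration; they are not part of Zod or of the original test.

```javascript
// Hypothetical helper, not part of Zod: times `fn` after a warm-up phase.
const { performance } = require('node:perf_hooks')

function bench(fn, { iterations = 100000, warmup = 10000, runs = 5 } = {}) {
    // Warm-up: lets the JIT optimize `fn` before anything is measured.
    for (let i = 0; i < warmup; i += 1) {
        fn()
    }

    const samples = []
    for (let r = 0; r < runs; r += 1) {
        const start = performance.now()
        for (let i = 0; i < iterations; i += 1) {
            fn()
        }
        samples.push(performance.now() - start)
    }

    // Report the median run in milliseconds to reduce the effect of outliers.
    samples.sort((a, b) => a - b)
    return samples[Math.floor(samples.length / 2)]
}

// Usage with a stand-in workload; swap in a closure that calls the validator under test.
const medianMs = bench(() => JSON.parse('{"id":1}'), { iterations: 1000, warmup: 100 })
console.log(`median: ${medianMs.toFixed(3)} ms`)
```

Running both validators through the same harness, in either order, should make the comparison less sensitive to which one happens to run first.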

I know some overhead is unavoidable, but maybe you can optimize the code by precompiling the Zod schema and narrow the 15x gap. My suggestion is to compile the Zod schema with new Function() (at least for simple cases) and use the compiled function when validation is called.

function compileSchema(zodSchemaDefinition) {
    let code = ''

    if (zodSchemaDefinition.nullable) {
        code += `if (value === null) return null;\n`
    }

    if (zodSchemaDefinition.optional) {
        code += `if (value === undefined) return undefined;\n`
    }

    if (zodSchemaDefinition.type !== 'string') {
        // MVP compiler: only for strings at the moment
        return null
    }

    code += `if (typeof value !== 'string') throw new Error('value is not a string');\n`

    if (zodSchemaDefinition.min) {
        code += `if (value.length < ${zodSchemaDefinition.min}) throw new Error('Minimum value length is ${zodSchemaDefinition.min}');\n`
    }

    code += 'return value;'

    return new Function('value', code)
}

// This definition would have to be derived from the Zod schema's metadata
const validator = compileSchema({
    nullable: true,
    type: 'string',
    min: 2,
})

console.log('Generated validator function:')
console.log(validator.toString())
console.log()
console.log('Value "qwe" validation: ', JSON.stringify(validator('qwe')))
console.log()
console.log('Next validation of number 1 must throw error')
console.log(validator(1))
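
The same idea could be extended from strings to nested objects by emitting the checks recursively. The following is only a sketch under assumptions: the descriptor shape (`type`, `props`) and the helpers `emitChecks` and `compileObject` are hypothetical and do not reflect Zod's internal schema metadata.

```javascript
// Sketch only: the descriptor shape ({ type, props }) is a hypothetical
// stand-in for metadata that would have to be read from a real Zod schema.
function emitChecks(def, ref, path) {
    // `ref` is the JS expression holding the value; `path` is used in error messages.
    const label = JSON.stringify(path)
    if (def.type === 'string' || def.type === 'number') {
        return `if (typeof ${ref} !== '${def.type}') throw new Error(${label} + ' is not a ${def.type}');\n`
    }
    if (def.type === 'object') {
        let code = `if (${ref} === null || typeof ${ref} !== 'object') throw new Error(${label} + ' is not an object');\n`
        for (const [key, child] of Object.entries(def.props)) {
            // JSON.stringify escapes the key so it cannot inject code.
            code += emitChecks(child, `${ref}[${JSON.stringify(key)}]`, `${path}.${key}`)
        }
        return code
    }
    throw new Error(`unsupported type: ${def.type}`)
}

function compileObject(def) {
    // All checks are flattened into one function body, compiled once.
    const body = emitChecks(def, 'value', 'value') + 'return value;'
    return new Function('value', body)
}

const validateProfile = compileObject({
    type: 'object',
    props: {
        profileId: { type: 'number' },
        profileName: { type: 'string' },
    },
})

console.log(validateProfile({ profileId: 1, profileName: 'Yo!1' }))
```

Unlike Zod's parse, this sketch returns the input object itself rather than a copy with unknown keys stripped, so it is not a drop-in equivalent.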

PredokMiF avatar Jul 28 '23 13:07 PredokMiF

Did I write something wrong, or is there just no idea yet of how to implement this?

PredokMiF avatar Nov 05 '23 23:11 PredokMiF

Good idea. I'm sure that if you submit a pull request it will be reviewed carefully and, as long as all the tests pass, could be merged promptly. I love zod, but I have been forced to use inferior alternatives that offer less interesting validations but are faster, so I can't wait to see what you do 👍

JosephHalter avatar Dec 04 '23 18:12 JosephHalter

I'm working on a solution that uses a similar process. The difficulty does not lie in algorithmic logic but rather in the cleanliness of the solution.

In one afternoon I have already built function generators for string, number, and object schemas.

This is the result:

const zodSchema = zod.object({
    firstname: zod.string().trim().toLowerCase().max(15).min(2),
    lastname: zod.string().trim().toUpperCase().max(15).min(2),
    age: zod.coerce.number().min(16),
    email: zod.string().email(),
    addresse: zod.object({
        postCode: zod.coerce.string().regex(/[0-9]+/),
        city: zod.string().max(50),
        number: zod.number()
    }).strict()
}).strict()

const zodAcceleratorSchema = ZodAccelerator.make(zodSchema);

let startTs = Date.now()
let ticksStart = process.hrtime.bigint()

for(let i = 0; i < 10000; i += 1) {
    zodAcceleratorSchema({
        firstname: "  Mike ",
        lastname: " gnogno  ",
        age: 21,
        email: "[email protected]",
        addresse: {
            postCode: 22778,
            city: "Paris",
            number: 67
        }
    })
}

let ticksEnd = process.hrtime.bigint()
let endTs = Date.now()

console.log(`ZodAccelerator: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)

startTs = Date.now()
ticksStart = process.hrtime.bigint()


for(let i = 0; i < 10000; i += 1) {
    zodSchema.parse({
        firstname: "  Mike ",
        lastname: " gnogno  ",
        age: 21,
        email: "[email protected]",
        addresse: {
            postCode: 22778,
            city: "Paris",
            number: 67
        }
    })
}

ticksEnd = process.hrtime.bigint()
endTs = Date.now()

console.log(`ZOD: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)

[Screenshot 2024-02-23 at 11:50:23]

PS: when keeping only the type checks, I manage to be 3x faster over 10,000 operations and 6.5x faster over 100,000 operations.

mathcovax avatar Feb 23 '24 10:02 mathcovax

Hi @colinhacks, I think I found a good recipe for zod! The schema below can be accelerated, and not just a little.

const zodSchema = zod.object({
    firstname: zod.string().trim(),
    lastname: zod.string().nullable(),
    age: zod.coerce.number(),
    email: zod.string(),
    gender: zod.enum(["boy", "girl"]),
    connected: zod.boolean(),
    createdAt: zod.coerce.date(),
    addresse: zod.object({
        postCode: zod.coerce.string().transform((val) => val + "turbodab"),
        city: zod.string(),
        number: zod.number()
    }),
    test: zod.tuple([zod.string().trim(), zod.number()]).rest(zod.string().default("lolo")),
    tutu: zod.union([ // responsible for the slowdown
        zod.literal("123"),
        zod.literal("456"),
        zod.object({
            test: zod.string()
        }),
    ]).optional().catch("123")
}).array()

[Screenshot 2024-02-26 at 18:14:05]

Parsed data:

const data = Array.from({length: 10}).fill({
    firstname: "  Mike ",
    lastname: null,
    age: 21,
    email: "[email protected]",
    gender: "girl",
    connected: true,
    createdAt: "2024-09-13",
    addresse: {
        postCode: 22778,
        city: "Paris",
        number: 67
    },
    test: ["temp  ", 1, "cheese", "ok", undefined],
    tutu: "litote",
})

Test:

let startTs = Date.now()
let ticksStart = process.hrtime.bigint()

for(let i = 0; i < 100000; i += 1) {
    zodSchema.parse(data)
}

let ticksEnd = process.hrtime.bigint()
let endTs = Date.now()

console.log(`Zod: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)

startTs = Date.now()
ticksStart = process.hrtime.bigint()

for(let i = 0; i < 100000; i += 1) {
    zodAccelerator.parse(data)
}

ticksEnd = process.hrtime.bigint()
endTs = Date.now()

console.log(`ZodAccelerator: Time sec ${(endTs - startTs) / 1000}, ticks: ${Math.round(Number(ticksEnd - ticksStart) / 1000)}`)

mathcovax avatar Feb 26 '24 17:02 mathcovax

https://github.com/duplojs/duplojs-zod-accelerator

mathcovax avatar Mar 08 '24 14:03 mathcovax

I’m all for making Zod faster, but I’d like to still be able to run it on the edge. A lot of the reason why some of those other validators are faster is because they use eval or new Function.

@mathcovax I couldn’t find any documentation explaining the approach you used for accelerating, but it seems it’s using eval?

vbudovski avatar Apr 03 '24 02:04 vbudovski

Hello 🙂, I was indeed a bit stingy with explanations ^^'. Whether eval or new Function is used makes no difference; the strategy remains the same. For faster execution, a custom function is built as a string, which is then interpreted by eval or new Function. It is not complex to implement; the challenges are architectural. It is not easy to keep code that manipulates strings maintainable.

mathcovax avatar Apr 03 '24 08:04 mathcovax

Thanks for confirming. Unfortunately a lot of Content Security Policies will not allow dynamic code execution, so if this were to become the default approach used by Zod, it would make it impossible to use it in those environments. See Next.js docs for example. Fine if it's opt-in or opt-out. Another approach I've seen used (by Ajv I believe?) is making it possible to compile the schemas with a CLI utility so that there is no dynamic code execution at runtime.
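
One way the opt-in idea could look is sketched below: the compiled path is used only when the caller asks for it and the runtime permits code generation from strings; otherwise a plain closure is used. `canCompile` and `makeValidator` are hypothetical names for illustration, not Zod or zodAccelerator APIs.

```javascript
// Sketch only: `canCompile` and `makeValidator` are hypothetical names, not Zod APIs.
function canCompile() {
    try {
        // Under a CSP without 'unsafe-eval' (or on runtimes that forbid
        // code generation from strings) this constructor call throws.
        new Function('return true')
        return true
    } catch {
        return false
    }
}

function makeValidator(minLength, { compile = false } = {}) {
    if (compile && canCompile()) {
        // Fast path: generated source, only when explicitly requested and permitted.
        const min = Number(minLength) // coerce so only a number is interpolated
        return new Function(
            'value',
            `if (typeof value !== 'string') throw new Error('value is not a string');
             if (value.length < ${min}) throw new Error('value is too short');
             return value;`
        )
    }
    // Default path: a plain closure, no dynamic code execution.
    return (value) => {
        if (typeof value !== 'string') throw new Error('value is not a string')
        if (value.length < minLength) throw new Error('value is too short')
        return value
    }
}

const safe = makeValidator(2)
const fast = makeValidator(2, { compile: true })
console.log(safe('qwe'), fast('qwe'))
```

Both paths are meant to accept and reject the same inputs, so callers in strict CSP environments lose only the speed-up, not the behavior.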

vbudovski avatar Apr 03 '24 09:04 vbudovski

Rest assured, using zodAccelerator does not prevent you from using Zod normally. zodAccelerator just creates a function from a schema. The strategy you are talking about remains the same: it generates a function as a string, but instead of using "eval" it creates a file containing the function and then imports it. That makes me think I should add the possibility of just generating the string, skipping the "eval" step. Thanks for the info 😉
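
A rough sketch of the file-based variant discussed here, where the generated function is written to disk at build time and loaded as a normal module at runtime. `emitStringValidatorSource` and the `{ min }` option are hypothetical and only illustrate the idea; this is not how zodAccelerator or Ajv actually implement it.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Sketch only: `emitStringValidatorSource` is a hypothetical helper, and the
// { min } option is an illustrative stand-in for real schema metadata.
function emitStringValidatorSource({ min = 0 } = {}) {
    const minLength = Number(min) // interpolate numbers only
    return [
        "'use strict'",
        'module.exports = function validate(value) {',
        "    if (typeof value !== 'string') throw new Error('value is not a string')",
        `    if (value.length < ${minLength}) throw new Error('value is too short')`,
        '    return value',
        '}',
        '',
    ].join('\n')
}

// Build step: write the generated module to disk (a temp dir here for the demo).
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'validators-'))
const outFile = path.join(outDir, 'username.validator.js')
fs.writeFileSync(outFile, emitStringValidatorSource({ min: 2 }))

// Runtime: a regular module load, with no eval or new Function involved.
const validateUsername = require(outFile)
console.log(validateUsername('admin'))
```

In a real project the generation step would presumably run before bundling or deployment, so that the deployed code contains only the emitted file.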

mathcovax avatar Apr 03 '24 12:04 mathcovax

Is there any workaround for ZodUnion and the other slowness from https://github.com/colinhacks/zod/issues/2613#issuecomment-1964700733 ?

Right now I am receiving 500 KB of data via WebSockets and it takes 608 ms to validate, which I think is too much.

Lonli-Lokli avatar Jun 25 '24 11:06 Lonli-Lokli

Hi, this issue has been solved. With zodAccelerator, Zod is almost 100x faster on large unions. See the benchmark in the README.

mathcovax avatar Jun 27 '24 10:06 mathcovax

Yes, but your solution does not work in strict CSP environments.

Lonli-Lokli avatar Jun 27 '24 10:06 Lonli-Lokli