payload beforeValidate hook order and returned value processing broken

Describe the Bug

I discovered a strange field hook issue which breaks causality. There are two fields in the same collection which both have beforeValidate hooks. The first field is a virtual field which generate its value in the hook (side note: fails with normal fields too):

{
        //name (virtual field)
        name: 'name',

        label: {
            en: 'Name',
            de: 'Name'
        },

        type: 'text',

        admin: {
            hidden: true, //hide from admin panel
        },

        hooks: {
            beforeValidate: [
                ({ data }) => `${data.firstName} ${data.lastName}`
            ]
        },

        //Note: cannot be used as title
        virtual: true
    }

In the second field (defined below to this field) there is another hook:

        hooks: {
            beforeValidate: [
                ({ data }) => {
                    console.log('field 2 hook 1');
                    return generateSlug(data.name);
                }
            ]
        },

The problem is now that the order of the hook calls is correct but the second hook never gets the name value being set.

It gets even stranger if the first field defines two hooks in series:

        hooks: {
            beforeValidate: [
                ({ data }) => {
                    console.log('field 1 hook 1');
                    return `${data.firstName} ${data.lastName}`;
                },
                ({ data }) => {
                    console.log('field 1 hook 2');
                    return `${data.firstName} ${data.lastName}`;
                },
            ]
        },

In "field 1 hook 2" I am getting now the data.name value but the call order is:

field 1 hook 1
field 2 hook 1
field 1 hook 2

I had a look at the source code and it heavily uses Array.reduce() to construct the promise chain. Somehow promises from other fields can be queued before.

https://github.com/payloadcms/payload/blob/796df37461e793d9f8a6ffca74e8413d46f99de9/packages/payload/src/fields/hooks/beforeValidate/promise.ts#L274

In think the intension is correct to run the hooks in order of their definition but in practice it gets executed differently. It was very hard to analyze this issue because it looks fine at first sight but return values are being written to the siblingsDoc object at a later time breaking any logic.

I recommend waiting for each promise right away without creating a deeply nested promise chain leading to all kind of issues. Payload replicates this behavior in many files. This could be done simpler and more efficient.

Link to the code that reproduces this issue

https://github.com/cbratschi/payload

Reproduction Steps

See examples above.

Which area(s) are affected? (Select all that apply)

area: core

Environment Info

Binaries:
  Node: 22.9.0
  npm: 10.8.3
  Yarn: 1.22.22
  pnpm: N/A
Relevant Packages:
  payload: 3.6.0
Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 24.1.0: Thu Oct 10 21:05:23 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T6031
  Available memory (MB): 65536
  Available CPU cores: 16

Dec 13 '24 17:12 cbratschi

Some more details about the execution order:

field 1 hook 1
- Gets called first
- Returns modified name value
field 2 hook 1
- Get executed next
- Modified name value not yet set, this code runs later
name values gets set
- in reduce() it first called "field 1 hook 1" then waited for its result
- Now it could store name to siblingData (aka data)
field 1 hook 2
- Executed after the hook of a next sibling
- Finally get the doc.name value

Dec 13 '24 17:12 cbratschi

Workaround:

data.name = `${data.firstName} ${data.lastName}`;

In this case name is available in all the hooks.

But this does not really make sense: Payload's examples show the correct usage.

Dec 13 '24 17:12 cbratschi

I further checked the code and here is my solution (I patched the JavaScript code for testing, some TypeScript definitions might be missing therefore):

https://github.com/payloadcms/payload/blob/796df37461e793d9f8a6ffca74e8413d46f99de9/packages/payload/src/fields/hooks/beforeValidate/traverseFields.ts#L48

export const traverseFields = async ({ id, collection, context, data, doc, fields, global, operation, overrideAccess, path, req, schemaPath, siblingData, siblingDoc })=>{
    let fieldIndex = 0;

    for (const field of fields) {
        await promise({
            id,
            collection,
            context,
            data,
            doc,
            field,
            fieldIndex,
            global,
            operation,
            overrideAccess,
            parentPath: path,
            parentSchemaPath: schemaPath,
            req,
            siblingData,
            siblingDoc
        });

        fieldIndex++;
    }
};

Now the fields are processed in order and their hooks are called one after another. Don't know the reason why they were processed in parallel leading to this issue.

This code could be simplified too:

https://github.com/payloadcms/payload/blob/796df37461e793d9f8a6ffca74e8413d46f99de9/packages/payload/src/fields/hooks/beforeValidate/promise.ts#L271

        // Execute hooks
        if (field.hooks?.beforeValidate) {
            for (const hook of field.hooks.beforeValidate) {
                const value = await Promise.resolve(hook({
                    collection,
                    context,
                    data,
                    field,
                    global,
                    operation,
                    originalDoc: doc,
                    overrideAccess,
                    path: fieldPath,
                    previousSiblingDoc: siblingDoc,
                    previousValue: siblingData[field.name],
                    req,
                    schemaPath: fieldSchemaPath,
                    siblingData,
                    value: siblingData[field.name]
                }));

                if (value !== undefined) {
                    siblingData[field.name] = value;
                }
            }
        }

The reduce calls are not necessary. This pattern is used in all other hooks too. Promise.resolve() handles hooks with and without promises.

Dec 15 '24 01:12 cbratschi

Hey @cbratschi — hmm, this is very interesting to me. I thought that the reduce approach that Payload frequently uses is actually going to ensure that the hooks are executed sequentially, but you're saying that this is not the case? I have not yet checked into this myself.

It is indeed our intention to run hooks sequentially, one after another, rather than in parallel. I believe the reason that we used reduce like this is due to an old ESLint rule that prevented for loops. But we have now accepted that those loops are frankly simpler / better in many cases, so I think that rule is gone.

I would be in favor of replacing all of our promise-based reducers for simpler for loops for sure, if this indeed fixes the issue.

@r1tsuu can you investigate this a bit?

Dec 30 '24 19:12 jmikrut

@jmikrut The reduce pattern is fine but outdated as you mentioned (and hard to read and understand). The root cause is collecting all promises of all fields and then using Promise.all(). Here the runtime waits for the slowest promise and executes the micro tasks in their queueing order which is not the order of the field hooks if async hooks are used.

I would double check all Promise.all() calls for such cases and split them into sequential await blocks.

Dec 31 '24 17:12 cbratschi

The root cause is collecting all promises of all fields and then using Promise.all(). Here the runtime waits for the slowest promise and executes the micro tasks in their queueing order which is not the order of the field hooks if async hooks are used

I think that is intentional though - those hooks were never meant to be executed in the order they were defined. The benefit here is improved performance if you have 2+ hooks doing async stuff (e.g. fetch calls or file system operations).

Jan 23 '25 17:01 AlessioGr

Hey @cbratschi — Alessio and I just talked about this in-depth and I think I now understand what you are originally trying to get to.

There are 2 things being discussed here:

1 - Remove our async reduce functions in favor of for loops, which do indeed execute promises one after another. The discussion on this topic is to just clean up the syntax and move to more current syntax.

2 - You want to ensure that field hooks run sequentially just as collection-level hooks do. This is not going to be something that we will easily be able to solve for and would have pretty negative performance implications as large Payload apps can have LOTS of field hooks that are currently executed by design, in parallel to one another. If there are 2 fields at the same level, each having a hook, then those two hook promises will be executed at the same time. This is for performance reasons and would significantly slow down the APIs if we were to run the hooks for each field one after another. So on this one, this is definitely not something that we can change (it'd also be breaking, not to mention slower).

However, on point 2 above, it's possible that we could provide field groups with some type of setting to run field hooks sequentially rather than in parallel, in an opt-in fashion, but I really don't think that this is a good idea.

I'd rather suggest that field hooks which rely on other field hooks should be done later on in the lifecycle of a document. It makes more sense to me that you run hook actions that require multiple fields' data in the collection-level afterRead rather than field-level afterRead.

I will convert this to a discussion so that we can continue to track the demand for some type of opt-in pattern to run field hooks sequentially. This would definitely need to be opt-in though.

Jan 23 '25 19:01 jmikrut