river icon indicating copy to clipboard operation
river copied to clipboard

`InsertOpts` option to reconcile metadata where multiple are encountered

Open brandur opened this issue 1 year ago • 3 comments

Here, attempt to resolve #165 by better defining how metadata that may be inserted via InsertOpts on Insert will interact with metadata that may be inserted via a job implementing InsertOpts(). Currently, a job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I don't think any one solution will fit everyone's desires. Here, I propose that we let users defined their own preferred behavior by implementing a new InsertOptsMetadataReconcile option:

type InsertOpts struct {
    ...

    // MetadataReconcile defines how, in the presence of multiple InsertOpts
    // like one from the job itself and another one from a Client.Insert param,
    // how metadata will be reconciled between the two. For example, two
    // metadata can be merged together, or one can exclude the other.
    //
    // Defaults to MetadataReconcileExclude.
    MetadataReconcile MetadataReconcile

The possible reconciliation modes:

  • MetadataReconcileExclude: The more specific metadata (from Insert params) completely excludes the less specific metadata (from a job' opts), and the latter is discarded.

  • MetadataReconcileMerge: Carries out a shallow merge. In case of conflict, keys from the more specific metadata win out.

  • MetadataReconcileMergeDeep: Carries out a deep merge, recursing into every object to see if we can use it as a map[string]any. In case of conflict, keys from the more specific metadata win out.

The default setting is MetadataReconcileExclude. This is partly a safety feature because it is possible to create oddly formed metadata that might have trouble merging, and partly a performance feature. Exclude doesn't require trying to parse any of the metadata so that we can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as map[any]any instead of map[string]any so we could support all kinds of wild key types simultaneously, but encoding/json is pretty strict about working with only one type of key at a time, and failed when I tried. We may have to keep it as an invariant that if you want to use one of the merge modes, your metadata must be parseable as map[string]any. This shouldn't be a problem because a struct with json tags on it will always produce a compatible object. I've tried to document this.

Fixes #165.

brandur avatar May 02 '24 14:05 brandur

@bgentry Wanted to get your thoughts on this one — I think the different reconcile options are kind of nice for maximizing flexibility, but there is a chance it's overkill. Thoughts?

brandur avatar May 02 '24 14:05 brandur

Hmm, I think it's partly a bit overkill because I'm not sure the other modes would be needed, but primarily I think the default is wrong and will be a footgun in terms of functionality we have planned. Am I reading correctly that by default if you set metadata in the worker InsertOpts that it will completely overwrite any other specified metadata?

bgentry avatar May 03 '24 14:05 bgentry

So the reason I opted for the exclude option by default is that it certainly has some nice benefits:

  • Merging is inherently much slower because it involves parsing two separate metadata hashes, gluing them together, then serializing them again. Especially where insertion speed is a concern, definitely nice not to have that going on in the background with every operation.
  • Because we have to use map[string]any for the merge, it is possible for users to create incompatible metadata hashes that could produce an error (say using integers as keys for example).

I'm not sure I know the complete plans for future features well enough to see why merging is that important, so I may be missing something there.

I'd still argue strongly that the "exclude" mode needs to be in there as an option because a user should be able to clobber all secondary metadata hashes on an insert if they want to. I could see an argument for dropping the shallow merge option to lean things out, although it shouldn't be particularly harmful to keep in there either.

brandur avatar May 04 '24 07:05 brandur