tiled icon indicating copy to clipboard operation
tiled copied to clipboard

Add support for patching, merging, replacing metadata

Open hyperrealist opened this issue 11 months ago • 13 comments

  • patch_metadata uses a http patch request with application/json-patch+json content type
  • merge_metadata can be triggered with a http patch request and application/merge-patch+json
  • update_metadata uses a similar approach to merge_metadata, but constructs a json-patch on the client-side
  • add jsonpatch and json-merge-patch to requirements

hyperrealist avatar Mar 11 '24 18:03 hyperrealist

@danielballan, @Kezzsim this PR is ready for another review. Changes since your last review are minimal, though it took me an obscene amount of time to figure this out.

hyperrealist avatar Mar 21 '24 10:03 hyperrealist

I pushed a couple of commits addressing suggestions from @danielballan and @genematx.

Also added an option in update_metadata for retrieving a JSON patch without applying it. May be useful for setting up a batch update via patch_metadata (fast).

hyperrealist avatar Mar 22 '24 07:03 hyperrealist

I’ve asked @Kezzsim to test drive this locally a bit before we merge. Merge away when you are satisfied, Kari!

danielballan avatar Mar 22 '24 21:03 danielballan

Rebased and force-pushed to resolve merge conflicts with #695

danielballan avatar Mar 26 '24 17:03 danielballan

Tests failed due to an incompatibility with pydantic 2.x, introduced by #695. I have pushed a commit with a fix.

danielballan avatar Mar 26 '24 18:03 danielballan

This looks good, however I'm noticing a break from convention when it comes to how HTTP is implemented. Pulling up swagger docs on http://127.0.0.1:8000/docs reveals that application/json is the only mimetype in the fastAPI dropdown. jsonpatchproblem1 Technically this is correct, as the body in @hyperrealist 's spec is a json object which contains valid json patches, but the overall request body itself is not a json patch.

Example of current request body [dict]:

{
 "patch": [
   {
     "op": "replace",
     "path": "/test",
     "value": "testing"
   }
 ],
 "specs": null
}

Proposed json patch implementation of body [list]: (conforming to RFC6902)

[
   {
     "op": "replace",
     "path": "metadata/test",
     "value": "testing"
   }
 ]

Fixing this to have it be within spec would require one of two potential changes:

  1. Split the endpoint into PATCH /api/v1/metadata/{path} and PATCH /api/v1/specs/{path} which both take valid json-patch objects that alter their respective properties at the root path.
  2. (As seen in the example above) change the body to be a valid json-patch but make the RFC6902 path roots be metadata and specs respectively.

@danielballan mentioned pulling in @dylanmcreynolds since he has a good eye for making similar API design decisions that are both conforming to spec and aesthetically pleasing. I'd like to see what his suggestion is.

Kezzsim avatar Mar 26 '24 20:03 Kezzsim

Thanks @Kezzsim, this is something that had been bothering me in the background at times. I appreciate your attention to detail!

  1. Split the endpoint into PATCH /api/v1/metadata/{path} and PATCH /api/v1/specs/{path} which both take valid json-patch objects that alter their respective properties at the root path.

  2. (As seen in the example above) change the body to be a valid json-patch but make the RFC6902 path roots be metadata and specs respectively.

  1. If it is typical to have to update both metadata and specs together this would necessitate extra http requests. If that is not the case I think this is a very much viable solution.
  2. This is a creative way to conform to the http spec but I don't like how arguments are mixed together, especially since how they are mixed together would change depending on the mimetype (we support both application/json-patch+json and application/merge-patch+json. See here).
  3. A third solution might be to let the payload be of application/json mimetype and split the endpoint into something like PATCH /api/v1/patch_metadata/{path} and PATCH /api/v1/merge_metadata/{path}, each requiring a valid json with an embedded json patch or a merge patch respectively. But I think this is achieving little at the expense of polluting the API and adding to the complexity.
  4. A fourth solution is to enforce a sub-mimetype at the content level:
{
 "type": "application/json-patch+json",  # could default to this. may also support aliases  "json-patch"
 "metadata": [
   {
     "op": "replace",
     "path": "/test",
     "value": "testing"
   }
 ],
 "specs": [],
}

and

{
 "type": "application/merge-patch+json",  # or "merge-patch" / "merge"
 "metadata": {"test": "testing"},
 "specs": [],
}

All these solutions would conform to http spec, but I am leaning toward solutions 1 or 4 for their conceptual simplicity and ease of implementation. Anyway, I am completely open to other ideas and design considerations.

hyperrealist avatar Mar 27 '24 07:03 hyperrealist

Yes, good catch @Kezzsim! And thanks for presenting some additional options @hyperrealist.

Some additional design considerations:

Option (1) would break symmetry between GET and PATCH, as GET /metadata/{path} currently returns all "metadata" about a node, broadly defined, including structure family, structure, specs, data sources, and metadata. Making separate routes for PATCH seems asymmetrical in this sense, in addition to introducing overhead from separate requests for some use cases.

I can foresee adding support for patching data sources, alongside metadata and specs, which would make the downsides with (2) and (3) even more salient.

I can see no problems with (4). There was something appealing about specifying the patch format in the content-type header, but there is also something clean about making the request plain JSON (everybody knows JSON!) with the patch format specified in the body. It seems to keep our options open.

If we go that way, it is worth considering options for what to name the key. Is it content-type, format, minetype, type?

danielballan avatar Mar 27 '24 11:03 danielballan

I very much intentionally wanted to avoid calling it content-type. I wouldn't envy whoever is documenting content-type the http header and content-type the subtype for this endpoint. mimetype sounds too restrictive to me as I would hesitate to support aliases under that name. Between format and type I like type because it hints at this being a mimetype without being overly restrictive. If adding a bit to the boilerplate is not an issue I would go with patch-type, metadata-patch, and specs-patch.

hyperrealist avatar Mar 27 '24 13:03 hyperrealist

Seeing :+1:s from @Kezzsim and myself, I think going ahead with (4) with name type makes sense.

danielballan avatar Mar 28 '24 13:03 danielballan

I very much intentionally wanted to avoid calling it content-type. I wouldn't envy whoever is documenting content-type the http header and content-type the subtype for this endpoint. mimetype sounds too restrictive to me as I would hesitate to support aliases under that name.

I'm not sure I understand the hesitance about calling it content-type. There's precedence here. For example, it normally appears in both the header and the body of multi-part form posts: https://swagger.io/docs/specification/describing-request-body/multipart-requests/

dylanmcreynolds avatar Apr 11 '24 15:04 dylanmcreynolds

CI errors for Python 3.11 are coming from dask's incompatibility with 3.11.9 (fixed in 2024.4.1). Github CI uses 3.11.9, so it is broken at the moment.

I pinned python<3.11.9 for now [on second thought, let me actually remove the pin so that you can see the errors], but I am not sure if I should attempt to resolve this issue in this PR. We should probably update to dask 2024.4.1.

With that note, I think this ready for another review, @danielballan @Kezzsim @genematx. I did not implement patch for specs, but just for metadata. If you think patching specs makes sense I can add that.

hyperrealist avatar Apr 17 '24 06:04 hyperrealist

FYI, @hyperrealist, you may have seen this go by already but in case not, the CI failures on Python 3.11 are not the fault of this PR and can be ignored. See #715.

danielballan avatar Apr 18 '24 18:04 danielballan

Rebased to pull in latest CHANGELOG and place this change entry on "Next" release.

danielballan avatar Apr 23 '24 18:04 danielballan

Revisiting @hyperrealist's comment proposing this API:

{
 "type": "application/json-patch+json",  # could default to this. may also support aliases  "json-patch"
 "metadata": [
   {
     "op": "replace",
     "path": "/test",
     "value": "testing"
   }
 ],
 "specs": [],
}

and

{
 "type": "application/merge-patch+json",  # or "merge-patch" / "merge"
 "metadata": {"test": "testing"},
 "specs": [],
}

I now realize that this was a little ambiguous about what's happening with specs. As currently implemented in this PR, specs is handled the same as in PUT. The content of specs in the request replaces the current specs.

class PatchMetadataRequest(HyphenizedBaseModel):
    content_type: str

    # These fields are optional because None means "no changes; do not update".
    patch: Optional[Union[List[JSONPatchSpec], Dict]]  # Dict for merge-patch
    specs: Optional[Specs]

I think it would be more useful (and more consistent) to make specs also a patch.

class PatchMetadataRequest(HyphenizedBaseModel):
    content_type: str

    # These fields are optional because None means "no changes; do not update".
    metadata: Optional[Union[List[JSONPatchSpec], Dict]]  # Dict for merge-patch
    specs: Optional[Union[List[JSONPatchSpec], Dict]]  # Dict for merge-patch

Notes:

  • The user has to pick one patch content-type for the whole request; they cannot be mixed. This seems like a good simplifying constraint.
  • For specs, the performance advantage of "patch" vs "replace" is low. The list of specs is not expected to become long.
  • But in a future PR I foresee adding data_sources here, and at that point I think we'll be glad for a consistent way to amend the metadata, specs , and data_sources, each with their own patch.

Existing unit tests cover update_metadata, but I think it's worthwhile to add a couple tests covering the new methods.

danielballan avatar Apr 24 '24 13:04 danielballan

I played with this locally. It works as advertised. I really like the ease of use of update_metadata...no more manually making deep copies in user code!

I took the liberty of rebasing on main and then renaming md_patch to metadata_patch. (I appreciate the case for brevity, but we don't abbreviate it anywhere else in the codebase, I suspect it will be more often read that typed, so I am privileging clarity.)

Will merge when CI passes.

danielballan avatar Jun 04 '24 15:06 danielballan