OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

Add SearchRequest/Response proto and schema proto

Open amberzsy opened this issue 1 year ago • 10 comments

Description

Generate protobuf for SearchRequest, Response and related schemas from https://github.com/andrross/opensearch-api-specification/blob/main/spec/namespaces/_core.yaml using https://github.com/OpenAPITools/openapi-generator/tree/master.

Related Issues

Resolves #15190

Check List

  • [x] DSL coverage report - ~85.7% image
  • [x] Response/Request Coverage - ~ 85% image

Issue:

  1. proto3 remove the optional/required, we need to figure out how to check if field/property is set. proposal: a) use optional.

with the optional keyword, proto3 generates code that allows you to check whether a field has been set, which can be useful for validating fields’ presence. requires 3.15+ version.

b) use oneOf c) use customized wrapper

  1. Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main. a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.

  2. We need to have criteria on muti-types as below.

e.g.  type: 
        - boolean
        - 'null'
        - number
        - object
        - string
and 
type: ['null', number, string]

in this PR, i use ... (open to any suggestions)

message MultiTypeValue{
  oneof value{
    GeneralNumber number = 1;
    string string_value = 2;
    bytes bytes_value = 3;
    bool bool_value = 4;
  }
}
  1. there are many places defined with oneOf Object and Array of Object. e.g
          knn:
                description: Defines the approximate kNN search to run.
                oneOf:
                  - $ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
                  - type: array
                    items:
                      $ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
  

In this PR, all consolidated to

- type: array
                    items:
                      $ref: '../schemas/_common.yaml#/components/schemas/XYZ'

openapi-generator Tooling

  1. Does not recognize characters like ::, ‘@’
  2. proto number doesn’t start from 1; only enum follow format from 1;
  3. can not handle multiple property types.
  4. incomplete handling on AllOf, it will have some missing properties.
  5. It would generates individual protobuf files for each properties. Manually merge required. Esp, for DSL which has recursive definition, it cannot automatically combine the protobuf files into one.

Overall, openapi-generator can relative accurately generate 70% of probuf but manual validation and adjustment is required which is time-consuming based on this scale of API specs.

TODO: we need to further enhance the tooling to be able to support api-specs to protobuf translation.

amberzsy avatar Aug 26 '24 06:08 amberzsy

:x: Gradle check result for bcf669358ad4b98ab1756af4f6d6924c09edec74: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 26 '24 06:08 github-actions[bot]

:x: Gradle check result for e60462e5af61bd8def4157b3fc1828b2f6dd3e36: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 26 '24 06:08 github-actions[bot]

:x: Gradle check result for 2bbdb8dab706e2ecbea0abd57042489fc9a07aa9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 26 '24 08:08 github-actions[bot]

:x: Gradle check result for bd9c7c28261cbfdc420bc105718da9b921444b8f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Aug 26 '24 08:08 github-actions[bot]

@amberzsy

Couple of thoughts:

  1. I think we want to generate these from the complete spec, not a partial namespace one, e.g. https://github.com/opensearch-project/opensearch-api-specification/releases/download/main-latest/opensearch-openapi.yaml (a release).
  2. It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec. I opened https://github.com/opensearch-project/opensearch-api-specification/issues/532.
  3. IMO to merge to OpenSearch core we want to have a gradle task that pulls the latest spec and generates these files, or fetches them if we do (2).

WDYT?

dblock avatar Aug 26 '24 19:08 dblock

2. It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec.

yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def. However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria. Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build

amberzsy avatar Aug 26 '24 20:08 amberzsy

yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def.

Probably best.

However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria.

That's why I'd start automating it right away. We likely will need to incrementally fix the spec and the pipeline, add linters to avoid common mistakes in the spec that lead to something we can't convert, etc.

Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build.

I personally don't have a preference. You could write a brand new converter in https://github.com/opensearch-project/opensearch-api-specification, we already parse the spec and do a lot of things to it.

dblock avatar Aug 26 '24 20:08 dblock

Thanks @amberzsy!

proto3 remove the optional/required, we need to figure out how to check if field/property is set

Your suggestion of using optional makes sense to me.

Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.

Ultimately the code in this repository that implements the API is the source of truth. If the documentation and the api spec do not agree, then whatever is actually implemented is correct. The goal is to make the api spec match what is actually implemented (this is an ongoing manual process to build it out), then add tests that verify and enforce the actual API matches the spec. The documentation is built out in separate manual process as well so its not surprising that there are subtle differences. The goal is to also be able to generate the API documentation from the specification as well, but we're not there yet.

We need to have criteria on muti-types as below.

A custom message type (MultiTypeValue) wrapping a oneof construct like you showed makes sense to me.

there are many places defined with oneOf Object and Array of Object.

This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.

andrross avatar Aug 26 '24 22:08 andrross

sounds good. i'll update the pr for more grpc poc purpose (instead of proto defs) and we can have separate discussion for the proto/specs etc. I'll provide a doc with list of gaps and plans which will need opinion from you for clarification/alignment by end of tmr.

amberzsy avatar Aug 26 '24 22:08 amberzsy

2. Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.

The doc is incorrect/is missing information. We have https://github.com/opensearch-project/documentation-website/issues/7700 that we plan to take on soon.

a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf

If the spec needs changes, please contribute/open issues in https://github.com/opensearch-project/opensearch-api-specification.

b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.

See above.

dblock avatar Aug 27 '24 10:08 dblock

This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.

probably go with single repeated field.

amberzsy avatar Aug 29 '24 19:08 amberzsy

This PR is stalled because it has been open for 30 days with no activity.