OpenSearch
OpenSearch copied to clipboard
Add SearchRequest/Response proto and schema proto
Description
Generate protobuf for SearchRequest, Response and related schemas from https://github.com/andrross/opensearch-api-specification/blob/main/spec/namespaces/_core.yaml using https://github.com/OpenAPITools/openapi-generator/tree/master.
Related Issues
Resolves #15190
Check List
- [x] DSL coverage report - ~85.7%
- [x] Response/Request Coverage - ~ 85%
Issue:
- proto3 remove the optional/required, we need to figure out how to check if field/property is set.
proposal:
a) use
optional.
with the optional keyword, proto3 generates code that allows you to check whether a field has been set, which can be useful for validating fields’ presence. requires 3.15+ version.
b) use oneOf
c) use customized wrapper
-
Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main. a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.
-
We need to have criteria on muti-types as below.
e.g. type:
- boolean
- 'null'
- number
- object
- string
and
type: ['null', number, string]
in this PR, i use ... (open to any suggestions)
message MultiTypeValue{
oneof value{
GeneralNumber number = 1;
string string_value = 2;
bytes bytes_value = 3;
bool bool_value = 4;
}
}
- there are many places defined with
oneOfObject and Array of Object. e.g
knn:
description: Defines the approximate kNN search to run.
oneOf:
- $ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
- type: array
items:
$ref: '../schemas/_common.yaml#/components/schemas/KnnQuery'
In this PR, all consolidated to
- type: array
items:
$ref: '../schemas/_common.yaml#/components/schemas/XYZ'
openapi-generator Tooling
- Does not recognize characters like ::, ‘@’
- proto number doesn’t start from 1; only enum follow format from 1;
- can not handle multiple property types.
- incomplete handling on AllOf, it will have some missing properties.
- It would generates individual protobuf files for each properties. Manually merge required. Esp, for DSL which has recursive definition, it cannot automatically combine the protobuf files into one.
Overall, openapi-generator can relative accurately generate 70% of probuf but manual validation and adjustment is required which is time-consuming based on this scale of API specs.
TODO: we need to further enhance the tooling to be able to support api-specs to protobuf translation.
:x: Gradle check result for bcf669358ad4b98ab1756af4f6d6924c09edec74: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for e60462e5af61bd8def4157b3fc1828b2f6dd3e36: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for 2bbdb8dab706e2ecbea0abd57042489fc9a07aa9: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
:x: Gradle check result for bd9c7c28261cbfdc420bc105718da9b921444b8f: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
@amberzsy
Couple of thoughts:
- I think we want to generate these from the complete spec, not a partial namespace one, e.g. https://github.com/opensearch-project/opensearch-api-specification/releases/download/main-latest/opensearch-openapi.yaml (a release).
- It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec. I opened https://github.com/opensearch-project/opensearch-api-specification/issues/532.
- IMO to merge to OpenSearch core we want to have a gradle task that pulls the latest spec and generates these files, or fetches them if we do (2).
WDYT?
2. It might make sense to add the workflow that generates proto files as a pipeline in https://github.com/opensearch-project/opensearch-api-specification so that we can "release" them the same way as the complete spec.
yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def. However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria. Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build
yes, technically, we definitely should have this automated and have it as part of build process. i can maybe have simple version matchAll query and related schema to unblock on grpc poc and remove all the rest of proto def.
Probably best.
However, as i mentioned in description. there's some ambiguity for specs -> proto. which i think we need to align on certain criteria.
That's why I'd start automating it right away. We likely will need to incrementally fix the spec and the pipeline, add linters to avoid common mistakes in the spec that lead to something we can't convert, etc.
Another question is should we do feature development on top of https://github.com/OpenAPITools/openapi-generator (e.g have RFC and contribute) or we want to have it and develop new/similar in opensearch-build.
I personally don't have a preference. You could write a brand new converter in https://github.com/opensearch-project/opensearch-api-specification, we already parse the spec and do a lot of things to it.
Thanks @amberzsy!
proto3 remove the optional/required, we need to figure out how to check if field/property is set
Your suggestion of using optional makes sense to me.
Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.
Ultimately the code in this repository that implements the API is the source of truth. If the documentation and the api spec do not agree, then whatever is actually implemented is correct. The goal is to make the api spec match what is actually implemented (this is an ongoing manual process to build it out), then add tests that verify and enforce the actual API matches the spec. The documentation is built out in separate manual process as well so its not surprising that there are subtle differences. The goal is to also be able to generate the API documentation from the specification as well, but we're not there yet.
We need to have criteria on muti-types as below.
A custom message type (MultiTypeValue) wrapping a oneof construct like you showed makes sense to me.
there are many places defined with oneOf Object and Array of Object.
This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.
sounds good. i'll update the pr for more grpc poc purpose (instead of proto defs) and we can have separate discussion for the proto/specs etc. I'll provide a doc with list of gaps and plans which will need opinion from you for clarification/alignment by end of tmr.
2. Gaps between https://opensearch.org/docs/latest/api-reference/search/ and https://github.com/andrross/opensearch-api-specification/tree/main.
The doc is incorrect/is missing information. We have https://github.com/opensearch-project/documentation-website/issues/7700 that we plan to take on soon.
a. Numeric type: number defined in specs but Integer in public docs. we might need to further specify numeric in api-specs / protobuf
If the spec needs changes, please contribute/open issues in https://github.com/opensearch-project/opensearch-api-specification.
b. Request body has bunch of new properties but not mentioned in https://opensearch.org/docs/latest/api-reference/search/.
See above.
This seems like a quirk of JSON? Like "field": "value" and "field": ["value"] are different JSON but both just represent a single value. I'm guessing we do this so that the brackets are not always required in JSON when sending a single value. It seems like we could flatten this to a single repeated field in protobuf, or else do a oneof wrapping a single value and a repeated value. I don't have a strong opinion here. We should probably pick the simplest approach.
probably go with single repeated field.
This PR is stalled because it has been open for 30 days with no activity.