OpenMetadata
New Entity: API Services and support OpenAPI connector
Is your feature request related to a problem? Please describe.
I am having trouble registering metadata for data sources that are interfaces, such as APIs. Imagine an extreme (or not so extreme) scenario where a company uses ONLY SaaS applications and only has APIs: what would its Data Catalog look like? What metadata would it host? The current OMD Entities and Connectors don't fully cater to such a scenario.
In general, I struggle to represent the scenarios below:
- For SaaS Applications: as we use more and more SaaS applications, we lose access to their databases and the ability to scan their data dictionaries. Those SaaS applications share data via APIs, so the APIs represent the data source.
- For Internal Applications:
  - An application might have one to many databases, schemas and tables. Let's say that gives me 1000 attributes. Naturally, I would like to inventory them, because these are my data resources: they represent what data the organization has.
  - However, that data is only accessible via interfaces (an API (JSON schema, XML schema), an extract file, or a view). Let's say they cover 500 of those attributes. I would like to document that as well, because it shows my current TRUE data accessibility.
  - However, in my data assets for that application I would not want to show that I have 1500 attributes. So I do not want to duplicate data assets.
Describe the solution you'd like
So I would like to be able to document a) what data assets an organization has and b) which of them it can access and through which endpoints. For SaaS applications, I might only be able to register the endpoints. This would require:
- A new Entity Type: Interface. An Interface might represent an API, a webhook, a file extract or maybe a View (if one has no option to scan the database).
- A connector to harvest metadata from OpenAPI schemas, or the ability to read the metadata APIs that some SaaS apps offer. For example, Salesforce has a Metadata API.
This would allow me to have metadata in Data Catalog represented like:
Describe alternatives you've considered
- Loading API information as Tables (via a custom CSV connector), where I first extract the relevant OpenAPI information and map the fields to a Table Schema.
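The workaround above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual input format: the `SPEC` fragment is made up, and the CSV column names (`table`, `column`, `data_type`, `description`) are assumptions chosen for readability.

```python
import csv
import io

# Hypothetical, trimmed-down OpenAPI fragment used only for illustration.
SPEC = {
    "components": {
        "schemas": {
            "Pet": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer", "description": "Pet identifier"},
                    "name": {"type": "string", "description": "Pet name"},
                },
            }
        }
    }
}


def schemas_to_rows(spec):
    """Yield one (table, column, data_type, description) row per schema property."""
    schemas = spec.get("components", {}).get("schemas", {})
    for schema_name, schema in schemas.items():
        for prop_name, prop in schema.get("properties", {}).items():
            yield (
                schema_name,
                prop_name,
                prop.get("type", "unknown"),
                prop.get("description", ""),
            )


def rows_to_csv(rows):
    """Serialize the rows to CSV text with an assumed header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["table", "column", "data_type", "description"])
    writer.writerows(rows)
    return buffer.getvalue()


CSV_TEXT = rows_to_csv(schemas_to_rows(SPEC))
```

Each OpenAPI schema is treated as a "table" and each property as a "column", which is exactly why this alternative feels like a workaround: the endpoint, method and payload direction are lost in the mapping.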
Additional context
- A note on duplication: I will have API endpoints, or webhooks, where the same EventID is repeated, or cases where there are many interfaces for point-to-point integrations.
- Data lineage becomes important as well, since attributes might get renamed or transformed when prepared for interfaces.
We also are interested in updating JSON schemas apart from any database or physical service. We have our governance mastered in schemas and then use those schemas to code generate thousands of Snowflake views (and Go models and Typescript interfaces and ... many targets of the generator). So we want to have a single source of truth in the schema and link those views back, rather than displaying the metadata only on the derived views.
A similar capability exists in DataHub: https://datahubproject.io/docs/generated/ingestion/sources/json-schema/ and Alation: https://developer.alation.com/dev/docs/virtual-data-source-for-nosql-databases
Alation calls it a "Virtual" Data Source since it's not connected to a physical data source.
Hi, do we have any plans for the API entity? Is it likely to be in the roadmap?
Hello, we are also very interested in this API feature. Something like JSON schemas, or even OpenAPI! Why not AsyncAPI as well? Are there any plans for this API feature? It seems to have been moved and then removed from the last releases. Thanks
@SimonDegeorge it's scheduled for the 1.5.0 release
Ok thank you, I'm looking forward to seeing this feature!
We are looking at the following way to organize the capture of an API Service. An API Service could be a single large service that contains multiple API Endpoints; each of these endpoints can accept a request payload and return a response. For this release we are only going to consider the response payload. If the community feels that documenting the request payload is important, do let us know.
API Service
API Service is an entity similar to other services in OpenMetadata, such as the database or dashboard service. It will contain a connection string for reading an OpenAPI specification file. The connection will also capture the URL where the service is hosted.
{
  "name": "Sample Server",
  "description": "This is a sample server for a pet store.",
  "endpointURL": "https://api-server.com/v1"
}
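As a rough sketch of how such a record could be derived from a specification file, the snippet below reads the `info` and `servers` sections of an OpenAPI 3 document. The mapping (`info.title` to `name`, first server URL to `endpointURL`) is an assumption made for illustration, not the confirmed behavior of the connector, and `SPEC` is a made-up document.

```python
# Hypothetical, trimmed-down OpenAPI document used only for illustration.
SPEC = {
    "openapi": "3.0.3",
    "info": {
        "title": "Sample Server",
        "description": "This is a sample server for a pet store.",
    },
    "servers": [{"url": "https://api-server.com/v1"}],
}


def service_from_spec(spec):
    """Build a service record shaped like the example above.

    Assumption: name comes from info.title and endpointURL from the first
    entry in servers; a real connector may choose differently.
    """
    info = spec.get("info", {})
    servers = spec.get("servers") or []
    return {
        "name": info.get("title", ""),
        "description": info.get("description", ""),
        "endpointURL": servers[0].get("url") if servers else None,
    }


SERVICE = service_from_spec(SPEC)
```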
Each service can contain one or more API collections. OpenAPI specifications use tags to group operations, so we will create a collection from each OpenAPI tag, along with its endpoint URL. For example, /api/v1/tables is a collection under which there can be multiple apiEndPoints.
{
  "name": "Tables",
  "description": "`Table` organizes data in rows and columns and is defined in a `Database Schema`.",
  "endpointURL": "https://api-server.com/api/v1/tables"
}
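Grouping by tag could look roughly like the sketch below. It assumes an OpenAPI 3 document already parsed into a dictionary; the `SPEC` content, the `operations` field and the `untagged` fallback are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Operation keys allowed on an OpenAPI 3 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical, trimmed-down OpenAPI document used only for illustration.
SPEC = {
    "tags": [
        {"name": "Tables", "description": "Table organizes data in rows and columns."}
    ],
    "paths": {
        "/api/v1/tables": {
            "get": {"operationId": "listTables", "tags": ["Tables"]},
            "post": {"operationId": "createTable", "tags": ["Tables"]},
        },
        "/api/v1/tables/{id}": {
            "get": {"operationId": "getTableByID", "tags": ["Tables"]},
        },
    },
}


def collections_from_tags(spec):
    """Return one collection per tag, listing the operations that carry it."""
    descriptions = {t["name"]: t.get("description", "") for t in spec.get("tags", [])}
    operations = defaultdict(list)
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            for tag in op.get("tags", ["untagged"]):
                operations[tag].append(op.get("operationId", f"{method} {path}"))
    return [
        {"name": tag, "description": descriptions.get(tag, ""), "operations": sorted(ops)}
        for tag, ops in sorted(operations.items())
    ]


COLLECTIONS = collections_from_tags(SPEC)
```

An operation with several tags lands in several collections here, which is one of the duplication cases a real implementation would need to decide on.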
Each API collection contains multiple API Endpoints. An API Endpoint has the following schema:
{
  "name": "listTables",
  "displayName": "List tables",
  "description": "Get a list of tables, optionally filtered by `database` it belongs to. Use `fields` parameter to get only necessary fields. Use cursor-based pagination to limit the number entries in the list using `limit` and `before` or `after` query params.",
  "endpointURL": "https://api-server.com/api/v1/tables",
  "requestMethod": "GET",
  "requestSchema": {},
  "responseSchema": {}
}
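A minimal sketch of filling these fields from an OpenAPI 3 document follows. The field mapping (`operationId` to `name`, `summary` to `displayName`, the `200` JSON response schema to `responseSchema`) is an assumption for illustration, and `SPEC` is a made-up document; `requestSchema` is left empty because only the response payload is in scope for now.

```python
# Operation keys allowed on an OpenAPI 3 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical, trimmed-down OpenAPI document used only for illustration.
SPEC = {
    "servers": [{"url": "https://api-server.com"}],
    "paths": {
        "/api/v1/tables": {
            "get": {
                "operationId": "listTables",
                "summary": "List tables",
                "description": "Get a list of tables.",
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/TableList"}
                            }
                        }
                    }
                },
            }
        }
    },
}


def endpoints_from_spec(spec):
    """Return one endpoint record per (path, method) operation in the spec."""
    servers = spec.get("servers") or []
    base_url = servers[0].get("url", "").rstrip("/") if servers else ""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            ok = op.get("responses", {}).get("200", {})
            schema = ok.get("content", {}).get("application/json", {}).get("schema", {})
            endpoints.append(
                {
                    "name": op.get("operationId", f"{method} {path}"),
                    "displayName": op.get("summary", ""),
                    "description": op.get("description", ""),
                    "endpointURL": base_url + path,
                    "requestMethod": method.upper(),
                    "requestSchema": {},  # not populated: responses only for now
                    "responseSchema": schema,
                }
            )
    return endpoints


ENDPOINTS = endpoints_from_spec(SPEC)
```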
@DovileKr @SimonDegeorge @jdimeo here are the schema definitions: https://github.com/open-metadata/OpenMetadata/pull/16783 . Let me know if this schema captures what you are looking for.
Perfect for me @harshach, thanks a lot! Have you thought about a way to have lineage between an API endpoint and a table? For example, with an Airbyte or Airflow pipeline?
@SimonDegeorge yes, you'll have API endpoint -> Kafka topic, or API endpoint directly to a table, whichever fits the use case.
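To make the lineage idea concrete, here is a small sketch that models lineage as directed edges and walks everything downstream of an endpoint. The asset names and the `prefix:name` notation are hypothetical, and this is plain Python rather than the OpenMetadata lineage API.

```python
from collections import defaultdict, deque

# Hypothetical asset names; each edge points from upstream to downstream.
EDGES = [
    ("apiEndpoint:listTables", "topic:tables_raw"),
    ("topic:tables_raw", "table:analytics.tables"),
    ("apiEndpoint:listTables", "table:staging.tables"),
]


def downstream(edges, start):
    """Return every asset reachable from start, in breadth-first order."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


REACHED = downstream(EDGES, "apiEndpoint:listTables")
```

Both paths mentioned above appear here: the endpoint feeds a table through a Kafka topic, and it also feeds another table directly.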