amundsen
Add Catalog to the Advanced Search
Right now we can search by Source, Table, Column, etc., but there is no UI element for the Catalog. If I write a URL to filter by Catalog, it all works as expected.
Currently, the BigQuery loader puts the Project into the "Catalog" property. The "Schema" property contains the dataset. The Source is always BigQuery (which is helpful, since we have other kinds of databases too).
Service or Ingestion ETL
This seems like a Frontend only change.
Example Screenshots (if appropriate):
If there was Catalog in addition to Source, that would help us out a lot.
Context
We could see how many tables have an "email" column in a certain BigQuery Project. (Certain projects are not allowed to have certain kinds of data)
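To make the use case concrete, here is a minimal sketch of the kind of filter being asked for: tables in one BigQuery project that have a column named "email". The `TableRecord` shape, the field names, and the sample data are all hypothetical and are not Amundsen's actual search document schema. It assumes the project ID is stored in a field called `cluster` and the dataset in `schema`.

```typescript
// Hypothetical record shape for illustration only; not Amundsen's real schema.
interface TableRecord {
  database: string; // source, e.g. "bigquery"
  cluster: string; // assumed to hold the BigQuery project ID
  schema: string; // BigQuery dataset
  name: string; // table name
  columns: string[]; // column names
}

// Return the BigQuery tables in one project that contain the given column.
function tablesWithColumn(
  tables: TableRecord[],
  project: string,
  column: string
): TableRecord[] {
  return tables.filter(
    (t) =>
      t.database === 'bigquery' &&
      t.cluster === project &&
      t.columns.some((c) => c === column)
  );
}

// Made-up sample data.
const sample: TableRecord[] = [
  { database: 'bigquery', cluster: 'restricted-project', schema: 'sales', name: 'customers', columns: ['id', 'email'] },
  { database: 'bigquery', cluster: 'restricted-project', schema: 'sales', name: 'orders', columns: ['id', 'total'] },
  { database: 'bigquery', cluster: 'open-project', schema: 'crm', name: 'contacts', columns: ['id', 'email'] },
  { database: 'hive', cluster: 'gold', schema: 'core', name: 'users', columns: ['id', 'email'] },
];

const hits = tablesWithColumn(sample, 'restricted-project', 'email');
console.log(hits.map((t) => t.schema + '.' + t.name));
```

In the real application this filtering would happen in the search backend; the sketch only shows which fields the Advanced Search form would need to expose.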
Currently we put the BQ project ID (is it the catalog you referred to?) in the cluster field (https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/extractor/bigquery_metadata_extractor.py#L77). If I understand correctly, we also want to use advanced search / filter for a given project ID?
Yeah, that is what I mean.
I think I can probably update the config here? https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122
I will let @ttannis comment. The search backend should already have support for cluster (project in BQ) in https://github.com/lyft/amundsensearchlibrary/blob/master/search_service/proxy/elasticsearch.py#L37.
@lukelowery, looking at the link you posted (https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122), I think that only works if you have different databases (BQ, Hive, Presto, etc.). One short-term workaround would be to put the project ID in the database field in your environment, but this needs to be tested and verified.
What's the context of "catalog"? I'm not familiar.
But the general idea is that folks should be able to customize their filters in their custom application configuration. The items we provide in the default configuration aren't expected to be updated for individual use cases, and are kept in sync with the categories supported by the default ES implementation.
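As a rough illustration of that idea, the sketch below adds a project filter to a copy of a default filter list without modifying the defaults. The `FilterCategory` type, the `addFilterAfter` helper, and the category IDs are hypothetical stand-ins, not the frontend library's actual configuration API. The linked `config-default.ts` is the authoritative source for the real shape.

```typescript
// Hypothetical types and values for illustration; the real config may differ.
interface FilterCategory {
  categoryId: string; // assumed: field name sent to the search service
  displayName: string; // label shown in the Advanced Search UI
  helpText: string;
}

// Assumed stand-in for the filters shipped in the default configuration.
const defaultFilters: FilterCategory[] = [
  { categoryId: 'database', displayName: 'Source', helpText: 'Enter an exact source name' },
  { categoryId: 'schema', displayName: 'Schema', helpText: 'Enter an exact schema name' },
  { categoryId: 'table', displayName: 'Table', helpText: 'Enter an exact table name' },
  { categoryId: 'column', displayName: 'Column', helpText: 'Enter an exact column name' },
];

// Return a new list with `extra` placed right after the category `afterId`
// (or at the end if `afterId` is not found). The input list is not mutated.
function addFilterAfter(
  base: FilterCategory[],
  extra: FilterCategory,
  afterId: string
): FilterCategory[] {
  const idx = base.map((f) => f.categoryId).indexOf(afterId);
  const insertAt = idx === -1 ? base.length : idx + 1;
  return base.slice(0, insertAt).concat([extra], base.slice(insertAt));
}

// Custom configuration: expose the cluster field, labelled for BigQuery users.
const customFilters = addFilterAfter(
  defaultFilters,
  { categoryId: 'cluster', displayName: 'Project', helpText: 'Enter an exact BigQuery project ID' },
  'database'
);

console.log(customFilters.map((f) => f.displayName).join(', '));
```

Keeping the override in the custom configuration leaves the defaults in sync with the default ES implementation, which is the separation described above.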
@ttannis, BigQuery reuses the cluster field for the catalog, which represents the BQ project as I understand it.
For Hive, the cluster field currently has the same value for all tables, but for BQ, different tables can have different values depending on which catalog they belong to.
Has this been implemented? We have the same use case: we use both Snowflake and BigQuery, and we want to enable search by Database Name (in Snowflake) and Project Name (in BQ), both of which are tied to the cluster field.