amundsen Add Catalog to the Advanced Search

Right now we can search by Source, Table, Column, etc., but there is no UI element for the Catalog. If I write a URL to filter by Catalog, it all works as expected.

Currently, the BigQuery loader puts the Project into the "Catalog" property. The "Schema" property contains the dataset. The Source is always BigQuery (which is helpful, since we have other kinds of databases too).

Service or Ingestion ETL

This seems like a Frontend only change.

Example Screenshots (if appropriate):

If there was Catalog in addition to Source, that would help us out a lot.

Context

We could see how many tables have an "email" column in a certain BigQuery Project. (Certain projects are not allowed to have certain kinds of data)

Jun 08 '20 21:06 lukelowery

Currently we put the BQ project ID (is that the catalog you referred to?) in the cluster field (https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/extractor/bigquery_metadata_extractor.py#L77). If I understand correctly, we also want to use advanced search / filter for a given project ID?

Jun 09 '20 06:06 feng-tao

Yeah, that is what I mean.

I think I can probably update the config here? https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122

Jun 10 '20 17:06 lukelowery

I will let @ttannis comment. The search backend should already have support for cluster (project in BQ) in https://github.com/lyft/amundsensearchlibrary/blob/master/search_service/proxy/elasticsearch.py#L37.

Jun 11 '20 23:06 feng-tao

@lukelowery, looking at the link you posted (https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122), I think that only works if you have different databases (BQ, Hive, Presto, etc.). One short-term workaround would be to put the project ID in the database field in your environment, but this needs to be tested and verified.

Jun 11 '20 23:06 feng-tao

What's the context of "catalog"? I'm not familiar.

But the general idea is that folks should be able to customize their filters in their custom application configuration. The items we provide in the default configuration aren't expected to be updated for individual use cases, and are kept in sync with the categories supported by the default ES implementation.

Jun 12 '20 18:06 ttannis
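To make the "customize filters in your custom configuration" suggestion concrete, here is a minimal sketch of what such an override might look like. The `FilterType` enum, the `FilterCategory` interface, and the two array names are hypothetical stand-ins written for this example rather than copied from the frontend library, and it assumes the search backend accepts `cluster` as a filter category (as suggested earlier in the thread).

```typescript
// Hypothetical stand-ins for the frontend's config types. The names are
// assumptions for illustration, not the library's actual definitions.
enum FilterType {
  INPUT_SELECT = 'inputFilter',
}

interface FilterCategory {
  categoryId: string; // field name the search backend filters on
  displayName: string; // label shown in the advanced search UI
  helpText: string;
  type: FilterType;
}

// A reduced, illustrative version of the default table filters.
const defaultTableFilters: FilterCategory[] = [
  {
    categoryId: 'database',
    displayName: 'Source',
    helpText: 'Enter exact source name',
    type: FilterType.INPUT_SELECT,
  },
  {
    categoryId: 'schema',
    displayName: 'Schema',
    helpText: 'Enter exact schema name',
    type: FilterType.INPUT_SELECT,
  },
];

// Custom override: keep the defaults and append a filter on `cluster`,
// labelled "Catalog" because the BigQuery project is stored in that field.
const customTableFilters: FilterCategory[] = [
  ...defaultTableFilters,
  {
    categoryId: 'cluster', // assumption: supported by the search backend
    displayName: 'Catalog',
    helpText: 'Enter exact catalog (BigQuery project) name',
    type: FilterType.INPUT_SELECT,
  },
];
```

Appending to the defaults, rather than editing them, keeps the default configuration in sync with upstream while adding the extra category only in the deployment that needs it.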

@ttannis, BigQuery reuses the cluster field for the catalog, which represents the BQ project, per my understanding.

For Hive, we currently have the same cluster value for all tables, but for BQ, different tables could have different cluster values depending on which catalog they belong to.

Jun 18 '20 00:06 feng-tao
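As an illustration of the use case from the original request (counting tables with an "email" column in one BigQuery project), the sketch below filters table records by the `cluster` field. The `TableRecord` shape, the function name, and the sample data are all hypothetical; they only mirror the field mapping described in this thread (Source in `database`, project in `cluster`, dataset in `schema`).

```typescript
// Hypothetical record shape mirroring the mapping described in the thread:
// database = source, cluster = BigQuery project, schema = dataset.
interface TableRecord {
  database: string;
  cluster: string;
  schema: string;
  name: string;
  columns: string[];
}

// Count tables in a given source and cluster that contain a given column.
function countTablesWithColumn(
  records: TableRecord[],
  database: string,
  cluster: string,
  column: string,
): number {
  return records.filter(
    (r) =>
      r.database === database &&
      r.cluster === cluster &&
      r.columns.includes(column),
  ).length;
}

// Made-up sample data for illustration only.
const sampleRecords: TableRecord[] = [
  { database: 'bigquery', cluster: 'project-a', schema: 'sales', name: 'customers', columns: ['id', 'email'] },
  { database: 'bigquery', cluster: 'project-a', schema: 'sales', name: 'orders', columns: ['id', 'total'] },
  { database: 'bigquery', cluster: 'project-b', schema: 'crm', name: 'contacts', columns: ['id', 'email'] },
  { database: 'hive', cluster: 'gold', schema: 'core', name: 'users', columns: ['id', 'email'] },
];

const emailTablesInProjectA = countTablesWithColumn(
  sampleRecords,
  'bigquery',
  'project-a',
  'email',
);
```

With this sample data, only `customers` matches for `project-a`, which is the kind of per-project answer a Catalog filter in the advanced search would provide.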

Is this implemented? We have the same use case: we use both Snowflake and BigQuery, and we want to enable search by Database Name (in Snowflake) and Project Name (in BQ), both of which are tied to the cluster field.

Sep 20 '22 19:09 sagar-raythatha
