
Add Catalog to the Advanced Search

Open lukelowery opened this issue 4 years ago • 7 comments

Right now we can search by Source, Table, Column, etc., but there is no UI element for Catalog. If I write a URL that filters by Catalog, it all works as expected.

Currently, the BigQuery loader puts the Project into the "Catalog" property. The "Schema" property contains the dataset. The Source is always BigQuery (which is helpful, since we have other kinds of databases too).

Service or Ingestion ETL

This seems like a frontend-only change.

Example Screenshots (if appropriate):

[image] If there were a Catalog filter in addition to Source, that would help us out a lot.

Context

We could see how many tables have an "email" column in a certain BigQuery Project. (Certain projects are not allowed to have certain kinds of data)

lukelowery avatar Jun 08 '20 21:06 lukelowery

Currently we put the BQ project ID (is that the catalog you referred to?) in the cluster field (https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/extractor/bigquery_metadata_extractor.py#L77). If I understand correctly, we also want to use advanced search / filter for a given project ID?
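To make the field mapping discussed here concrete, the following is a minimal TypeScript sketch. The `BigQueryTableRef` and `AmundsenTableFields` types and the `toAmundsenFields` helper are hypothetical names used only for illustration; they are not part of the databuilder. The sketch assumes the project ID lands in `cluster`, the dataset in `schema`, and the source is always `bigquery`, as described in this thread.

```typescript
// Hypothetical types for illustration only; not part of Amundsen's code.
interface BigQueryTableRef {
  projectId: string;
  datasetId: string;
  tableId: string;
}

interface AmundsenTableFields {
  database: string; // shown as "Source" in the search UI
  cluster: string;  // holds the BigQuery project ID ("Catalog" in this issue)
  schema: string;   // holds the BigQuery dataset
  name: string;     // table name
}

// Map a BigQuery table reference onto the fields discussed in this thread.
function toAmundsenFields(ref: BigQueryTableRef): AmundsenTableFields {
  return {
    database: 'bigquery',
    cluster: ref.projectId,
    schema: ref.datasetId,
    name: ref.tableId,
  };
}

// Example values are made up.
const exampleFields = toAmundsenFields({
  projectId: 'my-project',
  datasetId: 'sales',
  tableId: 'orders',
});
console.log(exampleFields);
```

Under this mapping, filtering by project ID amounts to filtering on `cluster`.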

feng-tao avatar Jun 09 '20 06:06 feng-tao

Yeah, that is what I mean.

I think I can probably update the config here? https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122

lukelowery avatar Jun 10 '20 17:06 lukelowery

I will let @ttannis comment. The search backend should already have support for cluster (project in BQ) in https://github.com/lyft/amundsensearchlibrary/blob/master/search_service/proxy/elasticsearch.py#L37.

feng-tao avatar Jun 11 '20 23:06 feng-tao

@lukelowery, looking at the link you posted (https://github.com/lyft/amundsenfrontendlibrary/blob/master/amundsen_application/static/js/config/config-default.ts#L92-L122), I think that only works if you have different databases (BQ, Hive, Presto, etc.). One short-term workaround would be to put the project ID in the database field in your environment. But this needs to be tested and verified...

feng-tao avatar Jun 11 '20 23:06 feng-tao

What's the context of "catalog"? I'm not familiar.

But the general idea is that folks should be able to customize their filters in their custom application configuration. The items we provide in the default configuration aren't expected to be updated for individual use cases, and are kept in sync with the categories supported by the default ES implementation.
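As a rough illustration of that kind of customization, here is a minimal TypeScript sketch that adds a cluster-backed "Catalog" filter to a list of filter categories. The `FilterCategory` interface, the `defaultFilters` array, and the `withClusterFilter` helper are all hypothetical; the real configuration types in amundsenfrontendlibrary may use different property names, so treat this as a sketch of the idea rather than the actual config API.

```typescript
// Hypothetical shape of a filter entry; the real interface in
// amundsenfrontendlibrary may use different property names.
interface FilterCategory {
  categoryId: string;  // field the search backend filters on
  displayName: string; // label shown in the Advanced Search panel
  helpText: string;
}

// Stand-in for the default filters; not copied from config-default.ts.
const defaultFilters: FilterCategory[] = [
  { categoryId: 'database', displayName: 'Source', helpText: 'Source name' },
  { categoryId: 'schema', displayName: 'Schema', helpText: 'Schema name' },
  { categoryId: 'table', displayName: 'Table', helpText: 'Table name' },
  { categoryId: 'column', displayName: 'Column', helpText: 'Column name' },
];

// Return a new list with a cluster filter placed right after "database".
// The input list is left unchanged; an existing cluster filter is kept as-is.
function withClusterFilter(
  filters: FilterCategory[],
  displayName: string = 'Catalog',
): FilterCategory[] {
  if (filters.some((f) => f.categoryId === 'cluster')) {
    return filters.slice();
  }
  const clusterFilter: FilterCategory = {
    categoryId: 'cluster',
    displayName,
    helpText: 'Cluster name (the project ID for BigQuery tables)',
  };
  const databaseIndex = filters.findIndex((f) => f.categoryId === 'database');
  const insertAt = databaseIndex === -1 ? filters.length : databaseIndex + 1;
  return [
    ...filters.slice(0, insertAt),
    clusterFilter,
    ...filters.slice(insertAt),
  ];
}

const customFilters = withClusterFilter(defaultFilters);
console.log(customFilters.map((f) => f.displayName).join(', '));
// prints "Source, Catalog, Schema, Table, Column"
```

Keeping the helper pure means the default list stays untouched, which matches the point above that the defaults are not meant to change for individual deployments.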

ttannis avatar Jun 12 '20 18:06 ttannis

@ttannis, BigQuery reuses the cluster field for the catalog, which represents the BQ project, per my understanding.

For Hive, we currently have the same cluster value for all tables, but for BQ, different tables could have different values depending on which catalog they belong to.

feng-tao avatar Jun 18 '20 00:06 feng-tao

Is this implemented? We have the same use case: we use both Snowflake and BigQuery, and we want to enable search by database name (in Snowflake) and project name (in BQ), both of which are tied to the cluster field.

sagar-raythatha avatar Sep 20 '22 19:09 sagar-raythatha