Simplify specification of FROM table in query_to_pandas_safe

Open dansbecker opened this issue 6 years ago • 4 comments

As I understand it, query_to_pandas_safe requires the table in the FROM clause to be fully qualified, which is somewhat cumbersome, as shown in the following query:

SELECT license, COUNT(1) num_repos 
FROM `bigquery-public-data.github_repos.licenses` 
GROUP BY license 

The bq_helper object used to run this query already knows the query is being run against bigquery-public-data.github_repos. That prefix could be added programmatically, so the user could run the query as:

SELECT license, COUNT(1) num_repos 
FROM licenses 
GROUP BY license 

This query looks much nicer. I see two approaches to implement this change and maintain backwards compatibility:

  1. Use a regex or Python string functions to determine whether the helper needs to add `self.active_project + '.' + self.dataset_name + ...` to the table name
  2. Add an optional argument simplified_table_name which determines whether to do the string manipulation described above.
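
As a rough illustration of the first approach, here is a minimal sketch of the string manipulation involved. It is not code from bq_helper: the function name `qualify_table_names` and its `project` and `dataset` parameters are hypothetical stand-ins for the values the helper object already holds. It only expands a bare identifier that directly follows FROM or JOIN, and leaves anything already containing a dot or a backtick untouched.

```python
import re

# A bare table name: FROM/JOIN followed by a plain identifier that is not
# followed by a dot, a backtick, or an opening parenthesis (e.g. UNNEST(...)).
_BARE_TABLE = re.compile(
    r"\b(FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)\b(?!\s*[.`(])",
    re.IGNORECASE,
)


def qualify_table_names(query, project, dataset):
    """Expand bare table names to `project.dataset.table` (hypothetical helper)."""

    def _expand(match):
        keyword, table = match.group(1), match.group(2)
        return f"{keyword} `{project}.{dataset}.{table}`"

    return _BARE_TABLE.sub(_expand, query)


short_query = """SELECT license, COUNT(1) num_repos
FROM licenses
GROUP BY license"""

print(qualify_table_names(short_query, "bigquery-public-data", "github_repos"))
```

A regex like this cannot distinguish table references from other uses of FROM, so it would also rewrite expressions such as `EXTRACT(YEAR FROM some_column)` and names defined in a WITH clause. That limitation is one reason an explicit opt-in argument, as in the second approach, may be the safer way to keep backwards compatibility.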

The value of this change may depend on the design of the upcoming BQ integration. Will Kaggle users continue using bq_helper? Will the integration do this in any way? etc.

Maybe @harrisse @mrisdal will have insight on it. If this is going to be an ongoing issue, I can send a PR for one of the two proposals above.

dansbecker, Jan 02 '19 20:01

I agree that the latter query looks nicer, but this change would render queries incompatible with the rest of the BigQuery ecosystem. Isn't rendering queries unusable in the BQ console a fatal flaw?

SohierDane, Jan 09 '19 23:01

Hello @SohierDane, I am sorry to bother you here, since my question is about commercial use of this dataset on Kaggle (https://www.kaggle.com/tmdb/tmdb-movie-metadata/discussion/88376). We are an online education company from China and we are producing an online lesson on data analysis. Is it possible for us to use this dataset in one of our teaching examples, or are there other conditions for using it? Thanks!

hahahahahong, Apr 08 '19 09:04

I'm afraid I don't have any guidance to provide beyond what's already posted on that dataset. If the existing guidance is unclear, I'd recommend going to our main datasets listing and filtering down to only the creative commons datasets.

Best,

Sohier

SohierDane, Apr 09 '19 16:04

Thank you so much for your advice ~

hahahahahong, Apr 29 '19 01:04