BigQuery_Helper
Simplify specification of FROM table in query_to_pandas_safe
As I understand it, query_to_pandas_safe requires the table in the FROM clause to be fully qualified, which is somewhat cumbersome, as shown in the following query:
SELECT license, COUNT(1) num_repos
FROM `bigquery-public-data.github_repos.licenses`
GROUP BY license
The bq_helper object used to run this query already knows the query is being called on bigquery-public-data.github_repos. This prefix could be added programmatically, so the user can run the query as:
SELECT license, COUNT(1) num_repos
FROM licenses
GROUP BY license
This query looks much nicer. I see two approaches to implement this change and maintain backwards compatibility:
- Use a regex or Python string functions to determine whether the helper needs to add `self.active_project + '.' + self.dataset_name + ...` to the table name.
- Add an optional argument, simplified_table_name, which determines whether to do the string manipulation described above.
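A minimal sketch of how the two proposals could fit together, assuming a standalone function rather than any existing bq_helper method. The name qualify_table_names and its parameters are hypothetical; only simplified_table_name comes from the proposal above. The regex only qualifies a bare identifier that directly follows FROM or JOIN, and it is deliberately naive: it would also rewrite CTE names and the column in expressions such as EXTRACT(YEAR FROM some_column), so a real implementation would need a proper SQL parser or stricter rules.

```python
import re

# Hypothetical pattern: FROM or JOIN followed by a bare identifier.
# The lookahead skips names that are already qualified (followed by '.'
# or '-'), backticked, or that are function calls such as UNNEST(...).
_BARE_TABLE = re.compile(
    r"\b(FROM|JOIN)(\s+)([A-Za-z_][A-Za-z0-9_]*)\b(?![.`(-])",
    re.IGNORECASE,
)


def qualify_table_names(query, project, dataset, simplified_table_name=True):
    """Prefix bare table names with `project.dataset.` (illustrative only).

    When simplified_table_name is False the query is returned unchanged,
    which keeps existing fully qualified queries working as before.
    """
    if not simplified_table_name:
        return query

    def _qualify(match):
        keyword, space, table = match.groups()
        return f"{keyword}{space}`{project}.{dataset}.{table}`"

    return _BARE_TABLE.sub(_qualify, query)


if __name__ == "__main__":
    short_query = (
        "SELECT license, COUNT(1) num_repos\n"
        "FROM licenses\n"
        "GROUP BY license"
    )
    print(qualify_table_names(short_query, "bigquery-public-data", "github_repos"))
```

Because a table name that is already wrapped in backticks never matches the pattern, running the function on a fully qualified query leaves it untouched, so the regex approach alone is backwards compatible for queries written in the current style.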
The value of this change may depend on the design of the upcoming BQ integration. Will Kaggle users continue using bq_helper? Will the integration do this in any way? etc.
Maybe @harrisse @mrisdal will have insight on it. If this is going to be an ongoing issue, I can send a PR for one of the two proposals above.
I agree that the latter query looks nicer, but this change would render queries incompatible with the rest of the BigQuery ecosystem. Isn't rendering queries unusable in the BQ console a fatal flaw?
Hello @SohierDane, I am sorry to bother you here, since my question is about commercial use of this dataset on Kaggle (https://www.kaggle.com/tmdb/tmdb-movie-metadata/discussion/88376). We are an online education company from China, and we are producing an online lesson on data analysis. Would it be possible for us to use this dataset in one of our teaching examples, or are there other conditions for using it? Thanks!
I'm afraid I don't have any guidance to provide beyond what's already posted on that dataset. If the existing guidance is unclear, I'd recommend going to our main datasets listing and filtering down to only the Creative Commons datasets.
Best,
Sohier
Thank you so much for your advice ~