PyHive
Add sparksql dialect
This PR is based on https://github.com/dropbox/PyHive/pull/187 and only adds some fixes for PEP8 compliance.
No unit tests were added for this new dialect because many of the tests run by sqlalchemy_test_case would fail due to Spark's lack of support for some types (SPARK-21529).
Codecov Report
Merging #247 into master will decrease coverage by 2.81%. The diff coverage is 0%.
@@ Coverage Diff @@
## master #247 +/- ##
==========================================
- Coverage 93.94% 91.12% -2.82%
==========================================
Files 14 15 +1
Lines 1487 1533 +46
Branches 159 169 +10
==========================================
Hits 1397 1397
- Misses 64 108 +44
- Partials 26 28 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| pyhive/sqlalchemy_sparksql.py | 0% <0%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update d19cb0c...a07d74d. Read the comment docs.
What is holding this PR up?
@jingw - Is there something that we can do to move this PR along and make it part of the project, or does Spark SQL not fit the mission?
Many projects relying on PyHive are affected by issue #150. Is there any way we can get this PR merged?
+1. Any plans to get this merged?
Hadn't seen this PR, looks nice. If someone could add unit tests, I'll be happy to merge and do another PyHive release.
@bkyryliuk it seems there have been efforts to get Spark SQL in for a long time, but many previous PRs have gone stale in the end. As potential problems are limited to Spark SQL only, in the interest of getting this functionality out there, I wonder if it would make sense to let this in without rigorous tests, and add tests later if/when problems surface?
It would be quite challenging to maintain from our perspective, as we don't leverage Spark much. I am not looking for 100% test coverage, but would prefer to have at least a smoke test.
The Presto & Hive setup doesn't seem to be a very involved process: https://github.com/dropbox/PyHive/blob/master/scripts/travis-install.sh — I assume Spark would be somewhat similar.
@bkyryliuk I've tried to add some unit tests, but many of those run by sqlalchemy_test_case will fail due to the lack of support in Spark. It's possible to do some tests, but all tests run by sqlalchemy_test_case would have to be omitted.
You can use the sqlalchemy engine in those tests to do a pass over the unsupported functions. Superset has a good example: https://github.com/apache/incubator-superset/blob/903217f64d38b2083bb62a8a2b81686a607ba479/tests/sqllab_tests.py#L76
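The suggested smoke test could look roughly like the sketch below: execute one query end to end and assert on the result. An in-memory sqlite3 connection stands in for a Spark Thrift server so the sketch is self-contained; a real PyHive test would instead create a SQLAlchemy engine from the new dialect's URL against a running Spark instance.

```python
# Minimal smoke-test shape (sketch, not the actual PR's test suite).
# sqlite3 from the stdlib is only a stand-in for a Spark connection.
import sqlite3

def smoke_test(connection):
    """Run a trivial query through the connection and return its scalar result."""
    return connection.execute("SELECT 1 + 1").fetchone()[0]

conn = sqlite3.connect(":memory:")
assert smoke_test(conn) == 2
print("smoke test passed")
```

The point of a smoke test here is not coverage but confirming the dialect can round-trip a query at all, which is the bar @bkyryliuk asked for.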
So many people will be so happy if this is merged and released soon :)
@bkyryliuk I can help write some tests if necessary, I think this would be a really nice feature to have.
What's the status on this? Would be happy to help.
I applied your changes to my project, but had a small issue with columns named "# Partitioning", "Not partitioned", and one being empty. I saw there was a filter on a similar column in sqlalchemy_hive.py. I guess this differs from one Hive version to another (I'm using 2.3.7 here).
What I did to fix this was to change this line in sqlalchemy_sparksql.py to `rows = [column for column in connection.execute('DESCRIBE {}'.format(full_table)).fetchall() if column[0] not in {"# Partitioning", "Not partitioned", ""}]`.
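For illustration, the filtering in that one-liner can be factored out and exercised on its own. The marker strings come from the fix above; the helper name and the sample rows are invented for this sketch.

```python
# Hypothetical helper mirroring the fix above: drop Spark's metadata rows
# ("# Partitioning", "Not partitioned", empty names) from the rows that
# `DESCRIBE <table>` returns, keeping only real column rows.

SPARK_METADATA_MARKERS = {"# Partitioning", "Not partitioned", ""}

def filter_describe_rows(rows):
    """Keep only real column rows from DESCRIBE output."""
    return [row for row in rows if row[0] not in SPARK_METADATA_MARKERS]

# Invented sample of what DESCRIBE output might look like on Hive 2.3.x.
sample = [
    ("id", "int", ""),
    ("name", "string", ""),
    ("# Partitioning", "", ""),
    ("Not partitioned", "", ""),
    ("", "", ""),
]
print(filter_describe_rows(sample))  # only the two real column rows remain
```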
Any updates? Maybe we can help to get this through!
I have been watching this one but haven't seen any action...
Is there any progress on this? It is causing issues in related applications like Superset.
Any updates regarding this PR?
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.
:white_check_mark: gmcoringa
:x: Hao Qin Tan
Hao Qin Tan does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
Hello mates, what can we do to move ahead with this one? Ready to help!
Same here. This is particularly important for catalog metadata fetching in tools like Superset: currently we can't use physical references to tables, only virtual SQL queries, and metadata exploration through the UI is blocked.