PyHive icon indicating copy to clipboard operation
PyHive copied to clipboard

Complex data types processing

Open ofekby opened this issue 5 years ago β€’ 21 comments

Resolves part of issue #121 (resolve #121)

This pull request aims to enable the conversion of complex types support for presto.

  • Cast structs (known as ROW data type in presto) to a dictionary.
  • Cast the nested varbinary values from base64 encoded strings to bytes.

Note: This pull request could have been a breaking change. To avoid it the process_complex_columns parameter was added to the cursor with default value of False.

ofekby avatar Apr 20 '20 11:04 ofekby

Codecov Report

Merging #328 (9067bdd) into master (1548ecc) will increase coverage by 0.08%. The diff coverage is 100.00%.

:exclamation: Current head 9067bdd differs from pull request most recent head c0d1a30. Consider uploading reports for the commit c0d1a30 to get more accurate results

@@            Coverage Diff             @@
##           master     #328      +/-   ##
==========================================
+ Coverage   93.36%   93.44%   +0.08%     
==========================================
  Files          14       14              
  Lines        1553     1572      +19     
  Branches      167      168       +1     
==========================================
+ Hits         1450     1469      +19     
  Misses         75       75              
  Partials       28       28              
Impacted Files Coverage Ξ”
pyhive/presto.py 87.05% <100.00%> (+0.55%) :arrow_up:
pyhive/tests/test_presto.py 100.00% <100.00%> (ΓΈ)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Apr 20 '20 13:04 codecov[bot]

@ofekby did a quick pass on the diff, it looks really promising, thank you for contributing!

2 high level thoughts here, this class hierarchy is not very pythonic feels more java like. I seems like it could be rewritten via just using the functions and be in a smaller number of files.

Also could you please add integration tests as well e.g. run actual sql queries ?

bkyryliuk avatar Apr 28 '20 17:04 bkyryliuk

It might be possible to minimize the number of files, although I don't think changing it would give real profit since the code readable, already covered with unittests and it works...

Regarding the integration tests - I will add a few in the next days

ofekby avatar May 14 '20 07:05 ofekby

  • Added an integration test with complex new table.
  • Fixed some issues with the map processing (casting map keys to corresponding python type)

If you @bkyryliuk could please review or assign someone to review, thanks.

ofekby avatar May 16 '20 14:05 ofekby

Wow this PR is a game changer for the whole Presto community! Thanks!

kakipipi23 avatar May 17 '20 06:05 kakipipi23

  • Removed the class inheritance
  • Changed the processors to be a functions

@bkyryliuk please provide more detailed comments if there is any issue with the code

ofekby avatar May 20 '20 06:05 ofekby

Much needed and appreciated! Thanks!

boazberman avatar May 20 '20 11:05 boazberman

Wow thats super important feature can't wait until it get merged

orstav avatar May 20 '20 11:05 orstav

Thank you so much for adding this!

yana33083 avatar May 20 '20 11:05 yana33083

Waiting for this for 6 months!!!

ilanb1996 avatar May 20 '20 11:05 ilanb1996

Looks great, thank you!

GalDayan avatar May 20 '20 11:05 GalDayan

Any updates?

boazberman avatar May 26 '20 17:05 boazberman

@bkyryliuk Is there anything the community can do to help push this over the finish line? We need this and I really want to avoid maintaining an internal fork with this patch applied.

hashhar avatar Jun 23 '20 01:06 hashhar

This is super important, isn't there anything that can be done to prompt the merge request? @bkyryliuk

boazberman avatar Jul 04 '20 13:07 boazberman

@bkyryliuk Any updates?

GalDayan avatar Nov 09 '20 19:11 GalDayan

+1. Thanks for the contribution

panoptikum avatar May 26 '21 15:05 panoptikum

@ofekby please make sure that https://github.com/ofekby/presto-types-parser is py2 friendly @hashhar if you can test it in your dev / prod env - it would be great, I don't have capacity to do it right now

bkyryliuk avatar May 27 '21 00:05 bkyryliuk

I am on parental leave now, @ashishgandhi / @m-kogan could someone take a look ?

bkyryliuk avatar Jul 29 '21 17:07 bkyryliuk

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 16 '22 21:04 CLAassistant

I am on parental leave now, @ashishgandhi / @m-kogan could someone take a look ?

@ashishgandhi / @m-kogan if you could please review the changes, and enable back the CI process for this PR.

ofekby avatar Nov 30 '22 10:11 ofekby

What's the status here? We are waiting for this at the Wikimedia Foundation! :)

ottomata avatar Oct 23 '23 15:10 ottomata