PyHive

Fetching a huge number of rows is slow: need a better fetchmany

Open • darrkz opened this issue 4 years ago • 2 comments

I use PyHive in my Python code, but I found that in PyHive, fetchmany is just built on fetchone. In one of my cases, a result has 6,000,000 rows that need to be exported through the Hive driver. I measured the network latency at 0.3 ms, yet the export still takes about 40 minutes. Is there a better fetchmany function? Another Thrift Hive driver I use has a real fetchN function and makes far fewer network socket calls.

darrkz • Sep 14 '20 08:09
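A rough back-of-envelope check using only the numbers in the post (6,000,000 rows, 0.3 ms latency) shows why round-trip count matters. The helper `round_trip_seconds` is hypothetical and not part of PyHive. It assumes every request costs one full round trip and returns at most `rows_per_trip` rows. The batch size of 1000 is an arbitrary illustrative value, and bandwidth, deserialization, and server-side work are not modeled.

```python
import math


def round_trip_seconds(rows: int, latency_s: float, rows_per_trip: int) -> float:
    """Time spent purely on network latency (hypothetical model).

    Assumes each request returns at most `rows_per_trip` rows and costs
    one round trip of `latency_s` seconds; ignores bandwidth and server time.
    """
    if rows_per_trip < 1:
        raise ValueError("rows_per_trip must be >= 1")
    trips = math.ceil(rows / rows_per_trip)  # number of network requests needed
    return trips * latency_s


ROWS = 6_000_000    # result size reported in the issue
LATENCY_S = 0.3e-3  # 0.3 ms latency reported in the issue

per_row = round_trip_seconds(ROWS, LATENCY_S, rows_per_trip=1)
batched = round_trip_seconds(ROWS, LATENCY_S, rows_per_trip=1000)

print(f"one row per trip:   {per_row / 60:.1f} min")  # 30.0 min
print(f"1000 rows per trip: {batched:.1f} s")         # 1.8 s
```

Under this model, one round trip per row would cost about 30 minutes of pure latency, compared with under 2 seconds for 1000-row batches. That would make per-row round trips a plausible major share of the reported ~40 minutes.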

Hey, I am looking for a solution to this too. I don't know much about the PyHive backend, but one option could be to fork the PyHive project and rewrite parts of it using thriftpy2 instead of thrift, since thriftpy2 is more "modern" and, most importantly, regularly maintained. What is this fetchN function you mentioned?

lucharo • Mar 18 '21 18:03

I used the old Python Hive Thrift driver, and I found that its fetchN really fetches N rows in a single network transfer. PyHive is different: its fetchN also returns N rows, but it takes N network transfers to get them, one per row, rather than one transfer for all N. So when a result contains a huge number of rows, such as 1 million, the network transfers take far too long.

darrkz • Jun 27 '21 10:06
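The distinction described here, N rows in N transfers versus N rows in one transfer, can be sketched with an in-memory stand-in for the server that counts requests. `FakeResultServer`, `fetchmany_rowwise`, and `fetchmany_batched` are hypothetical names for illustration only. They are not PyHive or Thrift APIs, and `fetch(max_rows)` only loosely models a request that carries a maximum row count.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, str]


class FakeResultServer:
    """Hypothetical in-memory stand-in for a remote result set.

    Each call to fetch() represents one network round trip and is counted.
    """

    def __init__(self, rows: Sequence[Row]) -> None:
        self._rows = list(rows)
        self._pos = 0         # current position in the result set
        self.round_trips = 0  # number of simulated network requests

    def fetch(self, max_rows: int) -> List[Row]:
        """Return up to max_rows rows in a single simulated request."""
        self.round_trips += 1
        batch = self._rows[self._pos:self._pos + max_rows]
        self._pos += len(batch)
        return batch


def fetchmany_rowwise(server: FakeResultServer, n: int) -> List[Row]:
    """fetchmany built on one-row requests: up to n round trips for n rows."""
    out: List[Row] = []
    for _ in range(n):
        batch = server.fetch(1)
        if not batch:  # result set exhausted
            break
        out.extend(batch)
    return out


def fetchmany_batched(server: FakeResultServer, n: int) -> List[Row]:
    """fetchmany as a single request for n rows: one round trip."""
    return server.fetch(n)


rows = [(i, f"row-{i}") for i in range(10)]

s1 = FakeResultServer(rows)
a = fetchmany_rowwise(s1, 5)

s2 = FakeResultServer(rows)
b = fetchmany_batched(s2, 5)

print(a == b, s1.round_trips, s2.round_trips)  # True 5 1
```

Both functions return the same rows. Only the number of requests differs, and that difference is what dominates when each request pays a fixed latency. On the caller side, PEP 249 defines the cursor attribute `arraysize` as the default number of rows returned by `fetchmany()`. Whether a particular driver also uses it as the number of rows requested per network call is implementation-specific, so check the driver's fetch code path before relying on it.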