PyHive
Fetching many rows is slow; need a better fetchmany
I use PyHive in my Python code, but I found that PyHive's fetchmany just calls fetchone. In one case my query result has 6,000,000 rows that need to be exported through the Hive driver. I measured the network latency at 0.3 ms, but the export still takes about 40 minutes. Is there a better fetchmany function? Another Thrift Hive driver I use has a real fetchN function and makes far fewer network socket calls.
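As a rough sanity check on those numbers, here is a small sketch that estimates the latency cost alone. It assumes the 0.3 ms figure is a per-call round trip and that each row costs one call; the batch size of 1000 is a hypothetical value chosen for comparison, not a PyHive setting taken from this thread.

```python
ROWS = 6_000_000        # result size reported above
ROUND_TRIP_S = 0.3e-3   # 0.3 ms, assumed to be the cost of one network call


def latency_seconds(rows, batch_size):
    """Time spent purely on round trips when fetching `rows` rows
    in batches of `batch_size` rows per network call."""
    round_trips = -(-rows // batch_size)  # ceiling division
    return round_trips * ROUND_TRIP_S


per_row = latency_seconds(ROWS, 1)      # one call per row
batched = latency_seconds(ROWS, 1000)   # hypothetical batch size

print(f"one row per call:    {per_row / 60:.1f} min")
print(f"1000 rows per call:  {batched:.1f} s")
```

Under these assumptions, one call per row costs about 30 minutes in latency alone, which is the same order of magnitude as the 40 minutes observed, while batching would bring the latency share down to a couple of seconds.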
Hey, I am looking for a solution to this too. I don't know much about the PyHive backend, but I think one solution could be to fork the PyHive project and rewrite parts of it using thriftpy2 instead of thrift; thriftpy2 is more "modern" and, most importantly, regularly maintained. What is this fetchN function you were mentioning?
I use an old Python Hive Thrift driver, and I found that it really fetches N rows in a single network transfer, unlike the PyHive driver. In PyHive, fetching N rows actually takes N network transfers, so if the result contains a huge number of rows, such as 1 million, the network transfers take far too much time. In other words, PyHive fetches N rows in N round trips, not N rows in one round trip.
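To make the difference concrete, here is a minimal simulation of the two behaviours described above. It does not use PyHive or any real Thrift call; `FakeServer`, `fetch_n_row_by_row`, and `fetch_n_batched` are hypothetical names invented for illustration, and the server simply counts how many times it is asked for rows.

```python
import itertools


class FakeServer:
    """Stand-in for a remote result set; counts network round trips."""

    def __init__(self, total_rows):
        self._rows = iter(range(total_rows))
        self.round_trips = 0

    def fetch(self, max_rows):
        # Each call models one network round trip returning up to max_rows rows.
        self.round_trips += 1
        return list(itertools.islice(self._rows, max_rows))


def fetch_n_row_by_row(server, n):
    """Fetch n rows with one round trip per row (the slow pattern)."""
    rows = []
    for _ in range(n):
        chunk = server.fetch(1)
        if not chunk:  # result set exhausted
            break
        rows.extend(chunk)
    return rows


def fetch_n_batched(server, n):
    """Fetch n rows in a single round trip (the 'real fetchN' pattern)."""
    return server.fetch(n)


slow = FakeServer(10_000)
rows_slow = fetch_n_row_by_row(slow, 1000)

fast = FakeServer(10_000)
rows_fast = fetch_n_batched(fast, 1000)

print(slow.round_trips, fast.round_trips)
```

Both functions return the same rows; only the number of round trips differs, and with per-call latency that count dominates the total export time.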