shc Inconsistent number of rows using PySpark between function

Open PaulThompsonUk opened this issue 6 years ago • 1 comment

Hi everyone, I'm using the HBase connector with PySpark and I'm seeing different results from the connector than from the HBase shell. When I scan in the HBase shell with startrow and endrow keys and then recreate the same range through the connector in PySpark using the DataFrame "between" function, the number of rows can differ for certain key combinations. In this instance the key is a bytearray composite key. Has anyone experienced this issue, or can anyone offer an alternative? Thanks!

Apr 05 '18 10:04 PaulThompsonUk

Did you fix it?

May 02 '19 13:05 Lihengwannafly
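
One possible source of the difference described in the original post (an assumption, not something confirmed in this thread): an HBase scan treats the start row as inclusive and the stop row as exclusive by default, while the DataFrame "between" function is inclusive on both bounds. The sketch below mimics both behaviours in plain Python on made-up byte keys. The key values and the helper names `hbase_style_scan` and `between_style_filter` are hypothetical and are not part of shc or HBase.

```python
# Illustrative only: hypothetical composite row keys, not data from this issue.
keys = [b"\x00\x01a", b"\x00\x01b", b"\x00\x02a", b"\x00\x03a"]


def hbase_style_scan(rows, start, stop):
    """Start row inclusive, stop row exclusive (assumed default scan bounds)."""
    return [k for k in sorted(rows) if start <= k < stop]


def between_style_filter(rows, lower, upper):
    """Both bounds inclusive, as with the DataFrame between function."""
    return [k for k in rows if lower <= k <= upper]


start, stop = b"\x00\x01a", b"\x00\x03a"
scan_rows = hbase_style_scan(keys, start, stop)
between_rows = between_style_filter(keys, start, stop)

# The counts differ whenever a row exists whose key equals the upper bound.
print(len(scan_rows), len(between_rows))
```

If this is the cause, a half-open filter such as `(col("key") >= lower) & (col("key") < upper)` in PySpark may match the shell result more closely; the column name `key` is only a placeholder for whatever the catalog defines as the row key.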
