
Inconsistent number of rows using PySpark between function

Open PaulThompsonUk opened this issue 6 years ago • 1 comment

Hi everyone, I'm using the HBase connector with PySpark and I'm seeing different results from the connector than from the HBase shell. When I scan in the HBase shell with startrow and endrow keys and then recreate the same range through the connector in PySpark using the "between" DataFrame function, the number of rows can differ for certain key combinations. In this instance the key is a bytearray composite key. Has anyone experienced this issue, or can anyone offer an alternative? Thanks!
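One possible source of such a difference, offered as an assumption rather than a confirmed diagnosis of this issue: an HBase scan treats the start row as inclusive and the stop row as exclusive, while Spark's "between" is inclusive on both bounds. The sketch below imitates both range semantics in plain Python on hypothetical composite keys. The key values and the helper names `hbase_scan` and `spark_between` are made up for illustration and are not part of the connector.

```python
# Hypothetical composite row keys, kept in lexicographic byte order.
# Python compares bytes objects byte by byte as unsigned values,
# which matches how HBase orders row keys.
rows = sorted([
    b"\x00\x01a",
    b"\x00\x01b",
    b"\x00\x02a",
    b"\x00\x02b",
    b"\x00\x03a",
])


def hbase_scan(keys, start_row, stop_row):
    """Imitate an HBase scan: start row inclusive, stop row exclusive."""
    return [k for k in keys if start_row <= k < stop_row]


def spark_between(keys, lower, upper):
    """Imitate Column.between: lower and upper bounds both inclusive."""
    return [k for k in keys if lower <= k <= upper]


start, stop = b"\x00\x01b", b"\x00\x02b"

scan_result = hbase_scan(rows, start, stop)
between_result = spark_between(rows, start, stop)

# The two results differ only when a row whose key equals the stop key exists.
print(len(scan_result), len(between_result))
```

If this is the cause, the counts would differ only for key combinations where a row exactly matching the end key exists, which would fit the "certain key combinations" symptom. In PySpark, a half-open filter such as `(col >= start) & (col < stop)` instead of `between` would reproduce the scan semantics. Whether the connector compares bytearray keys in the same order as HBase is a separate question that this sketch does not address.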

PaulThompsonUk, Apr 05 '18 10:04

Did you fix it?

Lihengwannafly, May 02 '19 13:05