shc
Inconsistent number of rows using the PySpark "between" function
Hi everyone, I'm using the HBase connector with PySpark and I'm noticing a difference in results between the connector and the HBase shell. When I scan in the HBase shell with startrow and endrow keys and then recreate the same query with the connector in PySpark using the "between" DataFrame function, the number of rows can differ for certain key combinations. In this instance the key is a bytearray composite key. Has anyone experienced this issue, or can anyone offer any alternatives? Thanks!
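One possible contributor, offered as an assumption rather than a diagnosis of this case: an HBase scan treats the start row as inclusive and the stop row as exclusive, while the DataFrame "between" function is inclusive on both bounds. If the end key itself exists in the table, the two approaches would differ by that one row. The sketch below uses plain Python, with no Spark or HBase involved, and made-up keys (`row_keys`, `start_row`, `end_row` and both helper functions are hypothetical) to show the effect on byte keys.

```python
# Hypothetical composite row keys as raw bytes. Python compares bytes
# byte-by-byte as unsigned values, which mirrors how HBase orders row keys.
row_keys = sorted([
    b"\x00\x01a",
    b"\x00\x01b",
    b"\x00\x02a",
    b"\x00\x02b",
    b"\x00\x03a",
])

start_row = b"\x00\x01b"
end_row = b"\x00\x03a"


def hbase_style_scan(keys, start, stop):
    """Half-open range: start row inclusive, stop row exclusive."""
    return [k for k in keys if start <= k < stop]


def between_style_filter(keys, lower, upper):
    """Closed range: both bounds inclusive, like a "between" filter."""
    return [k for k in keys if lower <= k <= upper]


scan_rows = hbase_style_scan(row_keys, start_row, end_row)
between_rows = between_style_filter(row_keys, start_row, end_row)

# The only row that differs is the one whose key equals the end key.
extra_rows = set(between_rows) - set(scan_rows)
print(len(scan_rows), len(between_rows), extra_rows)
```

If this is what is happening, a filter of the form `(col >= start) & (col < end)` in PySpark should match the half-open scan semantics, assuming the connector compares the byte keys in the same order as HBase does. That ordering assumption would be worth checking separately for a composite bytearray key.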
Did you fix it?