spark-hbase-connector
Change HBaseData and column into FieldMapper
In order to avoid storing default values into HBase, we have to change how the fields are mapped.
My proposal is to change the type HBaseData from
type HBaseData = Iterable[Option[Array[Byte]]]
to
type ColumnQualifier = String
type HBaseData = Map[ColumnQualifier, Option[Array[Byte]]]
Using this approach we can change the type of the method columns from Iterable[String] to Set[String], and we no longer need an order-based conversion when mapping case classes.
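A minimal sketch of how the proposed representation could behave. The type aliases come from the proposal above; the helper names (cellsToWrite, columns) are hypothetical, not part of the library's actual API:

```scala
object HBaseDataSketch {
  type ColumnQualifier = String
  // Proposed shape: each qualifier maps to an optional cell value.
  type HBaseData = Map[ColumnQualifier, Option[Array[Byte]]]

  // Hypothetical helper: only defined values become HBase cells, so
  // fields left at their default (None) are never written.
  def cellsToWrite(row: HBaseData): Map[ColumnQualifier, Array[Byte]] =
    row.collect { case (qualifier, Some(bytes)) => qualifier -> bytes }

  // With a Map, the column set falls out of the keys directly, so no
  // order-based positional matching against a column list is needed.
  def columns(row: HBaseData): Set[ColumnQualifier] = row.keySet
}
```

For example, a row `Map("name" -> Some(bytes), "nickname" -> None)` would write only the `name` cell, leaving the defaulted `nickname` column absent from HBase.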
I'm aware that custom mapping is not well implemented in this library and needs to be changed.
Right now, there are two methods for custom mapping: the first requires using the HBase API to convert fields to Array[Byte]; the second, simpler one requires no such knowledge. Your change seems to affect only the "low-level" method for custom mapping.
I think that we should encourage the usage of the simple method. There are many Scala types that aren't currently managed by the library, including collection types. When these types are added to the library, users should not be required to learn how to convert them to Array[Byte]; that should be the responsibility of the library.
Currently the library uses compile-time type binding: when you convert a tuple (Int, Double) to a pair of Array[Byte], the compiler binds each element of the tuple to the right converter. If we used the same strategy you are suggesting also for the "simple mapping", i.e. a Map instead of a tuple plus a list of columns, we would lose type information at compile time and the conversion mechanism would not work. I think we should find a better strategy that still supports the "simple mapping".
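The compile-time binding argument can be sketched with a small type class. The names here (ToBytes, encodePair) are illustrative only; the library's real converter traits differ:

```scala
object TypeBindingSketch {
  // Illustrative converter type class, not the library's actual trait.
  trait ToBytes[T] { def apply(value: T): Array[Byte] }

  implicit val intToBytes: ToBytes[Int] =
    (v: Int) => java.nio.ByteBuffer.allocate(4).putInt(v).array()
  implicit val doubleToBytes: ToBytes[Double] =
    (v: Double) => java.nio.ByteBuffer.allocate(8).putDouble(v).array()

  // For a tuple (A, B) the compiler resolves one converter per element,
  // because A and B are statically known types.
  def encodePair[A, B](pair: (A, B))(implicit
      ta: ToBytes[A], tb: ToBytes[B]): (Array[Byte], Array[Byte]) =
    (ta(pair._1), tb(pair._2))

  // By contrast, a Map[String, Any] erases the element types, so no
  // per-element ToBytes instance can be resolved at compile time:
  // def encodeMap(m: Map[String, Any]) = ...  // cannot be implemented this way
}
```

The usage `TypeBindingSketch.encodePair((1, 2.0))` compiles because the Int and Double converters are selected statically; the same values stored in a `Map[String, Any]` would offer the compiler nothing to bind against.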