spark-hbase-connector

convert to dataframe

Open fadaytak opened this issue 8 years ago • 5 comments

Hi,

Any idea how to convert the RDD below to a DataFrame, so that I can join it with another DataFrame?

val rdd = sc.hbaseTable[(String, String)]("table")
  .select("col")
  .inColumnFamily(columnFamily)
  .withStartRow("00000")
  .withStopRow("00500")

cordially

fadaytak, Jun 01 '16 14:06

Refer to the snippet below:

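import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object BaseApp extends App {
  val sparkConf = new SparkConf().setAppName("BaseApp").setMaster("local[4]")
  sparkConf.set("spark.hbase.host", "<Your-ZK-HOST>") // replace with your ZooKeeper host
  val sc = new SparkContext(sparkConf)
  val sqlContext = new SQLContext(sc)

  // Person table with column family DET and columns Name, City
  val schemaString = "Name,City"
  val rdd = sc.hbaseTable[(Option[String], Option[String])]("Person")
    .select("Name", "City")
    .inColumnFamily("DET")

  // orNull rather than get: a missing cell becomes null instead of throwing
  val rowRdd = rdd.map(p => Row(p._1.orNull, p._2.orNull))

  // Every column is declared as a nullable String
  val schema = StructType(schemaString.split(",").map(fieldName => StructField(fieldName, StringType, true)))
  val df = sqlContext.createDataFrame(rowRdd, schema)
  df.registerTempTable("Person")

  sqlContext.sql("<Your-SQL>").show() // replace with your query
}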

mkanchwala, Jun 24 '16 11:06
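The snippet above stops at registering the table, while the original question was about joining. As a minimal sketch of that last step, assuming a Spark 1.x `SQLContext` as in the thread: the `Person` rows here are built from an in-memory sequence as a stand-in for the HBase-backed RDD, and the second DataFrame (`City`, `Country`), the sample values, and the names `JoinSketch` and `buildJoined` are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object JoinSketch {
  // Builds two small DataFrames and joins them on the shared "City" column.
  def buildJoined(sqlContext: SQLContext): DataFrame = {
    val sc = sqlContext.sparkContext

    // Stand-in for the rows read from HBase (illustrative sample data).
    val personRows = sc.parallelize(Seq(Row("Alice", "Paris"), Row("Bob", "Rome")))
    val personSchema = StructType(Seq(
      StructField("Name", StringType, nullable = true),
      StructField("City", StringType, nullable = true)))
    val personDf = sqlContext.createDataFrame(personRows, personSchema)

    // Hypothetical second DataFrame to join against.
    val cityRows = sc.parallelize(Seq(Row("Paris", "France"), Row("Rome", "Italy")))
    val citySchema = StructType(Seq(
      StructField("City", StringType, nullable = true),
      StructField("Country", StringType, nullable = true)))
    val cityDf = sqlContext.createDataFrame(cityRows, citySchema)

    // Inner equi-join; the join column appears only once in the result.
    personDf.join(cityDf, "City")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinSketch").setMaster("local[2]"))
    try buildJoined(new SQLContext(sc)).show()
    finally sc.stop()
  }
}
```

With the real data, `personDf` would simply be the `df` created from the HBase RDD, and the join column would be whichever column the two DataFrames share.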

@fadaytak does this snippet resolve your query? If so, kindly close this ticket.

mkanchwala, Jun 28 '16 06:06

An improved version of @mkanchwala's code, with the schema declared explicitly:

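import it.nerdammer.spark.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object BaseApp extends App {
  val sparkConf = new SparkConf().setAppName("BaseApp").setMaster("local[4]")
  sparkConf.set("spark.hbase.host", "<Your-ZK-HOST>") // replace with your ZooKeeper host
  val sc = new SparkContext(sparkConf)
  val sqlContext = new SQLContext(sc)

  // Person table with column family DET and columns Name, City
  val rdd = sc.hbaseTable[(Option[String], Option[String])]("Person")
    .select("Name", "City")
    .inColumnFamily("DET")

  // orNull rather than get: a missing cell becomes null instead of throwing
  val rowRdd = rdd.map(p => Row(p._1.orNull, p._2.orNull))

  // Explicit schema; StructField is nullable by default
  object schema {
    val name = StructField("Name", StringType)
    val city = StructField("City", StringType)
    val struct = StructType(Array(name, city))
  }

  val df = sqlContext.createDataFrame(rowRdd, schema.struct)
  df.registerTempTable("Person")

  sqlContext.sql("<Your-SQL>").show() // replace with your query
}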

chetkhatri, Jan 18 '17 07:01

@fadaytak Kindly check & confirm by closing this issue.

chetkhatri, Jan 18 '17 07:01

@chetkhatri

Have you tried using collection types?

Whenever I use Map[Int,String] inside the tuples to be persisted to HBase, it has never worked for me.

Can you help with that?

Elbehery, Jun 02 '17 14:06
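The thread does not say which collection types the connector can write. One possible workaround, assuming only scalar values such as String are accepted in the tuples, is to flatten the map into a single String before writing and parse it back after reading. The sketch below is plain Scala and not part of the connector; `MapCodec`, `encode`, `decode`, and the separator characters are hypothetical names chosen for illustration, and values containing `;` would need escaping.

```scala
object MapCodec {
  // Hypothetical separators; values must not contain PairSep.
  private val PairSep = ";"
  private val KeyValueSep = "="

  // Flattens the map into "k1=v1;k2=v2", ordered by key for stable output.
  def encode(m: Map[Int, String]): String =
    m.toSeq.sortBy(_._1).map { case (k, v) => s"$k$KeyValueSep$v" }.mkString(PairSep)

  // Parses the flattened form back; the empty string maps to the empty map.
  def decode(s: String): Map[Int, String] =
    if (s.isEmpty) Map.empty
    else s.split(PairSep, -1).map { pair =>
      // Limit 2 keeps any further '=' characters inside the value.
      val Array(k, v) = pair.split(KeyValueSep, 2)
      k.toInt -> v
    }.toMap
}
```

The tuple written to HBase would then carry `MapCodec.encode(myMap)` as a String column, and the reading side would call `MapCodec.decode` on the value it gets back.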