
Point not found when injectRules used

Open bdgeise opened this issue 6 years ago • 4 comments

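import magellan.Utils.injectRules
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.types.TimestampType

val spark = SparkSession.builder
    .appName("Testing Spark DSL")
    .master("local[1]") // run locally with a single worker thread
    .getOrCreate()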

  // Uncommenting the next line reproduces the issue: filtered.show() returns 0 rows
  // injectRules(spark)

  import spark.implicits._

  val data = Array(
    ("US", "TX", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
    ("US", "PA", "2018-12-08 00:00:00", 12.0123, "ios", 183, 32.813548, -96.835159),
    ("CA", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
    ("GB", null, "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159),
    ("US", "NC", "2018-12-08 00:00:00", 12.0123, "android", 35, 32.813548, -96.835159),
    ("US", "CA", "2018-12-08 00:00:00", 12.0123, null, 2, 32.813548, -96.835159),
    ("A", null, "2018-12-08 00:00:00", 12.0123, "android", 183, 32.813548, -96.835159),
    ("US", "NY", "2018-12-08 00:00:00", 12.0123, "ios", 2, 32.813548, -96.835159))

  val df1 = spark.sparkContext.parallelize(data).toDF("country", "state", "location_at",
    "horizontal_accuracy", "platform", "app_id", "latitude", "longitude")
    .withColumn("location_at", col("location_at").cast(TimestampType))
  df1.show()
  df1.printSchema() // printSchema already prints; wrapping it in println also printed "()"

  val filterFilePath = path_to_geojson // placeholder: path to the attached TX.geojson file

  val filteringDS = spark.sqlContext.read.format("magellan")
    .option("magellan.index", "true")
    .option("magellan.index.precision", "15")
    .option("type", "geojson").load(filterFilePath)
    .cache()

  filteringDS.count()
  filteringDS.show(false)

  val filtered = df1
    .withColumn("locationPoint", point(col("longitude"), col("latitude")))
    .join(filteringDS)
    .where(col("locationPoint") within col("polygon"))

  filtered.show()

Using the example above, if I call injectRules(spark) I get 0 results. If I don't call injectRules, I get the correct results.

Also, I've tried different index precision levels, but the issue persists whenever the rules are injected.

The GeoJSON file used for testing is attached: TX.geojson.txt

bdgeise, Mar 04 '19 15:03

@harsha2010 - Any luck looking at this one?

bdgeise, Mar 18 '19 20:03

I was able to do some more testing/debugging today. If I do a true cross join and test for point within polygon using a withColumn, it returns true. However, when I put the same test in the where clause, I still get an empty DataFrame back when injectRules is used. @harsha2010
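
For readers following along, the two variants described in this comment can be sketched roughly as below. This is not the reporter's actual debugging code: the helper name compareWithinVariants and the "inside" column are made up for illustration, and the snippet assumes Spark 2.x with Magellan on the classpath and the same column names as the repro above (longitude, latitude, polygon).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.magellan.dsl.expressions._

// Hypothetical helper (not from the issue): evaluates the same
// point-in-polygon predicate two ways so the results can be compared.
def compareWithinVariants(points: DataFrame, polygons: DataFrame): Unit = {
  val withPoint = points
    .withColumn("locationPoint", point(col("longitude"), col("latitude")))

  // Variant 1: explicit cross join, predicate evaluated as an ordinary column.
  val asColumn = withPoint
    .crossJoin(polygons)
    .withColumn("inside", col("locationPoint") within col("polygon"))
  asColumn.select("country", "state", "inside").show()

  // Variant 2: predicate in the where clause, the shape used in the repro.
  val asFilter = withPoint
    .join(polygons)
    .where(col("locationPoint") within col("polygon"))

  // Print the logical and physical plans so runs with and without
  // injectRules(spark) can be diffed by eye.
  asFilter.explain(true)
  println(s"rows matched by where: ${asFilter.count()}")
}
```

Calling compareWithinVariants(df1, filteringDS) once with injectRules(spark) and once without should show whether the plans printed by explain(true) differ. That would help narrow down whether the injected rules rewrite the where-based join but leave the withColumn variant untouched, which is only a guess based on the behaviour reported here.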

bdgeise, Mar 19 '19 12:03

Another update here: it seems to work OK with the master branch and Spark 2.3.2. Are you aware of any changes since the 1.0.5 release that I might be able to look at and test against? @harsha2010

bdgeise, Mar 19 '19 20:03

@bdgeise There is this bug I noticed and fixed a while back: https://github.com/harsha2010/magellan/commit/aa9021eec14ccbdab4c90316ff9a7bf129873f8e

Not sure if that is related. Let me try this on the 1.0.5 branch and check today.

harsha2010, Mar 19 '19 20:03