Range query returns fewer answers than expected

Open samadDotDev opened this issue 5 years ago • 0 comments

An experiment on the standalone branch revealed that the circle range search query returns fewer answers than are actually present in the data. The following short ExampleApp code reproduces the problem:

  def main(args: Array[String]): Unit = {

    // Turn off excessive logging from spark
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)

    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    val trajs = spark.sparkContext
      .textFile("src/main/resources/trajectory.txt")
      .zipWithIndex().map(getTrajectory)
      .filter(_.points.length >= DITAConfigConstants.TRAJECTORY_MIN_LENGTH)
      .filter(_.points.length <= DITAConfigConstants.TRAJECTORY_MAX_LENGTH)
    println(s"Trajectory count: ${trajs.count()}")

    val rdd1 = new TrieRDD(trajs)
    val search = TrajectoryRangeAlgorithms.DistributedSearch

    // circle range search
    val center = Point(Array(39.9, 116.3))
    val radius = 0.1
    
    // Perform DITA's (Indexed) range search
    val ditaRangeSearch = search.search(spark.sparkContext, center, rdd1, radius)

    // Perform an exhaustive range search: keep trajectories whose points all lie within the radius of the center
    val exhaustiveRangeSearch = trajs.filter(t => t.points.forall(p => p.minDist(center) <= radius))

    println(s"Circle range search count: DITA: ${ditaRangeSearch.count()}, Exhaustive: ${exhaustiveRangeSearch.count()}")
  }

It produces the following output on the provided dataset (trajectory.txt):

Trajectory count: 5595
Circle range search count: DITA: 266, Exhaustive: 860

That is, the range search should return 860 results, but it returns only 266. The discrepancy is even more serious when the range is small: in some cases DITA's range query returns no results at all, even though many trajectories in the data satisfy the query.
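
To narrow down which trajectories the indexed search misses, it may help to diff the two result sets by ID. Below is a minimal, self-contained sketch using plain Scala collections (no Spark or DITA). The `Pt` and `Traj` types, the `id` field, and the Euclidean distance are assumptions standing in for DITA's `Point`, its trajectory class, and `minDist`; they are not the library's actual API.

```scala
// Hypothetical stand-ins for DITA's Point and trajectory types (assumptions).
final case class Pt(coord: Array[Double]) {
  // Euclidean distance; assumed to match what minDist computes between two points.
  def dist(other: Pt): Double =
    math.sqrt(coord.zip(other.coord).map { case (a, b) => (a - b) * (a - b) }.sum)
}

final case class Traj(id: Long, points: Seq[Pt])

object RangeDiff {
  // Same predicate as the exhaustive search in the reproduction code:
  // a trajectory matches only if every one of its points lies within the circle.
  def exhaustive(trajs: Seq[Traj], center: Pt, radius: Double): Set[Long] =
    trajs.filter(_.points.forall(_.dist(center) <= radius)).map(_.id).toSet

  // IDs found by the exhaustive baseline but absent from the indexed result.
  def missing(expected: Set[Long], indexed: Set[Long]): Set[Long] =
    expected -- indexed
}
```

With the real data, the same idea would apply after collecting the IDs from both `exhaustiveRangeSearch` and `ditaRangeSearch`, assuming both results expose a trajectory identifier. Inspecting a few of the missing trajectories might show whether they share a partition or a position in the trie.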

samadDotDev, Sep 04 '20 06:09