DITA
Range query returns fewer results than expected
An experiment on the standalone branch revealed that the circle range search query returns fewer results than are actually present in the data. The following short ExampleApp code reproduces the problem:
def main(args: Array[String]): Unit = {
  // Turn off excessive logging from spark
  Logger.getLogger("org").setLevel(Level.OFF)
  Logger.getLogger("akka").setLevel(Level.OFF)

  val spark = SparkSession
    .builder()
    .master("local[*]")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()

  val trajs = spark.sparkContext
    .textFile("src/main/resources/trajectory.txt")
    .zipWithIndex().map(getTrajectory)
    .filter(_.points.length >= DITAConfigConstants.TRAJECTORY_MIN_LENGTH)
    .filter(_.points.length <= DITAConfigConstants.TRAJECTORY_MAX_LENGTH)
  println(s"Trajectory count: ${trajs.count()}")

  val rdd1 = new TrieRDD(trajs)
  val search = TrajectoryRangeAlgorithms.DistributedSearch

  // circle range search
  val center = Point(Array(39.9, 116.3))
  val radius = 0.1

  // Perform DITA's (Indexed) range search
  val ditaRangeSearch = search.search(spark.sparkContext, center, rdd1, radius)

  // Perform an exhaustive range search
  val exhaustiveRangeSearch = trajs.filter(t => t.points.forall(p => p.minDist(center) <= radius))

  println(s"Circle range search count: DITA: ${ditaRangeSearch.count()}, Exhaustive: ${exhaustiveRangeSearch.count()}")
}
It produces the following output on the provided dataset (trajectory.txt):
Trajectory count: 5595
Circle range search count: DITA: 266, Exhaustive: 860
That is, the range search should return 860 results, but it returns only 266. The discrepancy becomes even more critical when the range is small: in some cases DITA's range query returns no results at all, even though many trajectories in the data satisfy the query.
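To narrow down which trajectories the indexed search drops, it may help to compare the two result sets by ID rather than by count alone. The following is a minimal sketch in plain Scala, without Spark or DITA on the classpath. `Pt`, `Traj`, `RangeCheck`, `withinCircle` and `missingIds` are hypothetical names introduced here for illustration; they are not DITA's own types or API, and the Euclidean distance is an assumption about what `minDist` computes between two points.

```scala
// Hypothetical stand-ins for DITA's Point and Trajectory; not the library's own types.
final case class Pt(coord: Array[Double]) {
  // Euclidean distance between two points of equal dimension (assumed metric)
  def dist(other: Pt): Double =
    math.sqrt(coord.zip(other.coord).map { case (a, b) => (a - b) * (a - b) }.sum)
}

final case class Traj(id: Long, points: Array[Pt])

object RangeCheck {
  // Same predicate as the exhaustive filter in the report:
  // a trajectory matches only if every one of its points lies within the circle.
  def withinCircle(t: Traj, center: Pt, radius: Double): Boolean =
    t.points.forall(_.dist(center) <= radius)

  // IDs found by the exhaustive scan but absent from the indexed result.
  def missingIds(exhaustive: Seq[Traj], indexed: Seq[Traj]): Set[Long] =
    exhaustive.map(_.id).toSet -- indexed.map(_.id).toSet
}
```

With the results of both searches collected to the driver, `RangeCheck.missingIds(exhaustiveResults, ditaResults)` would list the trajectories to inspect, assuming each trajectory carries a unique ID (here presumed to come from `zipWithIndex`).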