EJ Song
`hs.explain(df)` returns the used indexes, but I think we need to improve the output so that it also shows the log version used and the number of times each index was applied.
I think we cannot guarantee the iteration order of a HashSet. It can differ depending on the internal implementation (e.g. different hash bucket sizes, hash functions).
> Maybe I misunderstood the issue, but why is the order important? Why can't we simply XOR or even just add signatures? I'm asking this because I don't get why...
@clee704 Yeah, I'm good with the XOR approach and using byte[] (be aware of its format in JSON). cc @imback82
Btw, changing the signature will break backward compatibility, so it would be good to ship it together with other breaking changes.
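To illustrate why XOR sidesteps the ordering concern, here is a minimal sketch, not Hyperspace's actual signature code. The names `XorSignature`, `fileSignature`, `combine` and `toBase64` are hypothetical, and using MD5 over a descriptive string per file is an assumption made only for the example. Because XOR is commutative and associative, the combined value does not depend on the order in which the set is iterated.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Base64

object XorSignature {
  // Hypothetical per-file signature: MD5 digest (16 bytes) of a string describing the file.
  def fileSignature(fileInfo: String): Array[Byte] =
    MessageDigest.getInstance("MD5").digest(fileInfo.getBytes(StandardCharsets.UTF_8))

  // Fold all signatures together with byte-wise XOR, starting from 16 zero bytes.
  // XOR is commutative and associative, so the iteration order does not matter.
  def combine(signatures: Iterable[Array[Byte]]): Array[Byte] =
    signatures.foldLeft(new Array[Byte](16)) { (acc, sig) =>
      acc.zip(sig).map { case (a, b) => (a ^ b).toByte }
    }

  // One possible text form for JSON: a Base64 string
  // (Jackson, for example, writes byte[] as Base64 by default).
  def toBase64(signature: Array[Byte]): String =
    Base64.getEncoder.encodeToString(signature)
}
```

One caveat of this sketch: two identical entries cancel each other out under XOR, so it is only suitable when the inputs are distinct, as they are when they come from a set of files.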
> Does it mean that users should manually create indexes again?

Yes

> Because refreshAction won't work because it can't load the previous version's IndexLogEntry, or weird things can happen if...
Hi, sorry for the late reply. Yes, Hyperspace is currently on hold due to other priorities.
Yeah, `hs.index("indexName")` was added after those sentences were written.
No, but there's an internal API - `getIndexLogEntry` and you can use `getIndexContentDirectoryPaths`. BTW, I realized that the following description is invalid now.

> hs.deleteOldVersions("indexName", Seq(0, 1, 3)) // remove...
Measured `executePlan` performance using the TPCH dataset:
```scala
val filter1 = linetable.filter(linetable("l_orderkey") isin (1234, 12341234, 123456)).select("l_orderkey")
val filter2 = linetable.filter(linetable("l_orderkey") isin (1234, 12341234)).select("l_orderkey")
val join = filter1.join(filter2, "l_orderkey")
val plan = join.queryExecution.optimizedPlan
measure(spark.sessionState.executePlan(plan).executedPlan)...
```