EJ Song comments

Results: 92 comments of EJ Song

[FEATURE REQUEST]: Create helper function to check whether index is actually used in the plan

`hs.explain(df)` returns the used indexes, but I think we need to improve its output so that it also shows the log version used and the number of times each index is applied.

Revisit FileBasedSignatureProvider for a possible performance optimization

I think we cannot guarantee the iteration order of a HashSet. It can differ depending on the internal implementation (e.g. a different hash bucket size or hash function).

Revisit FileBasedSignatureProvider for a possible performance optimization

> Maybe I misunderstood the issue, but why is the order important? Why can't we simply XOR or even just add signatures? I'm asking this because I don't get why...

Revisit FileBasedSignatureProvider for a possible performance optimization

@clee704 Yea, I'm good with the XOR approach and with using byte[] (be aware of its format in JSON). cc @imback82
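The XOR idea discussed in this thread can be sketched as follows. This is only an illustration under assumptions, not the code in `FileBasedSignatureProvider`: the `XorSignature` object, its method names, the choice of MD5, and the `(path, size, modifiedTime)` tuple are all hypothetical. It shows why XOR-combining per-file digests makes the result independent of the iteration order.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, for illustration only.
object XorSignature {
  // Hash one file's metadata into a 16-byte digest (MD5 chosen as an example).
  def fileDigest(path: String, size: Long, modifiedTime: Long): Array[Byte] = {
    val md = MessageDigest.getInstance("MD5")
    md.digest(s"$path|$size|$modifiedTime".getBytes(StandardCharsets.UTF_8))
  }

  // XOR two equal-length digests byte by byte.
  def xor(a: Array[Byte], b: Array[Byte]): Array[Byte] = {
    require(a.length == b.length, "digests must have the same length")
    a.zip(b).map { case (x, y) => (x ^ y).toByte }
  }

  // Fold all per-file digests into one signature. XOR is commutative and
  // associative, so the order in which `files` is iterated does not matter.
  def signature(files: Iterable[(String, Long, Long)]): Array[Byte] =
    files.iterator
      .map { case (p, s, m) => fileDigest(p, s, m) }
      .foldLeft(new Array[Byte](16))(xor)
}
```

One caveat of this sketch: XOR cancels identical digests, so it assumes each file appears at most once. If the resulting `byte[]` is written to JSON, its encoding depends on the serializer (Jackson, for example, writes byte arrays as Base64 strings by default).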

Revisit FileBasedSignatureProvider for a possible performance optimization

Btw, changing the signature will break backward compatibility, so it would be good to ship it together with other breaking changes.

Revisit FileBasedSignatureProvider for a possible performance optimization

> Does it mean that users should manually create indexes again?

Yes

> Because refreshAction won't work because it can't load the previous version's IndexLogEntry, or weird things can happen if...

Is Project HyperSpace Deprecated?

Hi, sorry for the late reply. Yes, Hyperspace is currently on hold due to other priorities.

deleteOldVersions API

Yea, `hs.index("indexName")` was added after those sentences were written.

No, but there's an internal API, `getIndexLogEntry`, and you can use `getIndexContentDirectoryPaths`. BTW, I realized that the following description is invalid now.

> hs.deleteOldVersions("indexName", Seq(0, 1, 3)) // remove...

Check and remove unnecessary shuffle added by Hybrid Scan

Measured `executePlan` performance using the TPCH dataset:

```scala
val filter1 = linetable.filter(linetable("l_orderkey") isin (1234, 12341234, 123456)).select("l_orderkey")
val filter2 = linetable.filter(linetable("l_orderkey") isin (1234, 12341234)).select("l_orderkey")
val join = filter1.join(filter2, "l_orderkey")
val plan = join.queryExecution.optimizedPlan
measure(spark.sessionState.executePlan(plan).executedPlan)...
```
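The `measure` helper used in that comment is not shown. A minimal sketch of what such a timing helper could look like, assuming it simply evaluates an expression several times and reports the average wall-clock time, is below. The `Timing` object, the `runs` parameter, and the millisecond unit are assumptions for illustration, not the helper actually used in the measurement.

```scala
// Hypothetical stand-in for the `measure` helper; not taken from Hyperspace.
object Timing {
  // Evaluates `block` `runs` times and returns the average time in milliseconds.
  // `block` is a by-name parameter, so it is re-evaluated on every iteration.
  def measure[T](block: => T, runs: Int = 10): Double = {
    require(runs > 0, "runs must be positive")
    val start = System.nanoTime()
    (1 to runs).foreach(_ => block)
    (System.nanoTime() - start) / 1e6 / runs
  }
}
```

With a helper like this, a call such as `Timing.measure(spark.sessionState.executePlan(plan).executedPlan)` would time only the physical planning step, not query execution. JVM warm-up can skew the first runs, so the numbers are best read as rough comparisons.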