recsys-examples icon indicating copy to clipboard operation
recsys-examples copied to clipboard

[BUG] incremental dump test failed under LFU mode

Open jiashuy opened this issue 3 months ago • 2 comments

Describe the bug Tests of incremantal dump will accidentally failed on the LFU specificaction

Steps/Code to reproduce bug

Expected behavior LFU Test passed in incremental dump tests

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of recsys-examples install: [conda, Docker, pip, or from source]
    • If method of install is [Docker], provide docker pull & docker run commands used
  • Run print_env.sh from the project root and paste the results here

Additional context Add any other context about the problem here.


By submitting this issue, you agree to follow our code of conduct and our contributing guidelines.

jiashuy avatar Oct 10 '25 05:10 jiashuy

If HKV needs to evict m keys, but n keys meet the criteria, there are at most C^m_n possibilities (this is a combinatorial problem...). The p keys with the smallest scores must be evicted (0 <= p <= m).

During concurrent insertion, a key that should have been found in the current batch might be evicted by another thread, causing it to be re-inserted, and the score count reset. However, sequential insertion in the Simulator might cause the score to accumulate, and vice versa (scores originally evictted in the Simulator might accumulate in HKV).

After each insertion, based on the results of insertions in HKV and the Simulator, we can determine whether the evictment is reasonable and calibrate the Simulator's keys to match those in HKV, using this as the basis for the next insertion's judgment.

jiashuy avatar Nov 26 '25 13:11 jiashuy

If HKV needs to evict m keys, but n keys meet the criteria, there are at most C^m_n possibilities (this is a combinatorial problem...). The p keys with the smallest scores must be evicted (0 <= p <= m).

During concurrent insertion, a key that should have been found in the current batch might be evicted by another thread, causing it to be re-inserted, and the score count reset. However, sequential insertion in the Simulator might cause the score to accumulate, and vice versa (scores originally evictted in the Simulator might accumulate in HKV).

After each insertion, based on the results of insertions in HKV and the Simulator, we can determine whether the evictment is reasonable and calibrate the Simulator's keys to match those in HKV, using this as the basis for the next insertion's judgment.

Translate as it not easy to understand: 假如说HKV需要淘汰m个key,但是有n个是符合条件的,最多有C^m_n种可能(这是一个组合问题...)。score最小的p个key必须被淘汰(0<=p<=m)。 并发的插入过程中,当前batch里本应该查找到的key有可能被其他线程淘汰,导致重新插入,而score重新计数,但是Simulator中的串行插入可能导致score被累积,反之依然(原本在Simulator中被淘汰的score,有可能在HKV中累积)。 可以在每次插入后,根据HKV和Simulator插入的结果,判断这种淘汰是不是合理,并且将Simulator的key校准到和HKV一样,作为下一次插入的判断基础。

jiashuy avatar Nov 26 '25 13:11 jiashuy

PR #242

JacoCheung avatar Dec 03 '25 01:12 JacoCheung

fixed in #242

shijieliu avatar Dec 10 '25 01:12 shijieliu