festival icon indicating copy to clipboard operation
festival copied to clipboard

`Collection` patching

Open hinto-janai opened this issue 1 year ago • 0 comments

What

Due to index invalidation, the Collection cannot be mutated.

Assuming that most Collection resets only contain a small amount of data change (adding an album or two), it makes more sense and would be much cheaper to re-use the inner structures, append/remove data, and re-calculate the indices rather than creating a new Collection from scratch.

This will be referred to as "patching".

Steps

The steps would be something like:

  1. Store PATHs associated with the current Collection
  2. Actively watch and record any file changes to those PATHs
  3. When a new Collection is created, diff the PATHs requested with our modified PATH history:
old paths   new paths
---------   ---------
    a
    b           b
    c           c
    d          *d
    e           e

In the above example:

  • a was removed
  • b, c, and e are the same
  • d is the same PATH, but something about the underlying file changed

This means we only actually have to scan d, and should forget about a in our new Collection.

  1. Disassemble the current Collection into the individual structures, leaving behind the data that is no longer needed, and keeping the data the user still wants:
|---------------------------|      |-----|  |------------|
| Old Collection            |      | Old |  | Collection |
|---------------------------|      |-----|-|-------------|    |-----|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |      | 1 | 2 |                  |  41 |
|           [...]           | ---> |-----|-|----|     |----|  |     |
|                           |      |     | | 12 |     | 33 |  | 22  |
|    |----------------------|      |-----|------|     |----|  |-----|
|    | some irrelevant data |      | 42  |
|---------------------------|      |-----|
  1. After this, the new PATHs would be scanned, and we would go through the process of "The Loop", appending the data to the structures we already have if possible:
|----------------|-----------------|
| Old Collection | new data        |
|-----|----------|-----|           |--|
| 19  | | 1 |  2 |  42 |--|        22 |
|-----|-----|    |-----|  |       |---|
| 66  |     |  12 |    |  | 33 41 |
|-----|-----|--|--|    |--|-------|
| 999 | new data       | new data |
|-----|--------|----------|-------|   

The data would be in a cohesive structure, but the indices would be all over the place and would be pointing at the wrong things.

Problems

The main problems that needs to be solved:

  • How will the indices be reconnected?
  • How will irrelevant data be pruned? (.drain() into new Vec?)

Viable Situations

This will be faster than from-scratch creation as long as the delta between the new PATHs and old PATHs are <50%.

If there are many fewer or many more new PATHs, it would make more sense to just ditch the old and start from scratch.

Since most Collection resets are likely to only contain small changes, this patching approach would probably be viable most of the time, and would lead to major performance improvements since we would be only need to scan the new PATHs, and shift some numbers around (indices).

hinto-janai avatar May 09 '23 03:05 hinto-janai