Prevent update of unchanged notes

Open lstrobel opened this issue 8 months ago • 2 comments

Problem

When importing large decks from JSON, I noticed that Anki will sync every note in the deck despite there being no changes. For very large decks (mine has tens of thousands of notes), this can cause a lot of churn.

This appears to be caused by self.anki_object.mod = anki.utils.int_time() in the note saving process, which unconditionally updates the modification timestamp of all notes.

Ask

If a note's fields haven't changed from what's already in the collection, don't update the modification timestamp or flush changes to the database.

I'd be happy to push a PR for this feature, if it really is as simple as checking the self.anki_object dict for a diff, but I'm not familiar with Anki's backend so I don't know if there would be unintended consequences.
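A minimal sketch of the check being asked for, not CrowdAnki's actual code. FakeNote, note_changed and save_if_changed are hypothetical names, the comparison covers only fields and tags, and time.time() stands in for anki.utils.int_time() so the snippet runs without Anki installed. Whether skipping the timestamp update alone reduces sync work is not established here.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNote:
    """Hypothetical stand-in for an Anki note; only the compared attributes."""
    fields: List[str]
    tags: List[str] = field(default_factory=list)
    mod: int = 0


def note_changed(existing: FakeNote, new_fields: List[str], new_tags: List[str]) -> bool:
    """Return True when the incoming JSON data differs from the stored note."""
    # Field order matters; tag order is assumed not to.
    return existing.fields != list(new_fields) or sorted(existing.tags) != sorted(new_tags)


def save_if_changed(existing: FakeNote, new_fields: List[str], new_tags: List[str],
                    now: Optional[int] = None) -> bool:
    """Update the note and its timestamp only when something differs."""
    if not note_changed(existing, new_fields, new_tags):
        return False  # leave mod untouched; the caller can skip the flush
    existing.fields = list(new_fields)
    existing.tags = list(new_tags)
    # Assumption: time.time() replaces anki.utils.int_time() in this sketch.
    existing.mod = int(time.time()) if now is None else now
    return True
```

The boolean return value lets the caller decide whether to flush, so an unchanged note costs one comparison and no database write.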

lstrobel, Apr 26 '25 04:04

Thanks very much for the report and for digging into the issue!

Just to make doubly sure: the problem is that after you import a large deck via CrowdAnki, the next Anki sync (with AnkiWeb) takes a long time?


I haven't thought about this too deeply and don't have as much time as I'd like to spend on it (hence the verbosity; this is a quick thought dump, sorry!), but the solution you suggest is probably the right one (or at least an important part of one).

Some potential hurdles (most I think are not actual hurdles, but I list them for completeness/so that I know to double check):

  1. Will this interfere with media sync?

    I believe not, since (AFAIR) media sync in Anki is controlled by (first) checking the media/ directory mtime and (if mtime changed) comparing the mtimes or hashes (? don't remember) of all media files with those stored in Anki's database.

    Hence, the note itself is irrelevant, here.

  2. Does Anki frequently (between versions) change the internal representation of the note object?

    This would make comparison trickier (have to check for absence of fields both ways) and result in false positives (timestamp updated when it need not be — not really that much of a problem, unlike false negatives).

    However, I believe that for notes, the parts that we're interested in are (mostly?) stable. (For note models they do seem to change frequently, from what I remember.)

  3. Would this slow imports down?

    Probably slightly, but probably by significantly less than the time saved during syncing (though benchmarking both imports and syncs, before and after the code change, would be valuable).

  4. What happens if the note models change?

    There are two (not mutually exclusive) cases:

    • a) A different note model (as specified by the "note_model_uuid") applies to the note.

      The change is handled by .handle_model_update. AFAICT we could just have it return a boolean to denote whether the model was updated, and combine that with the anki_object diff.

      TODO Check: Are we forcing a full sync when applying a different note model (as I think we should be)? (If yes, then depending on what we figure out for b) it might not be necessary to modify .handle_model_update.) (I _think_ that self.collection.models.change (Anki's built-in method), called by ChangeModelDialog, does force a full sync, but haven't checked.)

    • b) The note model itself is modified.

      This isn't at all tracked by Note.save_to_collection and is (logically) instead handled by NoteModel.save_to_collection. Changes to the note model I believe force a full sync, and in that case the timestamp of notes is probably irrelevant (everything is uploaded anyway). Ideally we'd check whether Anki updates the mod of notes whose note model was changed via Anki's normal methods in case there are unforeseen side-effects.

  5. Is moving a note between decks affected?

    Unless one checks the relevant box in the import config (ignore_deck_movement), cards are currently moved to the imported deck (in case they were elsewhere).

    This is handled by .move_cards_to_deck and always runs (irrespective of whether the card actually needs to be moved or not).

    We'd need to check what the effect of card.flush is (in terms of db changes and syncing) when the note isn't updated/flushed. Depending on this, we might: a) not have to do anything extra, b) modify card.move_to_deck to be more intelligent, so we only flush when the did changes, c) do b and also pass the info back up to the note (and run note.flush if any of the cards had to be moved) or d) something else.
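Options b) and c) from point 5, combined with the boolean suggested in 4a, could look roughly like the sketch below. This is only an illustration under assumptions: FakeCard, move_cards_if_needed and note_needs_flush are hypothetical names, dynamic (filtered) decks are ignored, and whether a moved card should trigger a note flush at all is still the open question raised above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeCard:
    """Hypothetical stand-in for an Anki card; did is the deck id."""
    did: int
    flushed: bool = False

    def flush(self) -> None:
        # Stand-in for the real database write.
        self.flushed = True


def move_cards_if_needed(cards: Iterable[FakeCard], deck_id: int) -> bool:
    """Option b): move and flush only the cards whose deck actually differs."""
    moved = False
    for card in cards:
        if card.did == deck_id:
            continue  # already in the target deck, nothing to write
        card.did = deck_id
        card.flush()
        moved = True
    return moved  # option c): report back so the note can react


def note_needs_flush(fields_changed: bool, model_updated: bool, cards_moved: bool) -> bool:
    """Combine the note diff, the model-update flag (4a) and the move result."""
    return fields_changed or model_updated or cards_moved
```

Re-importing an unchanged deck would then leave every card and note untouched, since all three flags come back False.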

aplaice, Apr 26 '25 19:04

Thanks for the reply! Indeed, the problem is that after I import a large deck via CrowdAnki, the next Anki sync (with AnkiWeb) takes a long time.

Given your thoughts, this PR seems pretty doable! When I get some free time I'll take a look.

lstrobel, May 01 '25 11:05

Working on this issue right now. Discovered that the full sync is not caused by self.anki_object.mod = anki.utils.int_time(), but seems instead to only happen when move_cards_to_deck is called.

Seeing if instead we can prevent unnecessary moves.

lstrobel, Jul 06 '25 06:07