Prevent update of unchanged notes
Problem
When importing large decks from JSON, I noticed that Anki will sync every note in the deck despite there being no changes. For very large decks (mine has tens of thousands of notes), this can cause a lot of churn.
This appears to be caused by `self.anki_object.mod = anki.utils.int_time()` in the note saving process, which unconditionally updates the modification timestamp of all notes.
Ask
If a note's fields haven't changed from what's already in the collection, don't update the modification timestamp or flush changes to the database.
I'd be happy to push a PR for this feature, if it really is as simple as checking the self.anki_object dict for a diff, but I'm not familiar with Anki's backend so I don't know if there would be unintended consequences.
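A minimal sketch of the check proposed above, assuming a simplified stand-in for the note object. `FakeNote`, `save_if_changed`, and the `flush_count` counter are hypothetical names for illustration only, not CrowdAnki's or Anki's API, and a real note carries more state than fields and tags.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNote:
    """Hypothetical stand-in for an Anki note; not the real note class."""
    fields: List[str]
    tags: List[str] = field(default_factory=list)
    mod: int = 0
    flush_count: int = 0  # counts simulated database writes

    def flush(self) -> None:
        self.flush_count += 1


def save_if_changed(note: FakeNote, new_fields: List[str], new_tags: List[str]) -> bool:
    """Update and flush the note only when the imported data differs.

    Returns True if the note was written, False if it was left untouched.
    """
    if note.fields == new_fields and sorted(note.tags) == sorted(new_tags):
        return False  # nothing changed: keep the old mod, skip the write
    note.fields = list(new_fields)
    note.tags = list(new_tags)
    note.mod = int(time.time())  # only touched when something really changed
    note.flush()
    return True
```

Re-importing identical data through `save_if_changed` leaves `mod` and the write counter untouched, which is the behaviour the ask describes.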
Thanks very much for the report and for digging into the issue!
To make double sure, the problem is that after you import a large deck via CrowdAnki, the next Anki sync (with AnkiWeb) takes a long time?
I haven't thought about this too deeply and don't have as much time as I'd like to spend on this (hence my verboseness — this is a quick thought dump :) (sorry!)), but the solution you suggest is probably the right one (or at least an important part of one).
Some potential hurdles (most I think are not actual hurdles, but I list them for completeness/so that I know to double check):
- Will this interfere with media sync?
I believe not, since (AFAIR) media sync in Anki is controlled by (first) checking the `media/` directory `mtime` and (if `mtime` changed) comparing the mtimes or hashes (? don't remember) of all media files with those stored in Anki's database.
Hence, the note itself is irrelevant, here.
- Does Anki frequently (between versions) change the internal representation of the note object?
This would make comparison trickier (have to check for absence of fields both ways) and result in false positives (timestamp updated when it need not be — not really that much of a problem, unlike false negatives).
However, I believe that for notes, the parts that we're interested in are (mostly?) stable. (For note models they do seem to change frequently, from what I remember.)
- Would this slow imports down?
Probably slightly, but likely by much less than the time saved during syncing (though benchmarking both imports and syncs, before and after the code change, would be valuable).
- What happens if the note models change?
There are two (not mutually exclusive) cases:
- a) A different note model (as specified by the `"note_model_uuid"`) applies to the note.
The change is handled by `.handle_model_update`. AFAICT we could just have it return a boolean to denote whether the model was updated, and combine that with the `anki_object` diff.
TODO Check: Are we forcing a full sync when applying a different note model (as I think we should be)? (If yes, then depending on what we figure out for b) it might not be necessary to modify `.handle_model_update`.) (I _think_ that `self.collection.models.change` (Anki's built-in method), called by `ChangeModelDialog`, does force a full sync, but haven't checked.)
- b) The note model itself is modified.
This isn't at all tracked by `Note.save_to_collection` and is (logically) instead handled by `NoteModel.save_to_collection`. Changes to the note model I believe force a full sync, and in that case the timestamp of notes is probably irrelevant (everything is uploaded anyway). Ideally we'd check whether Anki updates the `mod` of notes whose note model was changed via Anki's normal methods in case there are unforeseen side-effects.
- Is moving a note between decks affected?
Unless one checks the relevant box in the import config (`ignore_deck_movement`), cards are currently moved to the imported deck (in case they were elsewhere).
This is handled by `.move_cards_to_deck` and always runs (irrespective of whether the card actually needs to be moved or not).
We'd need to check what's the effect (in terms of db changes and syncing) of `card.flush` when the note isn't updated/flushed. Depending on this, we might: a) not have to do anything extra, b) modify `card.move_to_deck` to be more intelligent, so we only flush when the `did` changes, c) do b and also pass the info back up to the note (and run `note.flush` if any of the cards had to be moved) or d) something else.
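To make the hurdles above concrete, here is a rough sketch of how the separate signals (model changed, fields changed, cards moved) could be combined into one "does the note need flushing?" decision. `ImportOutcome`, `note_needs_flush`, and `fields_differ` are hypothetical names invented for this illustration; nothing here is taken from CrowdAnki's or Anki's code, and whether a card move should trigger a note flush at all is still the open question raised above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ImportOutcome:
    """Hypothetical record of what an import step changed for one note."""
    model_changed: bool      # e.g. a boolean returned by the model-update step
    fields_changed: bool     # result of diffing imported vs. stored fields
    cards_moved: List[bool]  # one flag per card: did its deck id change?


def fields_differ(stored: Dict[str, str], imported: Dict[str, str]) -> bool:
    """Dict inequality checks both ways, so a field that is missing on
    either side counts as a change (a safe false positive)."""
    return stored != imported


def note_needs_flush(outcome: ImportOutcome) -> bool:
    """Flush only if the model, the fields, or any card's deck changed."""
    return (
        outcome.model_changed
        or outcome.fields_changed
        or any(outcome.cards_moved)
    )
```

Erring towards "changed" when the representations do not match exactly keeps the failure mode to an unnecessary timestamp update rather than a missed one.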
Thanks for the reply! Indeed, the problem is that after I import a large deck via CrowdAnki, the next Anki sync (with AnkiWeb) takes a long time.
Given your thoughts, this PR seems pretty doable! When I get some free time I'll take a look.
Working on this issue right now. Discovered that the full sync is not caused by `self.anki_object.mod = anki.utils.int_time()`, but seems instead to only happen when `move_cards_to_deck` is called.
Seeing if instead we can prevent unnecessary moves.
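A minimal sketch of what "preventing unnecessary moves" could look like, assuming each card exposes its deck id as `did` and a `flush` method. `FakeCard` and `move_cards_if_needed` are hypothetical stand-ins for illustration, not the actual CrowdAnki or Anki code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeCard:
    """Hypothetical stand-in for an Anki card; only the deck id matters here."""
    did: int
    flush_count: int = 0  # counts simulated database writes

    def flush(self) -> None:
        self.flush_count += 1


def move_cards_if_needed(cards: Iterable[FakeCard], target_did: int) -> int:
    """Move only cards that are not already in the target deck.

    Returns the number of cards actually moved (and flushed).
    """
    moved = 0
    for card in cards:
        if card.did == target_did:
            continue  # already in place: no write, nothing to sync
        card.did = target_did
        card.flush()
        moved += 1
    return moved
```

With this shape, re-importing a deck whose cards are already in place performs no card writes at all, and the returned count could be passed back up to the note if that turns out to be needed.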