TagStudio
[Bug]: Library sometimes deleted when application closed
Checklist
- [X] I am using an up-to-date version.
- [X] I have read the documentation.
- [X] I have searched existing issues.
TagStudio Version
Alpha v9.3.2
Operating System & Version
Windows 10
Description
Occasionally, when closing Tag Studio via the X button, it wipes the `ts_library.json` file, leaving it at zero bytes, despite the logs claiming the library was saved.
I spent some time fiddling around with the code and stumbled upon a potential solution:
Move the call to `self.to_json()` outside of the `with open(` block in the `save_library_to_disk` function.
Example
# If `to_json` is called in the `with` block, the file will sometimes fail to write
jsonLibrary = self.to_json()
with open(
    self.library_dir / TS_FOLDER_NAME / filename, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    ujson.dump(
        jsonLibrary,
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
Note that having `jsonLibrary = self.to_json()` inside the `with open(` block did not fix the issue.
I stumbled onto this while logging the size of `jsonLibrary` to see if it was actually being populated, and comparing the sizes at different places in the function.
(It was consistently the same size, regardless of whether the bytes were written to disk or not.)
With that change in place I was able to go through over a dozen open/close cycles, and the library saved each time, whereas previously it would fail to save by the time I got to close number 5.
I can't pretend to understand why this seems to have solved the issue, so I cannot guarantee the robustness of the proposed fix.
Other Things I Tried
- Moving `outfile.flush()` after the `ujson.dump` call
- Adding `outfile.close()` after the `ujson.dump` call
Workaround
During my testing I discovered that if I manually save the library and then immediately close the app, the bug never seemed to happen.
Additional Info
- My library
  - is stored on an internal SSD
  - is not stored on the C:\ disk
  - has a little over 4K files in it
  - its JSON file is ~340K
- This may be related to #320
Expected Behavior
Data should be written to the library file on exit
Steps to Reproduce
1. Open Tag Studio
2. Create a new library
3. Manually save
4. Note the size of the save file (mine is ~340KB)
5. Close Tag Studio
6. Open Tag Studio
7. Wait for it to load the library
8. Close Tag Studio
9. Check the size of the save file
10. If the save file is not 0 bytes, repeat steps 6-9
Logs
Log output when it fails to save
Saving Library...
[LIBRARY] Saving Library to Disk...
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Library saved to disk in 0.008 seconds
[SHUTDOWN] Ending Thumbnail Threads...
Log output when it successfully saves
Saving Library...
[LIBRARY] Saving Library to Disk...
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Library saved to disk in 0.031 seconds
[SHUTDOWN] Ending Thumbnail Threads...
I've continued trying to track down the root cause of the issue, and it seems to be a timing issue, potentially one outside of the app.
Starting with the original code, I added a print for the file path. Then, if I print the size of the file before and after a call to sleep, the two sizes disagree with one another:
jsonPath = self.library_dir / TS_FOLDER_NAME / filename
print(f"[LIBRARY] Save path: {jsonPath}")
with open(
    jsonPath, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump
print(f"[LIBRARY] Pre-sleep Size: {jsonPath.stat().st_size} bytes")
time.sleep(0.1)
print(f"[LIBRARY] Post-sleep Size: {jsonPath.stat().st_size} bytes")
On my 3rd close, the bug happened:
[LIBRARY] Saving Library to Disk..
[LIBRARY] Save path: E:\Screenshots\.TagStudio\ts_library.json
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Pre-sleep Size: 348937 bytes
[LIBRARY] Post-sleep Size: 0 bytes
[LIBRARY] Library saved to disk in 0.140 seconds
Then I added a call to sleep just before `ujson.dump`, and the bug magically goes away.
with open(
    jsonPath, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    time.sleep(0.1)
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump
print(f"[LIBRARY] Pre-sleep Size: {jsonPath.stat().st_size} bytes")
time.sleep(0.1)
print(f"[LIBRARY] Post-sleep Size: {jsonPath.stat().st_size} bytes")
[LIBRARY] Saving Library to Disk..
[LIBRARY] Save path: E:\Screenshots\.TagStudio\ts_library.json
[LIBRARY] Formatting Tags to JSON...
[LIBRARY] Formatting Entries to JSON...
[LIBRARY] Formatting Collations to JSON...
[LIBRARY] Done Formatting to JSON!
[LIBRARY] Pre-sleep Size: 348937 bytes
[LIBRARY] Post-sleep Size: 348937 bytes
[LIBRARY] Library saved to disk in 0.227 seconds
Ran this test 10 times, and no bug.
As a final set of tests, I went back to the original code with just the sleep inside the with-block:
with open(
    self.library_dir / TS_FOLDER_NAME / filename, "w", encoding="utf-8"
) as outfile:
    outfile.flush()
    time.sleep(0.1)
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
    # , indent=4 <-- How to prettyprint dump
Ran this test 10 times, and no bug.
So there seem to be two independent "fixes" for the bug:
- Move `self.to_json()` outside of the with-block
- Add `time.sleep(0.1)` just before `ujson.dump`
I can't wrap my head around what is going on here. Option 2 would suggest there isn't enough time between opening the file and writing to it. Why would an artificial wait be needed between open and write?
If I drastically increase the sleep time in Option 2 to several seconds, I can see that when the file is opened for writing the existing contents are, as expected, cleared (file size 0); then the data is saved and the file size goes to ~340K.
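That truncate-on-open behavior is standard Python file handling and is easy to verify with a standalone snippet (just a demo, not TagStudio code):
from pathlib import Path

p = Path("demo.json")
p.write_text("x" * 1000)
print(p.stat().st_size)  # 1000

with open(p, "w", encoding="utf-8") as f:
    # "w" mode truncates the file as soon as open() returns,
    # before anything has been written
    print(p.stat().st_size)  # 0
    f.write("{}")

print(p.stat().st_size)  # 2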
But none of that explains why Option 1 would have any effect at all, and yet I can't repro the bug when `self.to_json()` is outside of the with-block.
If the source of the bug really is some kind of race condition, then I'm starting to wonder if a true fix would be to write a new library file, delete the old library file, then rename the new file.
I ran with the temp-file idea and have not run into the bug again:
tempJsonPath = self.library_dir / TS_FOLDER_NAME / "temp.json"
jsonPath = self.library_dir / TS_FOLDER_NAME / filename
# Delete temp file if it exists
tempJsonPath.unlink(missing_ok=True)
# Write to a temp file
with open(
    tempJsonPath, "w", encoding="utf-8"
) as outfile:
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )
# Delete existing library
jsonPath.unlink(missing_ok=True)
# Rename temp file
tempJsonPath.rename(jsonPath)
In my opinion, of the three solutions, this one feels like the winner.
Pros
- No weird calls to sleep
- No worries about someone accidentally "optimizing" the code by getting rid of a variable and moving `self.to_json()` back inside the with-block
- If something goes wrong, there is data saved somewhere.
  - If writing to the temp file fails or is interrupted by a crash or power cut, the original `ts_library.json` file is left untouched
  - If deleting `ts_library.json` fails, `temp.json` has the latest data
  - If renaming fails, `temp.json` still exists
- The `open_library` code could be expanded to check for `temp.json` and attempt to load it first (see the sketch after the Cons list)
  - If `temp.json` fails to read/parse, fall back on `ts_library.json`
  - Could potentially check timestamps
Cons
- Will briefly require twice as much disk space
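As a rough illustration of the recovery idea from the Pros list above (the function name and error handling here are illustrative, not actual TagStudio code):
import ujson

def load_library_json(library_dir, folder_name, filename):
    # Prefer temp.json if it exists and parses; otherwise fall back
    # to the main library file
    tempJsonPath = library_dir / folder_name / "temp.json"
    jsonPath = library_dir / folder_name / filename

    if tempJsonPath.exists():
        try:
            with open(tempJsonPath, "r", encoding="utf-8") as infile:
                return ujson.load(infile)
        except ValueError:
            # temp.json is half-written or corrupt; ignore it
            pass
    with open(jsonPath, "r", encoding="utf-8") as infile:
        return ujson.load(infile)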
Just a minor note (which you will probably already have thought of yourself): there will at some point be a user who decides to name their database temp, at which point you will have a name collision. Perhaps the name `filename + ".temp.json"` would be a better choice? Still not guaranteed to be unique, but if you want a stable name, you need to stop at some point.
tempJsonPath = self.library_dir / TS_FOLDER_NAME / (filename + ".temp.json")
Or compute a hash of the file name and use that (+".json") as the temp file name.
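Something like this, for illustration only (hashlib is in the standard library):
import hashlib

# Derive a stable temp name from a hash of the library filename
digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
tempJsonPath = self.library_dir / TS_FOLDER_NAME / (digest + ".json")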
Also, looking at the code, if the application crashes between the `unlink` and the `rename` call, there would again be data loss (unless, as suggested, code is added to recover from the temp file in that case).
Maybe instead of unlink and rename (which first deletes the database and then renames the temp one to the real database name) it'd be better to use `os.replace`.
That way you could call `os.replace(temp_db, actual_db)` without first deleting the actual database file, and the OS will guarantee (as of Python 3.3) that the move will either fail entirely (keeping the old DB) or succeed entirely (and you'll have the new file saved). That means there's no need for extra code to recover from a temp file - the database will always be there no matter when the process was interrupted.
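Sketched against the temp-file code above (untested, for illustration):
import os
import ujson

tempJsonPath = self.library_dir / TS_FOLDER_NAME / "temp.json"
jsonPath = self.library_dir / TS_FOLDER_NAME / filename

# Write the new library to the temp file first
with open(tempJsonPath, "w", encoding="utf-8") as outfile:
    ujson.dump(
        self.to_json(),
        outfile,
        ensure_ascii=False,
        escape_forward_slashes=False,
    )

# Atomically swap the temp file into place; no separate unlink needed.
# os.replace either fails entirely (old library kept) or succeeds entirely.
os.replace(tempJsonPath, jsonPath)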
Also, at least to me it looks like the code currently in `save_library_to_disk` could fail on any OS, not just Windows. If the `open` call succeeds (and truncates the existing file) but the JSON dump then fails for whatever reason, the file will probably be corrupted.
Should be fixed as of #554, coming with the next 9.4.x patch. If the issue persists, I can reopen this issue.