Repeatedly opening, saving, and closing a database increases its size even when nothing changes
Version and Platform (required):
- Binary Ninja Version: 5.3.8652-dev Ultimate (f6637a1a)
- Edition: Ultimate
- OS: macOS
- OS Version: 15.6
- CPU Architecture: M1
Bug Description: Saving, closing and re-opening a database, without making any changes between opening and closing, causes the database file to grow. The amount of growth may depend on the initial size of the database: when a database is quite large (I have plugins that save large amounts of data in the binary view metadata), each save adds considerably more. For the DYLD Shared Cache, the database is around 18MB after initial analysis and can grow by around 0.5MB with each save.
Steps To Reproduce: In my case I tested this on a copy of the DYLD Shared Cache.
- Open the DYLD Shared Cache in Binary Ninja.
- Wait for analysis to complete.
- Save a database.
- Close the binary view.
- Re-open the binary view.
- Repeat steps 3-5 multiple times and observe the database growing in size. It does not seem to happen every time.
Expected Behavior: I wouldn't expect the database to change size when no changes are made.
Additional Information: In some cases I've seen a jump from 14MB to 20MB, where 14MB is the size after the first database save and 20MB is the size after the second.
This might be related to the fact that, for a DYLD Shared Cache database, the view reports unsaved changes right after opening, even if I haven't done anything after the load completed. Perhaps there are somehow pending changes that get snapshotted on every save, which might explain why the amount of growth scales with the database size?
Looks like the auto types and tags are being modified every save. Tags changing is likely #7563 but types changing is unusual. Might be something about the DSC view creating new types every time?
Upon further inspection, it seems like the types and tags are not actually being modified; rather, the serialized list of them in the database is not sorted. Likely due to hash map iteration order, they are written in a different order on each save, so the fields do not de-duplicate against the previous save.
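To illustrate why an unsorted serialization defeats de-duplication, here is a minimal sketch in Python. It assumes a hypothetical content-addressed store (`BlobStore`, keyed by SHA-256) where identical blobs are kept only once; this is not Binary Ninja's actual database format. Two "sessions" hold the same logical types and tags but iterate them in different orders, mimicking a hash map whose order varies between loads.

```python
import hashlib
import json


def serialize(entries, sort_keys):
    """Serialize a mapping of type/tag names to definitions.

    With sort_keys=False the output follows the mapping's iteration order,
    standing in for a hash map whose order can differ between sessions.
    """
    items = sorted(entries.items()) if sort_keys else list(entries.items())
    return json.dumps(items).encode("utf-8")


class BlobStore:
    """Hypothetical content-addressed store: identical blobs are stored once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # only new content adds bytes
        return key

    def total_size(self):
        return sum(len(b) for b in self.blobs.values())


def simulate_saves(sort_keys):
    store = BlobStore()
    # Same logical content, built in a different order for each session.
    session_a = {"TypeA": "struct {...}", "TypeB": "int32_t", "Tag1": "note"}
    session_b = {"Tag1": "note", "TypeB": "int32_t", "TypeA": "struct {...}"}
    for entries in (session_a, session_b, session_a):
        store.put(serialize(entries, sort_keys))
    return len(store.blobs), store.total_size()


unsorted_count, unsorted_size = simulate_saves(sort_keys=False)
sorted_count, sorted_size = simulate_saves(sort_keys=True)
print(unsorted_count, sorted_count)  # 2 1
```

Without sorting, each distinct iteration order produces a new blob even though nothing changed, so the store grows; sorting by key before serializing makes the bytes canonical and every save after the first de-duplicates.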