Database changes
When I change the default algos on the first -update, are those settings kept for future -update runs, or do I need to specify them each time?
I recently made the mistake of using only dct for some new images I added to the directory. I then ran -update again with dct+fdct, but it did nothing. Can you add a way to update the database with new algos?
These are the cases I am thinking of and the outcomes I would expect (you may have covered some of these already):
- I choose only some algos when first indexing. Subsequent -update commands should use the same settings unless told otherwise.
- I specify the algos I want to use. It should check the hashes (or whatever) for each file to see whether all algos are present, and if not, add the missing algos for those files. This would require checking algos before checking file existence, so that files are not skipped just because they are already found on the system (which will unfortunately slow down scanning).
- I want to remove some algos from the index to reduce its size. I would love options to see which algos the index currently uses, and to remove some of them.
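To make the three cases above concrete, here is a minimal sketch. It assumes the index can be viewed as a mapping from file path to the set of algos already computed for that file. That mapping and the names `plan_update`, `algos_in_use` and `prune` are hypothetical illustrations, not cbird's actual storage or code. The point is that an already-indexed file is still compared against the requested algos rather than being skipped outright.

```python
from typing import Dict, Iterable, Set

# Hypothetical view of an index: file path -> algos already computed for it.
# This is NOT cbird's real storage format, just an illustration.
Index = Dict[str, Set[str]]


def plan_update(index: Index, files: Iterable[str], requested: Set[str]) -> Dict[str, Set[str]]:
    """Return, for each file that needs work, the requested algos it is missing.

    Files already in the index are not skipped outright; each one is checked
    against the requested algos first (the extra, slower step).
    """
    work: Dict[str, Set[str]] = {}
    for path in files:
        have = index.get(path, set())  # a brand-new file has no algos yet
        missing = requested - have
        if missing:
            work[path] = missing
    return work


def algos_in_use(index: Index) -> Set[str]:
    """Union of all algos present on any file (the 'what is in use' query)."""
    used: Set[str] = set()
    for algos in index.values():
        used |= algos
    return used


def prune(index: Index, remove: Set[str]) -> None:
    """Drop the given algos from every file entry, in place."""
    for algos in index.values():
        algos -= remove  # set difference mutates the stored set
```

For example, with `{"a.jpg": {"dct"}, "b.jpg": {"dct", "fdct"}}` and a request for `{"dct", "fdct"}` over `a.jpg`, `b.jpg` and a new `c.jpg`, only `a.jpg` (missing fdct) and `c.jpg` (missing both) would get work.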
Also, slightly related: because of Whonix, I have ICE authority issues with graphical programs started from the command line. When the program crashes, a lot of work has to be redone. It would be awesome if matches could be saved to disk during processing so that it could pick up where it left off. This is especially pertinent for template matching, which takes the better part of a day on my lousy system.
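One common way to get "pick up where it left off" is an append-only checkpoint file. Each finished item is written as one JSON line and flushed to disk, and a restart skips whatever is already recorded. This is only a sketch of that general technique. The file layout, the `needle` key (the image being searched for) and `match_fn` (standing in for the slow matching step) are hypothetical, not anything cbird currently does.

```python
import json
import os
from typing import Callable, Dict, Iterable


def load_done(path: str) -> Dict[str, list]:
    """Read results saved by a previous (possibly crashed) run."""
    done: Dict[str, list] = {}
    good_end = 0  # byte offset just past the last complete record
    if not os.path.exists(path):
        return done
    with open(path, "rb") as f:
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # torn last record from a crash
            try:
                rec = json.loads(raw)
            except json.JSONDecodeError:
                break
            done[rec["needle"]] = rec["matches"]
            good_end += len(raw)
    # Cut off any torn tail so new records start on a clean line.
    with open(path, "r+b") as f:
        f.truncate(good_end)
    return done


def run_resumable(needles: Iterable[str], match_fn: Callable[[str], list], path: str) -> Dict[str, list]:
    """Run match_fn per needle, skipping needles already saved in path."""
    results = load_done(path)
    with open(path, "a", encoding="utf-8") as out:
        for needle in needles:
            if needle in results:
                continue  # finished before the crash
            matches = match_fn(needle)
            out.write(json.dumps({"needle": needle, "matches": matches}) + "\n")
            out.flush()
            os.fsync(out.fileno())  # make the record durable before moving on
            results[needle] = matches
    return results
```

Calling `fsync` after every record trades some speed for durability. For something that runs most of a day, losing at most one item on a crash seems like the right trade.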
I would love to hear your thoughts on this and/or any ideas it might have sparked.
I ran -list-index-params on the pictures index and it listed all algos, so subsequent -update runs without specified algos do indeed use all of them. For now, I guess I will need to start over to get only the algos I want for all images. Once I have a command with the options I want, I will save it somewhere and reuse it for every update.
So that at least partly answers my initial question.
My overall thought is that most of the time you should just enable all algos, because the space/time savings from disabling some are not significant. The only niche use cases are:
- You have well over 100,000 images to scan (enable DCT only)
- You don't want to index videos
With that said, I think a few "simple" changes would cover your use cases.
- Add -sync or something that works exactly like -update, additionally adding/removing algos as specified.
- If any file is not indexed for the query algo, the soft warning could add "please use -i.algos dct+fdct -sync"
That should work great.
I have added -i.sync, which is enabled by default, finalized in 2e587bee84b6756a5af56139d78138d7d6464ff8. When -i.algos changes, only adding new algos is allowed; existing ones cannot be broken by adding a different one. I haven't taken it as far as removing algos yet, but the framework for that is mostly in place now.
There is also a warning message for -similar if the requested algo isn't indexed.
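The add-only rule and the -similar warning can be summarized in a few lines. This is a sketch of the described behavior, not the actual code from that commit. The function names and the warning wording are hypothetical.

```python
from typing import Optional, Set


def sync_algos(stored: Set[str], requested: Set[str]) -> Set[str]:
    """Add-only merge: requested algos are added, stored ones are never dropped."""
    return stored | requested


def similar_warning(indexed: Set[str], query_algo: str) -> Optional[str]:
    """Return a soft warning if the query algo has not been indexed, else None."""
    if query_algo in indexed:
        return None
    # Suggest the full set so re-running does not look like it drops anything.
    wanted = "+".join(sorted(indexed | {query_algo}))
    return f"algo '{query_algo}' is not indexed; re-run -update with -i.algos {wanted}"
```

With this rule, running `-i.algos dct` against an index that already has dct+fdct leaves both in place. Actually removing an algo would need a separate, explicit operation.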
I consider the issue resolved unless you think removal is important; personally I don't see the need, as the storage requirements are pretty low.