Add a remote command for batch duplicate finding.
Here is my first stab at it. I know I at least need to adjust for the accepted coding style, but please let me know if there are any other major things that need changing.
Based on https://github.com/porridge/image-duplicate-finder
Closes: #1520
In the latest commit, the command line handling has changed. The attached .diff contains the changes I think are necessary to conform to the new code:
Revised .diff file. Please note that these changes are merely suggestions.
You can run the project's static tests locally before making a pull request by running ./scripts/test-all.sh
Thank you for the reviews @xsdg and @qarkai and the rebase @caclark!
Here's a new iteration, where I also removed the need to run in "remote" mode.
I think I addressed all the comments, apart from one, about using fork. As far as I could tell, the g_spawn_async_with_pipes function has two problems with my use case:
- according to its own documentation, it does not handle whitespace in arguments well, and this is critical for reliable passing of file paths,
- it is not possible to use it in a way which just inherits stdin and stdout, which is necessary when the program run on each duplicate set communicates with the user through the terminal.
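To illustrate the fork-based approach described above, here is a minimal C++ sketch, assuming a POSIX system. The helper name `run_on_duplicate_set` is hypothetical and is not taken from the PR. Each path is passed as its own `argv` element, so whitespace inside a path is never re-parsed, and the child simply inherits the parent's stdin and stdout.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the PR): run `program` with every file as a
// separate argument, wait for it, and return its exit status (-1 on failure).
int run_on_duplicate_set(const std::string &program, const std::vector<std::string> &files)
{
    assert(!program.empty());

    // Build a NULL-terminated argv. Each path is one element, so spaces or
    // newlines inside a path are passed through untouched.
    std::vector<char *> argv;
    argv.push_back(const_cast<char *>(program.c_str()));
    for (const std::string &file : files)
        argv.push_back(const_cast<char *>(file.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid < 0)
        return -1; // fork failed

    if (pid == 0)
    {
        // Child: stdin, stdout and stderr are inherited unchanged, so an
        // interactive program can talk to the user's terminal.
        execvp(argv[0], argv.data());
        _exit(127); // only reached if exec failed
    }

    // Parent: wait for the child and report how it exited.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_on_duplicate_set("echo", {"/tmp/a b.png", "/tmp/c.png"})` would hand `echo` exactly two arguments, the first one containing a space.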
Example usage:
$ ./build/src/geeqie -p ?.png
/home/porridge/Pulpit/coding/geeqie/2.png /home/porridge/Pulpit/coding/geeqie/3.png
$
and with debug output:
$ ./build/src/geeqie --debug 1 ?.png
duplicates program set to "echo"
processing 3 files in set
/home/porridge/Pulpit/coding/geeqie/1.png vs /home/porridge/Pulpit/coding/geeqie/2.png: 91,749387
/home/porridge/Pulpit/coding/geeqie/1.png vs /home/porridge/Pulpit/coding/geeqie/3.png: 91,701134
/home/porridge/Pulpit/coding/geeqie/2.png vs /home/porridge/Pulpit/coding/geeqie/3.png: 99,773284
/home/porridge/Pulpit/coding/geeqie/2.png /home/porridge/Pulpit/coding/geeqie/3.png
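The debug output above shows pairwise similarity scores followed by one resulting set. As a rough C++ sketch of how such scores could be turned into sets: the function name, the threshold of 95 used in the usage note, and the transitive (union-find) grouping are all assumptions made for illustration, not necessarily what the PR implements.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: group items whose pairwise similarity (0-100) reaches
// `threshold`. `similarity` is assumed to be a square, symmetric matrix.
std::vector<std::vector<std::size_t>> group_duplicates(
    const std::vector<std::vector<double>> &similarity, double threshold)
{
    const std::size_t n = similarity.size();

    // Union-find: every item starts as the root of its own set.
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});

    auto find = [&parent](std::size_t x) {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    };

    // Merge the sets of every pair that is similar enough.
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(similarity[i].size() == n);
        for (std::size_t j = i + 1; j < n; ++j)
        {
            if (similarity[i][j] < threshold)
                continue;
            const std::size_t root_i = find(i);
            const std::size_t root_j = find(j);
            if (root_i != root_j)
                parent[root_j] = root_i;
        }
    }

    // Collect members per root and keep only sets with at least two items.
    std::vector<std::vector<std::size_t>> buckets(n);
    for (std::size_t i = 0; i < n; ++i)
        buckets[find(i)].push_back(i);

    std::vector<std::vector<std::size_t>> groups;
    for (const auto &bucket : buckets)
        if (bucket.size() > 1)
            groups.push_back(bucket);
    return groups;
}
```

With the three scores from the debug output and an assumed threshold of 95, only the second and third files end up in a set, which matches the single line printed at the end.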
@porridge
I suggest you consider changing option --process-duplicates to --duplicates-process. This will group the three new commands together in the help output, and will be more logical when using command line completion. The short form option may be a bit illogical, but I do not think that matters.
There should be entries for the three command line long options in ./auto-complete/geeqie. This is for bash command line completion.
If you wish the --duplicates-program to be remembered between sessions, there should be entries at about lines 409 and 915 of ./src/rcfile.cc.
I have some other comments but it will be a few days before I write them down.
@porridge I am sorry that it has taken me so long to think over this feature.
Anyone else: comments are welcome.
I think there may be another way to solve this problem. It would involve some new features that may or may not be possible.
Adding files to the Dupes Window is a bit messy. It is done either by drag-and-drop or by right-clicking on files or a directory, but this does not work as I expect at the moment (the right-click feature opens a new Dupes Window each time, which I do not think is correct).
- It may be possible to include a right-click "add files" option in the Dupes Window.
- It may be possible to add files to the Dupes Window via the command line, e.g. geeqie --dupes-window-add <list of files>, followed by the Dupes Window being opened automatically if not already open.
When the Dupes Window is open the user has the possibility to change the comparison type, the image rotation mode and other options. When the dupes check is completed, the user has the choice of which data to select.
The selected data may be Exported to a comma-separated or tab-separated file via a right-click in the Dupes Window.
- It may be possible to send the same selected data to the command line, e.g. geeqie --dupes-export | cut --fields=5 -
For the above command to be part of an automatable sequence, it would be necessary to know when the dupes operation has finished. I have no idea how to do that at the moment.
The plugins are available via a right-click in the Dupes Window. With those, pretty much anything can be achieved, which might make --dupes-export redundant. Unfortunately, plugin keyboard shortcuts are not recognized.
- It may be possible for the Dupes Window to recognize plugin keyboard shortcuts.
- It may be possible for there to be a "When a dupes run is finished, call this plugin" feature, but that could be dangerous for an unwary user trying to delete unwanted files. I am not enthusiastic about this idea, but it is one way of knowing when the dupes run is completed (it would run whenever the user changes the comparison mode, for instance, and not just when the user has made the final decision).
@porridge
I suggest you consider changing option --process-duplicates to --duplicates-process. This will group the three new commands together in the help output, and will be more logical when using command line completion. The short form option may be a bit illogical, but I do not think that matters.
Done, @caclark
There should be entries for the three command line long options in ./auto-complete/geeqie. This is for bash command line completion.
I believe they are there already in the options variable definition? Or do you have something else in mind?
If you wish the --duplicates-program to be remembered between sessions, there should be entries at about lines 409 and 915 of ./src/rcfile.cc.
I think it's better to require the user to explicitly provide the program, in case it is destructive and the user forgot what it was set to last.
I think there may be another way to solve this problem. It would involve some new features that may or may not be possible.
@caclark, sounds like what you are suggesting would be more discoverable for a typical GUI user. OTOH it might be less convenient for use in scripted, automated workflows.
But most importantly from my perspective - I don't feel anywhere as competent as would be required to implement what you described 😅
@porridge Attached is a .diff from the current sources. I would appreciate it if you would take a look at it. This code is not a solution - it is just a hack to demonstrate a different way of achieving this feature.
After compiling, from a terminal window run ./build/src/geeqie
Open the dupes window and select Compare By to Similarity Custom. Set Custom Threshold to a low number e.g. 50
Close the dupes window.
From another terminal window run ./build/src/geeqie --duplicates-process <list of files>
For <list of files> just use 4 or 5 simple jpegs.
Run ./build/src/geeqie --duplicates-export
Then try ./build/src/geeqie --duplicates-export | tail -n +2 | cut -f 2,5
The advantages I see are:
- No additional comparison logic is required
- No specific processing program is required
- It is easier for users to process the data in the way they wish
If this is an acceptable solution, the duplicates-export function will be eliminated - the option duplicates-process ... will output the required text (there is a timing problem I did not yet solve).
1524-1.diff.gz
@caclark
After compiling, from a terminal window run ./build/src/geeqie [...] From another terminal window run ./build/src/geeqie --duplicates-process <list of files> For <list of files> just use 4 or 5 simple jpegs.
It would be hard to incorporate this kind of usage (run a GUI-dependent geeqie session and, while it's running, launch another process) into my workflow. Not impossible, but the changing window focus and juggling a process running in the background are tricky to handle.
FTR my use case is an automated pipeline that:
- moves images from input directories (to which they are first synced using syncthing from various devices) into a staging location,
- de-duplicates them,
- auto rotates and performs various other metadata fixups,
- splits them into year/month/day based directory structure.
Run ./build/src/geeqie --duplicates-export Then try ./build/src/geeqie --duplicates-export | tail -n +2 | cut -f 2,5
Also, a simple text format makes it hard to reliably handle filenames, which in principle can contain arbitrary whitespace characters. This could be done in a somewhat standard way using JSON format for example, but even then the need to post-process that in order to pass it further down the pipeline is yet another hurdle. This is why I found the feature of running a separate user-specified command on each set of duplicates in turn particularly attractive.
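To make the whitespace concern concrete, here is a small C++ sketch that splits a line on a delimiter the way a tool like `cut -f` conceptually does. The function name and the sample lines are made up for illustration and do not reflect the actual export format.

```cpp
#include <string>
#include <vector>

// Hypothetical illustration: split `line` into fields on `delimiter`,
// keeping empty fields, with no quoting or escaping rules.
std::vector<std::string> split_fields(const std::string &line, char delimiter)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true)
    {
        const std::string::size_type end = line.find(delimiter, start);
        if (end == std::string::npos)
        {
            fields.push_back(line.substr(start)); // last field
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

A made-up line such as `"2\t/tmp/a b.png"` splits into two fields as intended, but `"2\t/tmp/a\tb.png"`, where the file name itself contains a tab, splits into three, and the path can no longer be recovered from the second field alone.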
The advantages I see are: No additional comparison logic is required
Yes, that would indeed be nice. While it was super easy to reuse the algorithm for comparing two images, when I tried to reuse the logic that geeqie uses internally to process a whole list of files I got completely lost in the sources 😅
@porridge OK. Cache maintenance runs Geeqie in non-GUI mode. I will use that as an example and try to create something better.
@porridge
Does the --dupes-export option allow you to do the actions you want?