
Save paths to temp file / moveToTrash requested path is not a file

Open • chapmanjacobd opened this issue 2 years ago • 2 comments

I'm not sure how this is possible (because of the unique index media_path_index), but it seems like cbird tried to delete something that it had already trashed just a few lines earlier.

$ cbird -p.dht 1 -similar -select-result -sort-rev resolution -chop -nuke
[D][Database::Database] "."
[D][Database::connect] thread:0x5637468b13b0 sqlite_0_0 ~/.local/cbird/art/_index/media0.db
[D][Database::connect] thread:0x5637468b13b0 sqlite_1_1 ~/.local/cbird/art/_index/media1.db
[D][Database::connect] thread:0x5637468b13b0 sqlite_2_2 ~/.local/cbird/art/_index/media2.db
[D][Database::connect] thread:0x5637468b13b0 sqlite_3_3 ~/.local/cbird/art/_index/media3.db
[I][Database::fillMediaGroup] sql query: 1400001
[D][Database::similar] loading index for algo 0
[I][DctHashIndex::load] sql query: 96% 1,399,808 hashes
[I][DctHashIndex::load] 1400443 hashes, 426ms
[I][Database::similar] index loaded in 4105ms
[I][Database::similar]  41000 1400443
[W][DctHashIndex::find] no hash for needle: "~/.local/cbird/art/91_New_Art/emoji-kitchen/u1f429_u1f90d.jpg"
...
[I][Database::similar]  1346000 1400443
[W][DctHashIndex::find] no hash for needle: "~/.local/cbird/art/95_Memes/stars.jpg"
[I][Database::similar]  1400000 1400443
[I][Database::similar] searched 1400443 items and found 1400443 matches in 488ms
[D][Database::similar] filter matches
[D][Database::loadWeeds] loaded 0 weeds
[I][Database::similar] filtered 1400443 matches to 28487 in 1427ms

nuke: about to move 69923 items to trash, proceed? [y/N]: y
...
[I][DesktopHelper::runProgram] QList("trash-put", "~/.local/cbird/art/91_New_Art/jonathanmccabe/20262979831_20262979831_54b4f72f6f_o.jpg")
[D][DesktopHelper::runProgram] portable PATH: "/tmp/.mount_cbirdrlRyyk/cbird/bin/" LD_LIBRARY_PATH: "/tmp/.mount_cbirdrlRyyk/cbird/lib/"
[I][DesktopHelper::runProgram] QList("trash-put", "~/.local/cbird/art/91_New_Art/jonathanmccabe/19634149424_19634149424_27dd7ce56c_o.jpg")
[D][DesktopHelper::runProgram] portable PATH: "/tmp/.mount_cbirdrlRyyk/cbird/bin/" LD_LIBRARY_PATH: "/tmp/.mount_cbirdrlRyyk/cbird/lib/"
[W][DesktopHelper::moveToTrash] requested path is not a file: "~/.local/cbird/art/91_New_Art/jonathanmccabe/20262979831_20262979831_54b4f72f6f_o.jpg"

# exit 0

I'm not sure how to reproduce this bug; it has only happened once and I'm not too concerned about it, but:

  1. cbird exits with code 0 after this. Shouldn't the exit code be non-zero?
  2. It would be convenient to have an option to save the selected paths to a NUL-delimited file, either randomly named or with a name I choose.
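A minimal sketch of the second request, written in Python rather than cbird's own code: `save_paths` and `load_paths` are hypothetical helpers, and the `cbird-selection-` temp-file prefix is made up for illustration. NUL is the one byte that cannot appear in a POSIX path, so the list survives spaces, quotes, and even newlines in file names.

```python
import os
import tempfile
from typing import Iterable, List, Optional


# Hypothetical helpers for illustration; not part of cbird.
def save_paths(paths: Iterable[str], dest: Optional[str] = None) -> str:
    """Write each path followed by a NUL byte; return the file written.

    With dest=None a uniquely named temp file is created (the "random" option);
    otherwise dest is created or overwritten (the "named" option).
    """
    if dest is None:
        fd, dest = tempfile.mkstemp(prefix="cbird-selection-", suffix=".nul")
        out = os.fdopen(fd, "wb")
    else:
        out = open(dest, "wb")
    with out:
        for path in paths:
            # fsencode round-trips undecodable bytes via surrogateescape.
            out.write(os.fsencode(path) + b"\0")
    return dest


def load_paths(src: str) -> List[str]:
    """Read a NUL-delimited list back into a list of paths."""
    with open(src, "rb") as f:
        data = f.read()
    # The trailing NUL leaves an empty last field, which is dropped.
    return [os.fsdecode(item) for item in data.split(b"\0") if item]
```

A file written this way can be consumed by standard tools, e.g. `xargs -0 trash-put < selection-file`, with no grep/sed/unescape step in between.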

My short-term workaround:

cbird -p.dht 1 -similar -select-result -sort-rev resolution -chop -dump > out
cat out | grep path | sed 's|.*  = ||' | sed 's|~/.local/cbird/art/||' | string unescape | parallel -j20 rm {} 

chapmanjacobd · Jul 25 '23 03:07

Also, instead of exiting when this happens, you might consider doing what rmlint does: check that the "original" exists before each delete/trash operation:

  • if the "original" exists, ignore ENOENT / ENOTDIR when trying to delete the duplicate(s), print a warning, but don't exit the program
  • if the "original" doesn't exist, skip any operations on the duplicate(s), print a warning, but don't exit the program
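A minimal Python sketch of that rmlint-style policy. `remove_dupes` is a hypothetical helper, and `os.remove` stands in for the real trash backend (the log above shows cbird shelling out to `trash-put`); the caller is assumed to already know which path is the "original".

```python
import errno
import os
import sys
from typing import Callable, Sequence


# Hypothetical helper; `remove` is a stand-in for the real trash call.
def remove_dupes(original: str, dupes: Sequence[str],
                 remove: Callable[[str], None] = os.remove) -> int:
    """Delete the duplicates of `original`; return how many were removed.

    A missing original or an already-missing duplicate produces a warning
    instead of an exception; any other OS error still propagates.
    """
    if not os.path.exists(original):
        # Rule 2: original is gone, so leave its duplicates alone.
        print(f"warning: original missing, skipping {len(dupes)} dupe(s): {original}",
              file=sys.stderr)
        return 0
    removed = 0
    for dupe in dupes:
        try:
            remove(dupe)
            removed += 1
        except OSError as e:
            # Rule 1: a duplicate that is already gone is not fatal.
            if e.errno in (errno.ENOENT, errno.ENOTDIR):
                print(f"warning: already gone: {dupe}", file=sys.stderr)
            else:
                raise
    return removed
```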

However, I don't think cbird currently checks that the "original" exists before deleting, so the current behavior of exiting the whole program makes sense to me.

chapmanjacobd · Jul 25 '23 03:07

There would have to be two groups that contain the same item, e.g. A=>C and B=>C, so after -chop you get { C, C } in the list of deletions. That would mean A,C and B,C are closer to each other than A,B are. These could be false matches, especially if we know that A and B are unique.
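A sketch of how a repeated target like C could be caught before the trash step, in Python and assuming (as the { C, C } example implies) that -chop drops the first item of each group and the remaining items are slated for deletion. `deletion_list` is a hypothetical helper: it keeps the first occurrence of each path and reports the repeats, so they could trigger a warning instead of a second trash attempt.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def deletion_list(groups: Sequence[Sequence[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Return (unique deletion targets, {path: times targeted} for repeats).

    Hypothetical helper; assumes chopping drops the first item of each group.
    """
    targets = [path for group in groups for path in group[1:]]
    counts = Counter(targets)
    unique = list(dict.fromkeys(targets))  # order-preserving dedupe
    repeats = {path: n for path, n in counts.items() if n > 1}
    return unique, repeats


if __name__ == "__main__":
    groups = [["A", "C"], ["B", "C"]]  # A=>C and B=>C from the example
    paths, repeats = deletion_list(groups)
    print(paths, repeats)  # ['C'] {'C': 2}
```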

The exit code is -1 for -nuke; I'm not sure where the 0 status is coming from.
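For reference, POSIX keeps only the low 8 bits of an exit status, so a process that exits with -1 shows up in the shell as 255, not 0. A quick Python check of that, assuming a POSIX system (Windows reports the value differently); `observed_status` is a hypothetical helper.

```python
import subprocess
import sys


def observed_status(code: int) -> int:
    """Exit a child Python process with `code`; return the status the parent sees."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


if __name__ == "__main__":
    for code in (0, 1, -1):
        # On POSIX, -1 is reported as 255 (only the low 8 bits survive).
        print(code, "->", observed_status(code))
```

If the shell really reported 0, that would point to an exit path other than the -1 one, though this check cannot say which.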

Yeah, there is no concept of "original" anywhere really; there is only the "needle". The "nuke" phase is more of a "trust me bro" option for cases where you have some known duplicates to discard.

Ideally nuke would never exit early unless there was a problem it couldn't otherwise detect or resolve.

I am moving towards something where we can reliably batch-delete duplicates. Based on your comments, it seems clear that the system would need:

  1. some concept of "original"
    • some way to elect which one should be the original (digikam has this)
    • don't delete originals with -nuke, or dups if the original went missing
  2. if something seems off like copies in the list, put up an "are you sure ...?" prompt
  3. add a flag to "just say yes" to all warnings that would stop -nuke from finishing
  4. add another version of -nuke that would take originals into account
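A minimal Python sketch of points 1 to 3, as one possible shape for point 4. Everything here is hypothetical: groups are lists of `(path, pixel_count)` tuples, the original is elected as the highest-resolution member (mirroring the `-sort-rev resolution` choice in the command above, not any digikam rule), and `assume_yes` plays the role of the proposed "just say yes" flag.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical record: (path, width * height). Not a cbird type.
Item = Tuple[str, int]


def plan_nuke(groups: Sequence[Sequence[Item]], assume_yes: bool = False,
              confirm: Callable[[str], str] = input) -> List[str]:
    """Return the paths that are safe to trash, or [] if the user declines."""
    # 1. Elect one original per group; here simply the largest resolution.
    elected = [max(group, key=lambda item: item[1])[0] for group in groups]
    originals = set(elected)

    targets: List[str] = []
    protected = set()
    for original, group in zip(elected, groups):
        for path, _ in group:
            if path == original:
                continue
            if path in originals:
                protected.add(path)  # original of another group: never delete
            else:
                targets.append(path)

    repeats = [path for path, n in Counter(targets).items() if n > 1]
    targets = list(dict.fromkeys(targets))  # trash each path at most once

    # 2. Something looks off: ask first.  3. assume_yes skips the question.
    if (repeats or protected) and not assume_yes:
        answer = confirm(f"{len(repeats)} repeated and {len(protected)} protected "
                         "path(s) found; continue? [y/N]: ")
        if answer.strip().lower() != "y":
            return []
    return targets
```

A chain such as A=>B, B=>C would keep B (it is the original of the second group) and only trash C.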

scrubbbbs · Jul 25 '23 18:07