
Thread: Linux utility for local reverse image search to find duplicate or similar images

  1. #1
    Administrator
    Join Date
    Mar 2013
    Posts
    2,802

    Linux utility for local reverse image search to find duplicate or similar images

    1 - dupeGuru

    dupeGuru is a GUI tool to find duplicate files in a system.

    Installation on Debian Linux (one-liner):
    cd /tmp; sudo apt install python3-mutagen python3-semantic-version && wget -q https://github.com/arsenetar/dupeguru/releases/download/4.3.1/dupeguru_4.3.1_amd64.deb; sudo dpkg -i dupeguru_4.3.1_amd64.deb; cd -

    Once launched, add the folder(s) to search with the + button, set the scan mode to "Image" and the type to "Content". Under "More options", set the filter to 100% rather than 95% if you really only want identical or very similar images. I also chose to ignore duplicates that are just hardlinks. In the View/Display menu you can show the exclusion filters; to exclude images whose path contains the folder /can-be-duplicate/, add the filter/regular expression .*/can-be-duplicate/.* (without quotation marks). If it was added correctly, pasting a test path and clicking the "test string" button should highlight the matching regular expression.
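
    A quick way to sanity-check that regular expression outside dupeGuru is to run it against a sample path with grep (the path below is just made up for illustration):

    echo "/photos/can-be-duplicate/img001.jpg" | grep -qE '.*/can-be-duplicate/.*' && echo "this path would be excluded"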

    2 - findimagedupes

    findimagedupes can find images that are similar or identical to a given image, or all similar/duplicate images within a folder (optionally recursively). It is a command-line tool, so it is best suited to experienced CLI users who can post-process its output with sed, awk or similar.

    sudo apt install findimagedupes # installation on Debian/Ubuntu
    findimagedupes # help

    BUILDING the index of images (on an SSD with 23 GB of photos this took about an hour):

    findimagedupes -R -q -f $HOME/findimagedupes.index --prune -n -- "/images-folder/"
    (to avoid descending recursively into subdirectories, remove " -R"; the same applies to the following commands)
    (to index multiple folders, list them after the first one separated by spaces, as in the example below)
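
    For instance, indexing two folders into the same fingerprint file (the second folder path is only a placeholder) would look like this:

    findimagedupes -R -q -f $HOME/findimagedupes.index --prune -n -- "/images-folder/" "/second-images-folder/"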

    IMAGE VS INDEX - Duplicates of a single image inside the already built index:

    findimagedupes -q -f $HOME/findimagedupes.index -a -- "/path/to/image.jpg"
    (to also list less similar images, replace "-a" with "-t 70% -a")
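
    Written out in full, that lower-threshold variant looks like this (same example image path as above):

    findimagedupes -q -f $HOME/findimagedupes.index -t 70% -a -- "/path/to/image.jpg"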

    FOLDER VS INDEX - Duplicates between a defined folder and the already built index (the -a switch keeps it from reporting duplicates that exist only inside the index):

    findimagedupes -R -q -f $HOME/findimagedupes.index -t 100% -a -- /images-folder-2/
    (remove "-R " to avoid recursive search in defined folder)

    FOLDER VS INDEX & WITHIN INDEX - ALL - Duplicates between a defined folder and already built index:
    findimagedupes -R -q -f $HOME/findimagedupes.index -t 100% -- /images-folder-2/
    (decrease the -t value to also show less similar images)

    WITHIN INDEX - Duplicates within the already built index:

    findimagedupes -q -f $HOME/findimagedupes.index -t 100%
    After renaming, removing or adding files in your images folder, run the index-building command above again to keep the index up to date. Repeated runs finish much faster (seconds to a few minutes), since fingerprints of unchanged files are reused; --prune removes entries for files that no longer exist.

    After finding duplicates, you can execute a command on them (open, move or remove them). I did this by adding a -p or -i switch before the -a switch, as in the fragments below (a fully assembled command is shown after these examples):


    open each set of duplicates in a program (here the XnView viewer):
    -p "$(type -p xnview)"

    pass each set of duplicates to echo, so it just prints the paths (the capital -P forces a PATH lookup, because echo is also a shell builtin and "type -p" would return nothing for it):
    -p "$(type -P echo)"

    print an "rm" command for the first of the duplicates and open the rest in the XnView viewer (the "shift" ensures that only the remaining files, not all of them, are passed to the viewer):
    -i 'VIEW(){ echo "rm $1"; shift; xnview "$@"; }'

    remove the first of the duplicate files:
    -i 'VIEW(){ echo "Removing first of the duplicates."; rm "$1"; }'
    (you can remove duplicates this way, but there is no guarantee that the folder you want to delete from will be listed first ($1); it may just as well remove the same duplicate from a different folder. Also use -t 100% so that only exact matches, not merely similar images, get removed!)


    remove the first of the duplicate files if it is inside a defined path:
    -i 'VIEW(){ if echo "$1" | grep -q "$folderwithduplicates"; then echo "Removing duplicate from $folderwithduplicates."; rm "$1"; fi }'
    This removes the file when the first listed duplicate ($1) is inside a folder whose path contains the text stored in the variable $folderwithduplicates. The variable has to be defined and exported before running findimagedupes (so the shell spawned by findimagedupes can see it), for example: export folderwithduplicates="/images-folder-2/"
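
    Put together, the whole invocation might look like this (the folder and index paths are simply the ones used in the earlier examples):

    export folderwithduplicates="/images-folder-2/"
    findimagedupes -R -q -f $HOME/findimagedupes.index -t 100% -i 'VIEW(){ if echo "$1" | grep -q "$folderwithduplicates"; then echo "Removing duplicate from $folderwithduplicates."; rm "$1"; fi }' -a -- "/images-folder-2/"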



    report that a duplicate was found and move the first of the duplicates into a specific folder (create the folder beforehand):
    -i 'VIEW(){ echo "Similar file found. Check _duplicates_to_delete folder."; mv "$1" "/_duplicates_to_delete/" 2>/dev/null; }'
    NOTES:
    To pass more than one folder to a search command, replace the folder path with "-" and prepend the following to the findimagedupes command: ls -A1 "/images-folder-2/"* |  (the asterisk may be necessary so that the files inside the folder get listed)
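
    As a sketch (the second folder name is only a placeholder, and this relies on "-" making findimagedupes read the file list from standard input, as described above):

    ls -A1 "/images-folder-2/"* "/second-images-folder/"* | findimagedupes -q -f $HOME/findimagedupes.index -t 100% -a -- -
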
    Last edited by Fli; 06-19-2024 at 06:24 PM.
