
Locally merging metadata without upload?

Open lcs-crr opened this issue 11 months ago • 8 comments

Would it be possible to run the tool locally without the upload function, i.e. using it as a metadata merger only? I want to avoid uploading images whose metadata failed to merge (no metadata available, not merged properly, ...) and having to clean up later on.

lcs-crr avatar Apr 03 '25 13:04 lcs-crr

I'm thinking of something that could address this need: read local files with their sidecar files, and update immich's metadata for the existing images.

simulot avatar Apr 03 '25 16:04 simulot
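A minimal sketch of what such a merge-only pass could look like, under assumptions: the `PUT /api/assets/{id}` endpoint and the `description`/`dateTimeOriginal` field names are taken from the immich OpenAPI spec and should be verified against your server version, and `sidecar_to_payload` is a hypothetical helper, not part of immich-go.

```shell
#!/bin/bash
# Hypothetical sketch of a merge-only pass: convert a Google Takeout
# sidecar JSON into an immich update payload, then push it over the API
# instead of re-uploading the file. Endpoint and field names are
# assumptions; check them against your immich version.

# Build the update payload from a sidecar file (pure, no network).
sidecar_to_payload() {
    local sidecar="$1"
    jq -c '{
        description: (.description // ""),
        dateTimeOriginal: (.photoTakenTime.timestamp | tonumber | todate)
    }' "$sidecar"
}

# Push the payload for a known asset ID.
update_asset() {
    local asset_id="$1" payload="$2"
    curl -s -L -X PUT "${IMMICH_URL}/api/assets/${asset_id}" \
        -H "Content-Type: application/json" \
        -H "x-api-key: ${API_KEY}" \
        -d "$payload"
}
```

The asset ID could come from the same `search/metadata` lookup used in the scripts later in this thread.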

That feature would be awesome! I uploaded a lot of takeout images to immich quite a while ago, without correcting all of their metadata. As I still have the archives, I would love to just run immich-go over them and update the metadata of all the remote images on immich (same or better asset). I am unfamiliar with Go and your project, but if you could point me in the right direction in the project, I would love to help!

EinToni avatar Apr 04 '25 14:04 EinToni

I've got a similar issue too. My parent tried switching from Android to iOS, but Apple's importer butchered all the dates, and when I imported the images to Immich, half the photos now say they're from the day the importer ran. I was hoping to overwrite the photos with the correct metadata from the original phone so they can merge better.

PseudoResonance avatar Apr 05 '25 02:04 PseudoResonance

I recently did something similar to this due to issue #846.

  1. I had a list of images in immich whose metadata I wanted to fix
  2. Fixed the EXIF metadata of the images from a Google takeout with this tool: https://github.com/garzj/google-photos-migrate
  3. Found the asset IDs of all images in immich I wanted to fix (used a script)
  4. Deleted them (used a script)
  5. Re-uploaded the images with correct EXIF metadata

Scripts I created and used:

get-asset-ids-from-filename-list.sh:

#!/bin/bash
# get-asset-ids-from-filename-list.sh
# This script reads filenames from a file list,
# queries the search/metadata API for each filename,
# and writes a CSV file (asset_ids.csv) with columns: filename, asset_id.

API_KEY="xxx"
IMMICH_URL="https://xx.xx.xx"
FILE_LIST="/home/immich/images_to_delete.txt"
CSV_FILE="asset_ids.csv"

# Write CSV header
echo "filename,asset_id" > "$CSV_FILE"

echo "Retrieving asset IDs..."
while IFS= read -r file || [ -n "$file" ]; do
    echo "Processing file: $file"
    response=$(curl -s -L -X POST "${IMMICH_URL}/api/search/metadata" \
      -H "Accept: application/json" \
      -H "Content-Type: application/json" \
      -H "x-api-key: ${API_KEY}" \
      -d "{\"originalFileName\": \"${file}\"}")

    # Extract the asset ID from the first matching asset
    asset_id=$(echo "$response" | jq -r '.assets.items[0].id')

    if [ "$asset_id" != "null" ] && [ -n "$asset_id" ]; then
        echo "$file,$asset_id" >> "$CSV_FILE"
        echo "Found: $file -> $asset_id"
    else
        echo "$file," >> "$CSV_FILE"
        echo "WARNING: No asset found for $file"
    fi
done < "$FILE_LIST"

echo "Asset IDs saved to $CSV_FILE"

delete_assets.sh:

#!/bin/bash
# delete_assets.sh
# This script reads asset_ids.csv (with columns: filename,asset_id),
# then loops through each entry and prompts you before sending a DELETE request
# for each asset.

API_KEY="xxx"
IMMICH_URL="https://xx.xx.xx"
CSV_FILE="asset_ids.csv"

# Read CSV file into an array, skipping the header.
readarray -t lines < "$CSV_FILE"

for (( i=1; i<${#lines[@]}; i++ )); do
    line="${lines[$i]}"
    # Extract filename and asset id using awk (assuming comma-delimited CSV)
    filename=$(echo "$line" | awk -F',' '{print $1}')
    asset_id=$(echo "$line" | awk -F',' '{print $2}')

    if [ -z "$asset_id" ]; then
        echo "Skipping $filename - no asset id found"
        continue
    fi

    echo "Asset: $filename -> $asset_id"
    read -p "Do you want to delete this asset? (Y/n): " confirm
    if [[ "$confirm" == "Y" || "$confirm" == "y" || -z "$confirm" ]]; then
        payload="{\"force\": true, \"ids\": [\"$asset_id\"]}"
        echo "Deleting asset $asset_id..."
        curl -s -L -X DELETE "${IMMICH_URL}/api/assets" \
            -H "Content-Type: application/json" \
            -H "x-api-key: ${API_KEY}" \
            -d "$payload"
        echo "Asset deleted."
    else
        echo "Skipping asset $asset_id."
    fi
    echo ""
done

hakong avatar Apr 05 '25 10:04 hakong

I have a related issue/request.

My uploads don't fail, but being able to merge the metadata would help me work around an issue where the Google takeout has the same file in 2 different albums with 2 different descriptions. What happens is that only one of the descriptions ends up being kept.

In my case, I'd like to see the two descriptions merged by appending them to each other with a newline in between, but I can imagine some people might want to be able to merge in different ways, e.g. keep longest, keep earliest, keep latest.

I also have some cases where files are detected as being the same file at different qualities, and the higher-quality one is kept. Instead of keeping only the metadata from the higher-quality one, I'd prefer to merge the metadata: e.g. keep any geo references if only one has them, and keep the earliest timestamp, since that's most likely when the photo was taken, with later dates being edits.

briandking avatar May 05 '25 01:05 briandking
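The merge strategies proposed above could be sketched as small pure functions; none of these exist in immich-go today, and the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical sketches of the description/timestamp merge strategies
# discussed in this thread. Pure functions, no immich API involved.

# Append with a newline, skipping empty or duplicate values.
merge_append() {
    local a="$1" b="$2"
    if [ -z "$a" ]; then printf '%s' "$b"
    elif [ -z "$b" ] || [ "$a" = "$b" ]; then printf '%s' "$a"
    else printf '%s\n%s' "$a" "$b"
    fi
}

# Keep whichever description is longer.
keep_longest() {
    local a="$1" b="$2"
    if [ "${#a}" -ge "${#b}" ]; then printf '%s' "$a"; else printf '%s' "$b"; fi
}

# Keep the earlier of two epoch timestamps (later dates are likely edits).
keep_earliest() {
    if [ "$1" -le "$2" ]; then printf '%s' "$1"; else printf '%s' "$2"; fi
}
```

A command-line flag choosing between such strategies would let each user pick the behavior they expect.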

> My uploads don't fail, but being able to merge the metadata would help me work around an issue where the Google takeout has the same file in 2 different albums with 2 different descriptions. What happens is that only one of the descriptions ends up being kept.

> In my case, I'd like to see the two descriptions merged by appending them to each other with a newline in between, but I can imagine some people might want to be able to merge in different ways, e.g. keep longest, keep earliest, keep latest.

This would increase the complexity of the import for a niche case... Can't you fix the problem in Google Photos and re-export?

> I also have some cases where files are detected as being the same file at different qualities, and the higher-quality one is kept. Instead of keeping only the metadata from the higher-quality one, I'd prefer to merge the metadata: e.g. keep any geo references if only one has them, and keep the earliest timestamp, since that's most likely when the photo was taken, with later dates being edits.

This would be beneficial for the community. How should edge cases be handled? What if both versions have geo references, but different ones? Or different descriptions?

simulot avatar May 05 '25 09:05 simulot

For the descriptions case, I found this issue afterwards, which seems to track a similar problem other people are having: https://github.com/simulot/immich-go/issues/462 It is possible to fix it in Google Photos, but it would be difficult to identify the conflicts, and cumbersome to fix them all once they were identified. In my case there are 30-40k photos, and I only noticed the issue once the photos were in immich and some of the descriptions were missing. When I picked one photo to drill down on in the takeout files and the immich-go upload logs, I found that the descriptions were being replaced by later, better uploads, or skipped because they were on lesser/equivalent uploads.

For conflicting geotags, you could attempt to stuff one into the created location and the other into the shown location: https://iptc.org/std/photometadata/specification/IPTC-PhotoMetadata#location-created Or store extra geotags in some other field(s), like appending to the description, or using tags. Immich doesn't currently use any of those alternate places to store locations, but at least the information would be preserved instead of lost. Another alternative would be an option to merge into a single location if they were within a certain distance of each other.

briandking avatar May 05 '25 23:05 briandking
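The "merge if close enough" idea could be sketched with a haversine distance check; the 100 m threshold and function names below are hypothetical, and nothing here touches the immich API:

```shell
#!/bin/bash
# Hypothetical sketch: compute the distance between two geotags and keep
# a single location only when they are within a configurable threshold.

# Haversine distance in metres between (lat1,lon1) and (lat2,lon2).
geo_distance_m() {
    awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
        pi = atan2(0, -1); r = 6371000          # Earth radius in metres
        dlat = (lat2 - lat1) * pi / 180
        dlon = (lon2 - lon1) * pi / 180
        a = sin(dlat/2)^2 + cos(lat1*pi/180) * cos(lat2*pi/180) * sin(dlon/2)^2
        printf "%d", 2 * r * atan2(sqrt(a), sqrt(1 - a))
    }'
}

# Merge rule: same place if within 100 m, otherwise flag the conflict.
merge_geotags() {
    local d
    d=$(geo_distance_m "$1" "$2" "$3" "$4")
    if [ "$d" -le 100 ]; then
        echo "merged: $1,$2"
    else
        echo "conflict: ${d}m apart"
    fi
}
```

On a conflict, the tool could fall back to one of the alternatives mentioned above, such as preserving the second location in a tag or in the description.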

There are some baselines I'd like to keep:

  • Never change the original file. If something should be modified, it must be done using the immich API.
  • Keep the process as simple as possible. Over-engineered solutions are difficult to explain and don't fit well with common user needs.
  • Be idempotent. A big word to say that if a process is run with the same input, it produces the same output. For example, the description field should not be touched on each run. That way, a run can be stopped and resumed without side effects.

simulot avatar May 06 '25 07:05 simulot
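The idempotency baseline above can be sketched as "read first, write only on change", so a re-run or a resumed run is a no-op. The `GET`/`PUT /api/assets/{id}` endpoints and the `exifInfo.description` field are assumptions taken from the immich OpenAPI spec, and `update_description_idempotent` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical sketch of the idempotency baseline: fetch the current
# value and skip the PUT when nothing would change. Endpoints and field
# names are assumptions; verify against your immich server version.

# Build the update payload (pure, testable without a server).
desc_payload() {
    jq -cn --arg d "$1" '{description: $d}'
}

update_description_idempotent() {
    local asset_id="$1" new_desc="$2" current
    current=$(curl -s -H "x-api-key: ${API_KEY}" \
        "${IMMICH_URL}/api/assets/${asset_id}" | jq -r '.exifInfo.description // ""')
    if [ "$current" = "$new_desc" ]; then
        echo "unchanged: $asset_id"   # second run with the same input: no write
        return 0
    fi
    curl -s -X PUT "${IMMICH_URL}/api/assets/${asset_id}" \
        -H "Content-Type: application/json" \
        -H "x-api-key: ${API_KEY}" \
        -d "$(desc_payload "$new_desc")"
}
```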