
Support downloading the takeout files directly, instead of providing a ZIP file

tomerh2001 opened this issue 11 months ago • 5 comments

My biggest issue with takeout is that it splits the ZIP files into multiple files, and if you have a lot of data to download, you'd need to download a lot of ZIP files manually.

In my current case, I have 2TB of data I want to download from Google Photos and import into Immich, but that 2TB has been split into 57 different ZIP files I need to download, MANUALLY.

The second issue is that after a while of downloading, Google cuts my session, which causes Chrome to block and cancel the download stream. So unless I split the takeout into much smaller files (2GB instead of 50GB per file), I can't download them at all.

It would be beneficial if immich-go could:

  1. Provide a way to authenticate with Google.
  2. Provide a way to select which takeout to download.
  3. Methodically download all the takeout files (i.e., prevent human errors such as re-downloading files that were already downloaded or missing a file entirely).
  4. Automatically refresh the access token once it expires, to prevent a download from being blocked or cancelled.
  5. Support multi-threaded download streams.

And another potential feature:

  • Download, import into Immich, and delete from local storage. Currently you need roughly 2× the size of your takeout to import it (1× for Immich and 1× for the ZIP files, assuming the worst case for ZIP compression).

So instead of storing all of the ZIP files and only then importing them, you could download a single ZIP file, import it into Immich, delete it, and continue to the next one, without ever needing to store all of the ZIP files together.

tomerh2001 avatar Apr 06 '25 10:04 tomerh2001

It would be nice if it were that simple...

Google doesn't offer any API for downloading files. Even with proper authentication to Google services, there is no way to get the list of files using an API. So, it's not currently possible.

The Takeout result is a collection of ZIP files that need to be processed together if you want to import them into Immich with their real names, persons, descriptions, dates of capture, and GPS data. This information is stored in JSON files that may be located in a different part of the Takeout.

However, you can improve your workflow:

  • Request a larger size for the parts. The maximum is 50GB.
  • Process by year or by album. This reduces the size of the batch and the number of duplicated files in the batch.

In short, it won't come soon.

simulot avatar Apr 06 '25 12:04 simulot

For the parts where there isn't an API, what about https://pptr.dev/ ?

tomerh2001 avatar Apr 06 '25 12:04 tomerh2001

An equivalent exists for Go, so maybe one day. I have no idea how this would work, whether it would get past 2FA, or what to do when Google changes its interface.

However, I'm open to pull requests.

simulot avatar Apr 06 '25 13:04 simulot

I appreciate the openness to consider it 🙏

Since it's a one-off operation and not an automation, you could just launch the browser, wait for the user to log in manually, and then scrape the download links; no fancy headless-browser automation needed.

My issue is with the download itself: since it would happen on Go's end and not in the browser, how would you refresh the token once Google closes the session?

If there's a solution for that, I'll try to throw together a POC once I have some free time from work.

tomerh2001 avatar Apr 06 '25 13:04 tomerh2001

There is a workaround that really has nothing to do with this library but something I just used in case you want to use it in the future.

  • In your browser, while on the Google Takeout download page, open Developer Tools
  • Go to the Network tab
  • Refresh the page or click on the download link
  • Find the successful download request in the network log (named something like takeout202501019782573295-001.zip)
  • Right-click on it and select "Copy as cURL"
  • Paste the copied cURL command into your terminal

This reuses the authentication from Google's own browser session. It expires in about 10 minutes, which is very annoying, but at least you get their OAuth flow in the browser instead of reimplementing that part in this library, which would be a huge burden to maintain as Google changes it over time. It also makes downloads a lot faster if you have a good connection: I opened 10 tabs all downloading different parts at once.

Another suggestion: you chose a 2GB part size; I would increase that to at least 10GB. I think that's the sweet spot between getting the biggest chunks and not losing too much if a download fails partway through.

mycarrysun avatar May 10 '25 22:05 mycarrysun