
--skip-same and -i options take too long, can we have skip-same-name?

Open mautematico opened this issue 2 years ago • 1 comments

Hello, there! And thanks for this awesome tool!

I've found --skip-same to be, IMO, time consuming.

Let's say there is a CHAT with a large, growing file list.

export media list:

tdl chat export -c CHAT

and download everything:

time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
real	4m2.342s
user	0m37.032s
sys	0m32.502s

Then, without any changes to tdl-export.json, re-run the last command:

time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
All files will be downloaded to 'downloads' dir

real	3m43.758s
user	0m0.814s
sys	0m0.734s

I've found there's almost no network activity on the second run, which is consistent with this earlier comment:

skip-same works before the download and not after, so it cannot be compared based on hash.

Originally posted by @iyear in https://github.com/iyear/tdl/issues/75#issuecomment-1371035655

Also, I have seen that removing the -i filter does not improve things at all; in fact, I see some jpg files being downloaded here and there (in order of occurrence, I guess).

So, I think what's happening here is:

  • both -i and --skip-same act upon HTTP/MTProto HEAD responses
  • either the server is throttling the client, or the client is doing some sort of back-off to avoid flooding the server.

Request: Can we have filters like these?

  • --only: behaves like -i but acts upon the "file" property in the json export
  • --skip-same-name: behaves like --skip-same but acts upon the "file" property in the json export
  • --skip-same-id: behaves like --skip-same-name but acts upon chat + message id

These should avoid a high percentage of HEAD requests, speeding things up a lot for some use cases.
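To make the request concrete, here is a minimal sketch of the kind of local pre-filtering being asked for, done outside tdl. It assumes the export is a JSON object with a "messages" list whose entries carry "id" and "file" fields; that layout, the function name filter_export, and its parameters are assumptions for illustration, not tdl's API. It also assumes the on-disk names equal the raw "file" values, which may not hold if tdl renames files on download.

```python
import os


def filter_export(export, extensions=None, existing_names=frozenset()):
    """Return a copy of `export` keeping only messages that pass local filters.

    `export` is assumed to look like {"id": ..., "messages": [{"id": ..., "file": ...}]}.
    """
    kept = []
    for msg in export.get("messages", []):
        name = msg.get("file", "")
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if extensions is not None and ext not in extensions:
            continue  # like the proposed --only: filter on the exported name
        if name in existing_names:
            continue  # like the proposed --skip-same-name: already on disk
        kept.append(msg)
    result = dict(export)  # shallow copy so the caller's export is untouched
    result["messages"] = kept
    return result


if __name__ == "__main__":
    # Made-up sample data standing in for tdl-export.json.
    sample = {
        "id": 1,
        "messages": [
            {"id": 10, "file": "a.mp3"},
            {"id": 11, "file": "b.jpg"},
            {"id": 12, "file": "c.mp3"},
        ],
    }
    filtered = filter_export(sample, extensions={"mp3"}, existing_names={"a.mp3"})
    print([m["id"] for m in filtered["messages"]])
```

The filtered object could be written to a new json file and passed to tdl dl with -f, so that only the remaining messages are looked up on the server.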

Again, thank you for this tool!

mautematico avatar Jan 06 '24 20:01 mautematico

--skip-same decides by checking whether the file name produced by template rendering already exists in the target directory, and template rendering requires fetching the message as it exists on the Telegram server. So network requests are essential; the 'file' field alone is not sufficient, because template rendering relies on many fields.
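A rough sketch of the check described above, to show why the 'file' field alone cannot decide it. The template string, the field names (dialog_id, message_id, file_name), and both function names are hypothetical and do not reflect tdl's actual template syntax or code.

```python
import os

# Hypothetical template: the final name depends on more than the file name.
TEMPLATE = "{dialog_id}_{message_id}_{file_name}"


def rendered_name(message, template=TEMPLATE):
    """Render the on-disk name from message fields (assumed to come from the server)."""
    return template.format(**message)


def should_skip(message, target_dir, template=TEMPLATE):
    """Skip when a file with the rendered name already exists in target_dir."""
    return os.path.exists(os.path.join(target_dir, rendered_name(message, template)))
```

Under this model, two messages with the same file_name render to different names, so a comparison on the exported name alone would give a different answer than the rendered-name check.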

--skip-same predates resumable download, so in theory you now only need resumable download to quickly return to the previous state. In other words, --skip-same helps you avoid downloading files with the same "filename", while resumable download lets you resume downloading from the same message source.

iyear avatar Jan 29 '24 13:01 iyear