--skip-same and -i options take too long, can we have skip-same-name?
Hello, there! And thanks for this awesome tool!
I've found --skip-same to be, IMO, time consuming.
Let's say there is a CHAT with a large, growing file list.
export media list:
tdl chat export -c CHAT
and download everything:
time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
real 4m2.342s
user 0m37.032s
sys 0m32.502s
Then, without any changes to tdl-export.json, re-run the last command:
time tdl dl --continue --desc --skip-same -f tdl-export.json -i mp3
All files will be downloaded to 'downloads' dir
real 3m43.758s
user 0m0.814s
sys 0m0.734s
I've found there's almost no network activity on the second run, which confirms this:
skip-same works before the download and not after, so it cannot be compared based on hash.
Originally posted by @iyear in https://github.com/iyear/tdl/issues/75#issuecomment-1371035655
Also, I have seen that removing the -i filter does not improve things at all; in fact, I see some jpg files being downloaded here and there (in order of occurrence, I guess).
So, I think what's happening here is:
- both -i and --skip-same act upon HTTP/MTProto HEAD responses
- either the server is throttling the client, or the client is doing some sort of back-off to avoid flooding the server.
Request: Can we have filters like:
- --only: behaves like -i but acts upon the "file" property in the JSON export
- --skip-same-name: behaves like --skip-same but acts upon the "file" property in the JSON export
- --skip-same-id: behaves like --skip-same-name but acts upon chat + message id
These should avoid a high percentage of HEAD requests, speeding things up a lot for some use cases.
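To illustrate what an --only style filter could look like when done outside the tool, here is a minimal Python sketch that pre-filters the export by the "file" property. It assumes the export is a JSON object with a top-level "messages" list whose entries carry a "file" field; only the "file" field is mentioned in this issue, the rest of the layout is an assumption. The function name filter_export is hypothetical and not part of tdl.

```python
from pathlib import Path


def filter_export(export: dict, extensions: set[str]) -> dict:
    """Return a copy of the export keeping only messages whose "file"
    name ends in one of the wanted extensions (case-insensitive).

    Assumed layout: {"messages": [{"file": "name.ext", ...}, ...], ...}
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    kept = [
        msg
        for msg in export.get("messages", [])
        # Entries without a "file" field have an empty suffix and are dropped.
        if Path(msg.get("file") or "").suffix.lower().lstrip(".") in wanted
    ]
    # Keep every other top-level key untouched; the input is not mutated.
    return {**export, "messages": kept}
```

The result could be written back with json.dump and passed to -f in place of tdl-export.json, presumably reducing the number of messages that have to be looked up at all. Whether tdl accepts such a trimmed file unchanged has not been verified here.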
Again, thank you for this tool!
--skip-same decides by checking whether the file name produced by template rendering already exists in the target directory, and template rendering requires fetching the message from the Telegram server.
So, network requests are essential; having only the 'file' field is not sufficient. Template rendering relies on many fields.
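The check described above can be sketched roughly as follows. This is an illustration in Python, not tdl's actual code: the template string, the field names (dialog_id, message_id, file_name), and the function names are all hypothetical. The point is only that the local name depends on several message fields, so the message metadata has to be available before the existence check can run.

```python
from pathlib import Path
from string import Template

# Hypothetical naming template; the real tool uses its own template
# syntax and its own set of fields.
NAME_TEMPLATE = Template("${dialog_id}_${message_id}_${file_name}")


def rendered_name(message: dict) -> str:
    """Render the local file name from message metadata.

    In the real tool this metadata comes from the server, which is
    why a network request is needed even for files that get skipped.
    """
    return NAME_TEMPLATE.substitute(message)


def should_skip(message: dict, target_dir: Path) -> bool:
    """Skip when a file with the rendered name already exists."""
    return (target_dir / rendered_name(message)).exists()
```

Under these assumptions, a message is skipped only after its name has been rendered, which matches the observation that the second run still takes minutes while transferring almost no data.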
--skip-same predates resumable download, so in theory you now only need resumable download to quickly return to the previous state. In other words, --skip-same helps you avoid downloading files with the same "filename", while resumable download lets you resume downloading from the same message source.