
Caching media files.

Open lockywolf opened this issue 3 years ago • 6 comments

I export my database to HTML fairly often (every night, in fact, via cron), and the export takes a long time, most of which seems to be spent downloading image data.

Would it be possible to cache that data somewhere? Say, in ~/.cache/skyperious/?

lockywolf avatar Sep 15 '22 02:09 lockywolf

I suppose this can be added. For example, disabled by default, with a configuration flag to enable it.

However, can you hazard a guess at how much of that time goes into trying to download content that is no longer available? That part will continue to take time, because failures cannot be cached: a failure might have been caused by a temporary network problem during the download.
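
A minimal sketch of that idea, not the actual Skyperious implementation: only successful downloads are written to disk, so a failed download is simply retried on the next run. The names `cached_fetch`, `fetch` and `cache_dir` are hypothetical and chosen for illustration only.

```python
import hashlib
import os
import tempfile


def cached_fetch(url, fetch, cache_dir):
    """Return content for url from cache_dir, downloading it on a cache miss.

    fetch is any callable taking a URL and returning bytes (hypothetical
    stand-in for the real download function). Failures are never stored.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL to get a filesystem-safe cache file name
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()
    try:
        content = fetch(url)
    except Exception:
        return None  # not cached: may have been a temporary network problem
    if not content:
        return None  # empty results are treated as failures as well
    # Write to a temporary file first, so an interrupted run
    # never leaves a partial file under the final name
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    os.replace(tmp_path, path)
    return content
```

With nightly runs, anything that failed yesterday for a transient reason gets picked up today, while content that is permanently gone keeps costing one failed request per run.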

suurjaak avatar Sep 15 '22 17:09 suurjaak

Well, failures cannot be cached, I guess, but are there many "expected failures" when exporting? I thought that, in general, all links in the Skype database should point to valid files. So if I run an export today, some files may fail due to network problems, but they are likely to be available tomorrow. Eventually everything will be cached, and only files that appeared between two consecutive runs will have to be downloaded anew.

Is there some hidden option to "profile" downloading? See the list of files, download speed, and error status.

ghost avatar Sep 16 '22 06:09 ghost

Regarding expected failures - it depends. Files and audio/video messages are kept on Skype servers for up to 30 days. And everything shared before April 2017 is no longer available anyway.

But you are right that as long as the cache is populated periodically, failures should not play much of a role.

There is no hidden option to profile downloading. But if you are using the source code distribution, and are up to a bit of Python hacking, you can add logging calls to SkypeLogin.get_api_content() in live.py on your computer (https://github.com/suurjaak/Skyperious/blob/master/src/skyperious/live.py#L889).
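
As a rough sketch of what such logging could look like: a generic decorator that records duration, result size and error status for each call. The `profiled` helper and the logger name are made up for illustration; nothing here is part of Skyperious, and the decorator makes no assumptions about the arguments of the wrapped function.

```python
import functools
import logging
import time

logger = logging.getLogger("download_profile")  # hypothetical logger name


def profiled(func):
    """Wrap func to log its arguments, duration, result size and errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            elapsed = time.perf_counter() - start
            logger.warning("%s args=%r kwargs=%r failed after %.3f s: %r",
                           func.__name__, args, kwargs, elapsed, e)
            raise  # behavior of the wrapped function stays unchanged
        elapsed = time.perf_counter() - start
        # Size is only known for bytes/str results; otherwise logged as None
        size = len(result) if isinstance(result, (bytes, str)) else None
        logger.info("%s args=%r kwargs=%r ok in %.3f s, size=%s",
                    func.__name__, args, kwargs, elapsed, size)
        return result
    return wrapper
```

In a source checkout, one way to apply it would presumably be to decorate `SkypeLogin.get_api_content` in live.py with `@profiled` and enable logging with `logging.basicConfig(level=logging.INFO)`; dividing the logged size by the elapsed time gives an approximate download speed.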

suurjaak avatar Sep 16 '22 16:09 suurjaak

Okay, thank you for the pointers, I will add some profiling wrappers.

ghost avatar Sep 17 '22 03:09 ghost

I'd like your input on whether this caching should be enabled by default or not.

Reasons why enabled:

  • so that things work conveniently by default
  • so that caching would get used at all; people typically stick to default program settings

Reasons why not enabled:

  • privacy. For example, somebody might use Skyperious on a shared computer to download and export their Skype history. Deleting the database itself from the computer afterwards would be a rather obvious step for them to take, but clearing some semi-hidden cache folder would not. It would be an unpleasant surprise to discover that their private photos were actually left on the computer.

suurjaak avatar Sep 17 '22 17:09 suurjaak

I think that if the cache is kept together with the database, say, in ~/.config/skyperious, privacy considerations would be the same for both. The cache directory can be marked with a CACHEDIR.TAG file (https://bford.info/cachedir/), so that cleanup programs can recognize it.
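
For illustration, tagging a directory this way takes only a few lines. This is a sketch assuming Python 3; `tag_cache_dir` is a hypothetical helper name, while the file name and the signature line are the fixed values from the Cache Directory Tagging Specification.

```python
import os

# Fixed first line required by the Cache Directory Tagging Specification
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(cache_dir):
    """Create cache_dir if needed and mark it with a CACHEDIR.TAG file."""
    os.makedirs(cache_dir, exist_ok=True)
    tag_path = os.path.join(cache_dir, "CACHEDIR.TAG")
    if not os.path.exists(tag_path):
        with open(tag_path, "w", encoding="ascii") as f:
            f.write(CACHEDIR_TAG_SIGNATURE + "\n")
            # Anything after the signature line is free-form commentary
            f.write("# This file is a cache directory tag.\n")
            f.write("# For information see https://bford.info/cachedir/\n")
    return tag_path
```

Backup and cleanup tools that honor the specification only check that the file starts with the signature, so the comment lines are optional.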

So I would suggest having it on by default.

ghost avatar Sep 18 '22 15:09 ghost

Sorry for the delay, finally got around to releasing v5.4.

Added a configuration flag for this, SharedContentUseCache; it is false by default.

suurjaak avatar Dec 11 '22 16:12 suurjaak

Let me test it for a few days and get back with the feedback. Thank you for implementing this!

ghost avatar Dec 11 '22 17:12 ghost

In version 5.4 the -v and the --version options do not work.

ghost avatar Dec 13 '22 04:12 ghost

Confirmed. In fact, they haven't worked since 5.3, I now discover.

suurjaak avatar Dec 13 '22 16:12 suurjaak

I haven't found problems in a week's time, so I guess this can be closed.

Thank you!

ghost avatar Dec 21 '22 05:12 ghost