pywebcopy

Overwrite only if file changed mode

Open afonari opened this issue 5 years ago • 2 comments

Is it possible to only overwrite the file if the file changed since the last crawl?

afonari avatar Mar 17 '20 19:03 afonari

I honestly don't think this is possible in my capacity. If anyone has suggestions, I can certainly implement them.

rajatomar788 avatar Mar 24 '20 09:03 rajatomar788

Answer: this is not possible by merely checking URLs. Multimedia files probably do not change often, though, so a "do not update" list for multimedia would likely be more useful.
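To illustrate the "do not update" idea, here is a minimal sketch. It is not part of pywebcopy; the function name `should_download` and the extension set `SKIP_IF_PRESENT` are hypothetical and would need adjusting for the site being crawled.

```python
import os
from urllib.parse import urlparse

# Hypothetical "do not update" list: extensions assumed to rarely change.
SKIP_IF_PRESENT = {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".webm", ".pdf"}

def should_download(url, local_path):
    """Return False for multimedia URLs that already have a local copy."""
    # Look only at the URL path so query strings do not hide the extension.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext in SKIP_IF_PRESENT and os.path.exists(local_path):
        return False
    return True
```

A crawler could call this before fetching each resource and skip the request entirely when it returns `False`. Anything not on the list is always fetched again.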

For text pages, it would instead be more useful to first check the date the page was last modified. See here and here for reference. (It could be inaccurate, however.) In Python there is a solution with urllib:

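from urllib.request import urlopen
# Last-Modified is optional; the lookup gives None when the server omits it
with urlopen("http://example.com") as response:
    last_modified = response.headers['last-modified']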
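The header value is only a string, so it still has to be compared against the local copy. A rough sketch of that comparison follows, assuming the local file's modification time reflects the last crawl. The helper name `remote_is_newer` is hypothetical and not part of pywebcopy.

```python
import os
from email.utils import parsedate_to_datetime

def remote_is_newer(last_modified_header, local_path):
    """Return True if the page should be downloaded again."""
    if last_modified_header is None or not os.path.exists(local_path):
        return True  # no header or no local copy: download to be safe
    try:
        remote_ts = parsedate_to_datetime(last_modified_header).timestamp()
    except (TypeError, ValueError):
        return True  # unparseable date: download to be safe
    # Compare the server's timestamp with the local file's modification time.
    return remote_ts > os.path.getmtime(local_path)
```

The first argument would be the `last-modified` value read with `urlopen` as above. Many dynamically generated pages omit the header, in which case this sketch falls back to always downloading.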

Some other people have recommended using a checksum instead, but that is risky on dynamically generated websites (especially ones with ads) whose content constantly mutates (e.g. recommended reading lists).
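For reference, the checksum approach could look roughly like the sketch below. The function name `write_if_changed` is hypothetical and not part of pywebcopy, and SHA-256 is just one reasonable choice of hash. Note that the content still has to be downloaded; only the overwrite is avoided.

```python
import hashlib
import os

def write_if_changed(path, new_bytes):
    """Write new_bytes to path only if its SHA-256 differs from the file on disk."""
    new_digest = hashlib.sha256(new_bytes).hexdigest()
    if os.path.exists(path):
        with open(path, "rb") as f:
            old_digest = hashlib.sha256(f.read()).hexdigest()
        if old_digest == new_digest:
            return False  # identical content: leave the file untouched
    with open(path, "wb") as f:
        f.write(new_bytes)
    return True
```

On a page with rotating ads or recommendation lists, the digest would differ on nearly every crawl, so the function would overwrite every time, which is the risk described above.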

There is no perfect solution; one would have to make a sound judgement as to which approach is better.

BradKML avatar Apr 02 '23 16:04 BradKML