pywebcopy

site restrictions

Open marshonhuckleberry opened this issue 5 years ago • 5 comments

It works on some websites but fails on others. I searched the issues for a fix for the "permission error" and found one: ignoring robots.txt. I tried it, but I still get the permission error. The only difference is that with the robots.txt bypass it downloads one more page than before. No luck at all with this site: "http://mathworld.wolfram.com/"
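To see whether robots.txt is even what stands in the way, the rules can be checked separately from pywebcopy. Below is a minimal sketch using only the standard library's `urllib.robotparser`. The helper name `allowed_by_robots` and the `SAMPLE_ROBOTS` rules are made up for illustration; they are not the real robots.txt of mathworld.wolfram.com or of any other site.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /topics/
"""

if __name__ == "__main__":
    for page in ("http://example.com/topics/", "http://example.com/about.html"):
        print(page, allowed_by_robots(SAMPLE_ROBOTS, "Mozilla/5.0", page))
```

If the real rules allow the page and the error still shows up, robots.txt is probably not the cause, which would fit the bypass making almost no difference.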

marshonhuckleberry • Jan 23 '20 07:01

What code are you using? I also need to see the log file, if you can find it.

rajatomar788 • Jan 23 '20 07:01

The code:

import pywebcopy
import requests
from pywebcopy import save_webpage

pywebcopy.SESSION.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
kwargs = {'project_name': 'new'}

save_webpage(
    url='http://mathworld.wolfram.com/topics/',
    project_folder='path',
    bypass_robots=True,
    debug=True,
    **kwargs
)

The log file: pywebcopy_log.log

marshonhuckleberry • Jan 26 '20 15:01

Try setting the User-Agent in pywebcopy.config so that it applies across the whole project.


import pywebcopy

pywebcopy.config['http_headers']['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'

pywebcopy.config.setup_config("http://mathworld.wolfram.com/", "path", project_name="new", bypass_robots=True)

pywebcopy.save_webpage("http://mathworld.wolfram.com/", "path")
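To tell whether the User-Agent matters at all, it can help to compare the status code the server returns with and without a browser-like header, outside of pywebcopy. This is a rough sketch using only the standard library. It assumes the "permission error" comes from the server rejecting the request (for example with a 403), which the log would have to confirm. The helper names `build_request` and `status_for` are made up for this example.

```python
import urllib.error
import urllib.request

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a GET request, adding a User-Agent header only if one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def status_for(url, user_agent=None, timeout=10):
    """Return the HTTP status code; HTTP errors such as 403 are returned, not raised."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Other failures (DNS, refused connection, timeout) still propagate.
        return exc.code


if __name__ == "__main__":
    url = "http://mathworld.wolfram.com/"
    print("default User-Agent:", status_for(url))
    print("browser User-Agent:", status_for(url, BROWSER_UA))
```

If both requests come back with the same status, the header is unlikely to be the problem and the cause is somewhere else.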

rajatomar788 • Jan 29 '20 13:01

error!

marshonhuckleberry • Jan 31 '20 07:01