pywebcopy
Asset URL changes when I set the url argument of WebPage's get method
Hi, my URL contains two dots, and this program ignores them and builds a wrong asset URL. For example, my URL is https://example.com/rims/pro/0aff25a9b7d8705d99d558e82a19f8f8/sec/HQv2cFXZHwS6kcNVuquD6etOFRPDO7kvo_XJ6lzbxFMYxHUNy3xND7XT9Hlpqvl04hcf77j9NqhV7bF5cF129THtfGkM4rvQBOUKqT027uIuN4A7M8rvNHupBhay1QNyenlkLVk3kipkNnS1urCAHg../sed/tipps/html/0d-1568269-master.html?docuNo=7cdeaaa7e7d69c174aca6a55b1221310
Note the two dots just before "/sed". After I crawl the website, some asset URLs are changed (the ".." and the following "/" are gone), for example https://example.com/rims/pro/0aff25a9b7d8705d99d558e82a19f8f8/sec/HQv2cFXZHwS6kcNVuquD6etOFRPDO7kvo_XJ6lzbxFMYxHUNy3xND7XT9Hlpqvl04hcf77j9NqhV7bF5cF129THtfGkM4rvQBOUKqT027uIuN4A7M8rvNHupBhay1QNyenlkLVk3kipkNnS1urCAHgsed/tipps/assets/scss/hst2-param.css
The CSS link's href is "../assets/scss/hst2-param.css".
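For comparison, here is a minimal sketch using Python's standard urllib.parse.urljoin (not pywebcopy's own code) that resolves the href against the page URL under standard relative-URL rules. Only a path segment that is exactly ".." means "parent directory", so the segment that merely ends in two dots (just before /sed) is kept, and only "html/.." is collapsed. The variable names are illustrative.

```python
from urllib.parse import urljoin

# Page URL from the report; the segment before "/sed" ends in "..".
SEGMENT = (
    "HQv2cFXZHwS6kcNVuquD6etOFRPDO7kvo_XJ6lzbxFMYxHUNy3xND7XT9Hlpqvl04hcf77j9N"
    "qhV7bF5cF129THtfGkM4rvQBOUKqT027uIuN4A7M8rvNHupBhay1QNyenlkLVk3kipkNnS1urCAHg.."
)
PAGE_URL = (
    "https://example.com/rims/pro/0aff25a9b7d8705d99d558e82a19f8f8/sec/"
    + SEGMENT
    + "/sed/tipps/html/0d-1568269-master.html?docuNo=7cdeaaa7e7d69c174aca6a55b1221310"
)
HREF = "../assets/scss/hst2-param.css"  # href of the CSS <link>

# urljoin drops the last path segment of the page URL, appends the href,
# then removes only segments that are exactly "." or "..".
asset_url = urljoin(PAGE_URL, HREF)
print(asset_url)  # ...CAHg../sed/tipps/assets/scss/hst2-param.css
```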
It isn't ignoring the dots; this is a basic security measure that prevents the program from accessing the user's files without authorization. If this protection were removed, downloaded files could be saved to, read from, or deleted from unexpected directories. @zengyinggang
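As an illustration of the kind of guard described above, here is a hypothetical sketch, assuming POSIX-style paths and a single download root directory. The helper safe_local_path is illustrative and not part of pywebcopy. It maps the URL path under the root, normalises it, and refuses anything that resolves outside the root. posixpath.normpath collapses only segments that are exactly "." or "..", so a segment that merely ends in ".." stays inside the root.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def safe_local_path(root: str, url: str) -> str:
    """Hypothetical helper: map a URL to a file path under `root`,
    raising ValueError if the result would escape `root`."""
    # Use only the URL's path and decode %XX escapes, so an encoded
    # "%2e%2e" is checked the same way as a literal "..".
    relative = unquote(urlsplit(url).path).lstrip("/")
    root_norm = posixpath.normpath(root)
    # normpath removes "." segments and collapses "name/.." pairs only.
    candidate = posixpath.normpath(posixpath.join(root_norm, relative))
    prefix = root_norm.rstrip("/") + "/"
    if candidate != root_norm and not candidate.startswith(prefix):
        raise ValueError(f"refusing to write outside {root!r}: {url!r}")
    return candidate
```

With this approach, a URL such as https://example.com/../../etc/passwd is rejected because it resolves to /etc/passwd, while a path segment like "CAHg.." is kept unchanged inside the root.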