Costco URL giving ConnectionResetError, URL works fine in browser

Open glenviewjeff opened this issue 4 years ago • 2 comments

Using these parameters:

filter: element-by-id:tmpl_oos_overlay_img
kind: url
url: https://www.costco.com/kirkland-signature-organic-roasted-seaweed-snack%2c-0.6-oz%2c-10-count.product.100435873.html

I get the following error:

===========================================================================
01. ERROR: https://www.costco.com/kirkland-signature-organic-roasted-seaweed-sna
ck%2c-0.6-oz%2c-10-count.product.100435873.html
===========================================================================

---------------------------------------------------------------------------
ERROR: https://www.costco.com/kirkland-signature-organic-roasted-seaweed-snack%2
c-0.6-oz%2c-10-count.product.100435873.html
---------------------------------------------------------------------------
('Connection aborted.', ConnectionResetError(10054, 'An existing connection was
forcibly closed by the remote host', None, 10054, None))
---------------------------------------------------------------------------

glenviewjeff avatar Mar 17 '20 14:03 glenviewjeff

Try turning off JavaScript and see what HTML loads for that page. urlwatch won't work on dynamically loaded content. I believe this is the issue.

Or you can curl https://www.costco.com/kirkland-signature-organic-roasted-seaweed-snack%2c-0.6-oz%2c-10-count.product.100435873.html and see what loads.
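If you save the curl output to a file, a small script can tell you whether the element your filter targets is present in the static HTML at all. This is only a rough sketch using Python's standard library; the function name `has_element_with_id` is made up for illustration, and the id is the one from the `element-by-id` filter in the original post.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag.
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_with_id(html_text, wanted_id):
    """Return True if the static HTML contains an element with this id."""
    finder = IdFinder(wanted_id)
    finder.feed(html_text)
    return finder.found
```

For example, after `curl <url> -o page.html`, calling `has_element_with_id(open("page.html").read(), "tmpl_oos_overlay_img")` returning False would suggest the element is added later by JavaScript.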

ghost avatar Mar 18 '20 02:03 ghost

(You can also check whether the page works without JavaScript but does require some specific HTTP headers, etc.

Two things to check: look at the page source (Ctrl+U in Chrome) and search for a keyword you're looking for, to see whether it's already there in the plain HTML. I've had good luck in the past with either a hard-coded inline JSON object within the plain HTML or a secret internal (but open) API[1].
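To make the inline JSON check concrete, here is a minimal sketch that pulls out every `<script>` body in a saved page that mentions your keyword and parses as JSON. It assumes the data sits in a script tag whose whole content is valid JSON, which is not always the case; `inline_json_containing` is a hypothetical helper name, not something urlwatch provides.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text.
        if self._in_script:
            self.scripts[-1] += data


def inline_json_containing(html_text, keyword):
    """Return parsed JSON objects from script tags that mention keyword."""
    collector = ScriptCollector()
    collector.feed(html_text)
    matches = []
    for body in collector.scripts:
        if keyword not in body:
            continue
        try:
            matches.append(json.loads(body))
        except json.JSONDecodeError:
            # Ordinary JavaScript, not a pure JSON payload; skip it.
            pass
    return matches
```

If this returns something, you can point a normal 'url' job at the page and filter for that value without needing JavaScript at all.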

Also, if you find it does require HTTP headers or cookies (I've even had User-Agent checks that held back curl), open the Network pane of the Chrome developer tools, right-click any network request, and choose "Copy -> Copy as cURL" to make an identical network request from the terminal. If this works, you're golden. You can either modify the urlwatch source code to include those headers (custom request function) or, even easier, put the curl command in a shell script and have urlwatch watch that script with a shell command job in place of 'url'. Note that I usually delete parts of the copied command to find the critical aspect. If it requires a cookie, the cookie might expire and the job will stop working. At that point, look at Chrome Puppeteer: it's surprisingly ergonomic, and then there would be no difference between your request and a 'real' browser. You can even write the Puppeteer script so that it prints the output you want and then use the same shell command job to monitor it.)
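As an alternative to a curl-based shell script, the same replay can be sketched in Python with only the standard library. The header values below are placeholders I made up; swap in whatever your own "Copy as cURL" output shows, and trim them down to the ones that turn out to matter. `build_request` and `fetch` are hypothetical helper names.

```python
import urllib.request

# Hypothetical values: replace with the headers from your copied request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a request object carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch(url, headers=BROWSER_HEADERS, timeout=30):
    """Send the request and return the response body as text."""
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A script that ends with `print(fetch(url))` can then be monitored the same way as any other shell script. No promises that this gets past Costco specifically; it only shows the mechanics.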

[1] The easiest way I've found to check whether such a hidden API exists is to pick a keyword from the data you want, then, in the Network panel of the Chrome developer tools, choose "Export all as HAR", search that file for your string, and scroll up to see which network request ultimately returned it. I've tried to scrape Costco before, with some success, but it's one of the more annoying ones.
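Since a HAR export is just JSON, the search-and-scroll step can be scripted. This is a rough sketch assuming the usual HAR layout (`log.entries[]`, each with `request.url` and `response.content.text`, possibly base64-encoded); `requests_returning` is a made-up function name.

```python
import base64
import json


def requests_returning(har, keyword):
    """List (method, url) for every HAR entry whose response body contains keyword."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        text = content.get("text") or ""
        if content.get("encoding") == "base64":
            # Some response bodies are stored base64-encoded in the export.
            try:
                text = base64.b64decode(text).decode("utf-8", errors="replace")
            except ValueError:
                continue
        if keyword in text:
            request = entry.get("request", {})
            hits.append((request.get("method"), request.get("url")))
    return hits
```

Load the export with `json.load(open("export.har", encoding="utf-8"))` and pass it in along with your keyword; any URL that comes back and is not the page itself is a candidate for the hidden API.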

JZL avatar May 25 '20 20:05 JZL