python-craigslist Error 403 - Forbidden for url: https://www.craigslist.org/about/sites

Hi Julio,

I have used your code before (early 2020), but now I'm getting the error below when trying to import CraigslistHousing, using "from craigslist import CraigslistHousing":

HTTPError: 403 Client Error: Forbidden for url: https://www.craigslist.org/about/sites

I'm not sure why. It seems it could be related to this issue: https://stackoverflow.com/questions/16627227/http-error-403-in-python-3-web-scraping.

Do you happen to know why this is happening?

Thanks,

Jan 05 '21 17:01 luisandrecunha

Seems like this works on my end. Did you upgrade python-craigslist to the latest version? I have a feeling this issue is independent of the package version, but upgrading doesn't hurt.

Jan 05 '21 17:01 irahorecka

Yep, I did the upgrade and I still have the same issue. I'm on v1.1.0 and Python 3.6, running in a Google Colab notebook.

Jan 05 '21 21:01 luisandrecunha

Ah, this looks to be a problem with the requests library in your environment, not python-craigslist, per se. I'm guessing the same exception would be thrown if you executed this:

import requests
requests.get("https://www.craigslist.org/about/sites")

Jan 05 '21 21:01 irahorecka

You are completely right. I also tried it in a new Colab notebook and got "<Response [403]>".

If I run the code below, I get a successful response and the page source. I believe it's related to the web scraping issue described on the Stack Overflow page linked above.

from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)

Jan 05 '21 22:01 luisandrecunha

Thanks for reporting @luisandrecunha.

Interesting. Seems like Craigslist is blocking requests coming from your IP (or Google's Colab IPs). I'm guessing the IP hit a max number of requests per day/hour/minute.

Do you mind running the code suggested by @irahorecka but setting a User-Agent like you did with urllib:

import requests
requests.get("https://www.craigslist.org/about/sites", headers={'User-Agent': 'python-craigslist/1.1.0'})

If this works fine, I'll add a default User-Agent to all requests to prevent this from happening in the future.

Thanks!

Jan 07 '21 16:01 juliomalegria
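For anyone following along, the idea of a default User-Agent on every request could be sketched with a requests.Session, as below. This is only an illustration under assumptions, not the change that went into python-craigslist: the helper name make_session is hypothetical, and the User-Agent string is simply the one from the comment above.

```python
import requests

# Hypothetical default; the string is the one suggested in the comment above.
DEFAULT_USER_AGENT = "python-craigslist/1.1.0"


def make_session(user_agent=DEFAULT_USER_AGENT):
    """Return a requests.Session that sends `user_agent` on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made via the session.
    session.headers.update({"User-Agent": user_agent})
    return session


# Example use (performs a real network call, so it is left commented out):
# session = make_session()
# print(session.get("https://www.craigslist.org/about/sites", timeout=10))
```

Setting the header once on a session avoids repeating headers=... at every call site, though a header by itself may not be enough to get past a server-side block.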

Hi @juliomalegria ,

It seems that Google's Colab IPs are blocked by Craigslist... I ran the code in a local Jupyter notebook and it worked like a charm.

I tried the code you suggested in Colab and still got the 403 response. However, I do receive the right page if I use the code below. I'm not sure whether the library could be adapted to do the same.

from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)

Thank you again,

Jan 08 '21 23:01 luisandrecunha
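The urllib workaround above could be wrapped into a small helper along these lines. This is a rough sketch, assuming the standard library only; build_request and fetch are hypothetical names, and 'XYZ/3.0' is just the placeholder User-Agent from the snippet above.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="XYZ/3.0"):
    """Build a urllib Request carrying an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="XYZ/3.0", timeout=10):
    """Fetch `url` and return the raw response body as bytes."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses such as 403.
    with urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()


# Example use (performs a real network call, so it is left commented out):
# print(fetch("https://www.craigslist.org/about/sites")[:200])
```

Splitting the request construction from the network call makes it easy to inspect exactly which headers would be sent before anything goes over the wire.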

Just a heads up, I've got the exact same issue. I've been running my code for more than a year and this just happened this week. So, something must have changed on the craigslist side? I'll have to dig into the code. I can cut and paste the url into a browser and it works fine. Just wanted to let you know of another user with the same issues.

>>> import requests
>>> requests.get('https://boston.craigslist.org')
<Response [200]>
>>> requests.get('https://boston.craigslist.org/search')
<Response [403]>
>>> requests.get('https://boston.craigslist.org/search',headers={'User-Agent': 'XYZ/3.0'})
<Response [403]>

I tried it on a couple of computers, so I don't think it's IP related. My guess is that it comes down to how the servers treat the 'requests' library compared with other clients.

Thanks!

Feb 18 '21 19:02 jraVette
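The comparison above (same host, different paths and headers) can be run more systematically with a small probe like the one below. It is only a sketch: probe is a hypothetical helper, and it takes the HTTP function as a parameter so that requests.get, or anything with the same call shape, can be plugged in.

```python
def probe(get, urls, header_sets):
    """Return {(url, label): status_code} for each URL/header combination.

    `get` is any callable shaped like requests.get(url, headers=..., timeout=...).
    `header_sets` maps a short label to a headers dict (or None for defaults).
    """
    results = {}
    for url in urls:
        for label, headers in header_sets.items():
            response = get(url, headers=headers, timeout=10)
            results[(url, label)] = response.status_code
    return results


# Example use (performs real network calls, so it is left commented out):
# import requests
# print(probe(
#     requests.get,
#     ["https://boston.craigslist.org", "https://boston.craigslist.org/search"],
#     {"default": None, "custom-ua": {"User-Agent": "XYZ/3.0"}},
# ))
```

Running the same probe from different machines or environments makes it easier to tell whether the 403 follows the IP, the path, or the client.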

Hey everyone! Sorry for the inactivity. I've released a new version (1.1.1) adding a User-Agent to requests.get. Hopefully that will solve the issue, please report back if it does or doesn't. If it doesn't I'll have to change libraries to urllib. Thanks!

Feb 19 '21 10:02 juliomalegria

I am still getting the 403 error with the updated utils.py.

Feb 20 '21 05:02 cwittwer

+1, seeing the same behavior: 403s on /search paths from a plain requests.get() call, so the library/class is not functioning either.

Also note that I copied the headers from the cURL request for /search (which loads fine in a regular browser) and used them for the requests call, and that was blocked too.

I then used a selenium driver I had, with some mods I've used in the past, and was able to load /search just fine, so I don't suspect they are doing anything super sophisticated to block the request.

Feb 21 '21 03:02 KeeonTabrizi

Okay, I've dug into it a bit more. I don't think this has anything to do with user agents or anything they are blocking like that. I recommend upgrading both the requests and urllib3 libraries:

pip install urllib3 --upgrade
pip install requests --upgrade

Once I did that, things started working again. I'm not sure what the actual issue is, since older versions of those libraries used to work, but with the updates it looks fine to me.

After that, I confirmed that the library's request function (which is effectively requests.get()) works:

import requests
import urllib3
from craigslist import utils

>> requests.__version__
Out[5]: '2.25.1'

>>urllib3.__version__
Out[6]: '1.26.3'

>> utils.requests_get('https://boston.craigslist.org/search')
Out[8]: <Response [200]>

Feb 22 '21 06:02 KeeonTabrizi
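To check an environment against the versions reported above, something like the following could be used. It is a sketch with assumptions: it needs Python 3.8+ for importlib.metadata, the helper names are hypothetical, and 2.25.1 / 1.26.3 are only the versions reported as working in this thread, not confirmed minimums.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread; treated here as an assumed floor.
REPORTED_WORKING = {"requests": "2.25.1", "urllib3": "1.26.3"}


def parse_version(text):
    """Turn '2.25.1' into (2, 25, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def outdated_packages(minimums=REPORTED_WORKING):
    """Return {name: installed_version or None} for packages below the floor."""
    outdated = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            outdated[name] = None  # not installed at all
            continue
        if parse_version(installed) < parse_version(minimum):
            outdated[name] = installed
    return outdated


# Example use: an empty dict means both packages meet the assumed floor.
# print(outdated_packages())
```

The version parsing here is deliberately simple; for anything beyond a quick check, the packaging library's Version class handles pre-releases and other edge cases properly.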

Thanks @KeeonTabrizi! That's a very good point. I've updated the requirements to include minimum versions for the dependencies (requests and beautifulsoup4). Can anyone having issues try updating the library (pip install python-craigslist --upgrade) and let me know if this fixes the issue? Thanks again!

Feb 23 '21 12:02 juliomalegria

Hey guys.

I am not a power user, but I have found that the latest idna version is incompatible with requests. If you installed the latest idna, just upgrade requests and it will revert idna to a compatible version. I don't know whether this is the cause of your trouble, but it could be a factor.

Hope this helps.

Feb 23 '21 12:02 usctzen

Hey y'all, thanks so much for taking the time to fix this! It could just be how my packages were managed, but when I ran pip install python-craigslist --upgrade, it updated requests but not urllib3 (which, I guess, is used by requests). So upgrading python-craigslist alone did not work, but after updating both requests and urllib3 to the latest versions, I'm back up and running! Maybe consider adding urllib3 to the requirements? Thanks again!!

These versions are what got my code working:

>>> requests.__version__
'2.25.1'
>>> urllib3.__version__
'1.26.3'

PS. Great module, it's helped me get some great deals on Craigslist!

Feb 23 '21 14:02 jraVette

+1 this fixed everything. Good catch!

Mar 11 '21 04:03 cwittwer

@cwittwer, @jraVette, @usctzen, @KeeonTabrizi, @luisandrecunha If you guys are interested in a new Craigslist API format, check out pycraigslist. I enjoy python-craigslist, but there were some features I wanted to implement immediately. Some additional features are in the works.

Mar 30 '21 16:03 irahorecka

Thanks, I'll check it out.

Mar 30 '21 17:03 usctzen

Ira,

I just gave it a quick try and I am getting an error. The script finds forsale.mca but does not recognize forsale.mcy (mca is all motorcycles; mcy is motorcycles by owner).

Traceback (most recent call last):
  File "C:/Users/mgpd/PycharmProjects/molivo/py_clist.py", line 3, in <module>
    print(pycraigslist.forsale.mcy.get_filters())
AttributeError: type object 'forsale' has no attribute 'mcy'

Marc @usctzen

Mar 30 '21 19:03 usctzen

Hey @usctzen, I always appreciate your feedback. Could you post the same issue in pycraigslist issues? I’ll address it there :)

Mar 30 '21 19:03 irahorecka

Sure thing!

Mar 30 '21 19:03 usctzen

Hey everyone! Sorry for the delay. I've updated the requirements in 88a6b73 and pushed a new version to PyPI. Could anyone confirm whether this fixes the issue? Thanks for all the patience!

Apr 06 '21 19:04 juliomalegria

I am still having this issue

Nov 21 '22 22:11 Agwebberley
