python-craigslist
Error 403 - Forbidden for url: https://www.craigslist.org/about/sites
Hi Julio,
I have used your code before (early 2020), but now I'm getting the error below when trying to import CraigslistHousing, using "from craigslist import CraigslistHousing":
HTTPError: 403 Client Error: Forbidden for url: https://www.craigslist.org/about/sites
Not sure why; it seems it could be related to this issue: https://stackoverflow.com/questions/16627227/http-error-403-in-python-3-web-scraping.
Do you happen to know why this is happening?
Thanks,
Seems like this works on my end. Did you upgrade python-craigslist to the latest version? I have a feeling this issue is independent of the package version, but it doesn't hurt to try.
Yep, I did the upgrade and continue to have the same issue. I'm on v1.1.0 and Python 3.6, running in Google's Colab notebooks.
Ah, this looks to be a problem with the requests library in your environment, not python-craigslist, per se.
I'm guessing the same exception would be thrown if you executed this:
import requests
requests.get("https://www.craigslist.org/about/sites")
You are completely right, I also tried in a new colab and got "<Response [403]>"
If I run the code below I get a successful response and the page source. I believe it's related to the web scraping issue described on that page.
from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()
print(webpage)
Thanks for reporting @luisandrecunha.
Interesting. Seems like Craigslist is blocking requests coming from your IP (or Google's Colab IPs). I'm guessing the IP hit a max number of requests per day/hour/minute.
Do you mind running the code suggested by @irahorecka but setting a User-Agent like you did with urllib:
import requests
requests.get("https://www.craigslist.org/about/sites", headers={'User-Agent': 'python-craigslist/1.1.0'})
If this works fine, I'll add a default User-Agent to all requests to prevent this from happening in the future.
Thanks!
Hi @juliomalegria ,
It seems that Google's Colab IPs are blocked by Craigslist... I ran the code in a local Jupyter notebook and it worked like a charm.
I tried the code you suggested in Colab and continued to get the 403 response... However, I receive the right page if I use the code below. I'm not sure whether the library could be adapted to do something similar.
from urllib.request import Request, urlopen
req = Request('https://www.craigslist.org/about/sites', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()
print(webpage)
Thank you again,
Just a heads up, I've got the exact same issue. I've been running my code for more than a year and this just happened this week. So, something must have changed on the craigslist side? I'll have to dig into the code. I can cut and paste the url into a browser and it works fine. Just wanted to let you know of another user with the same issues.
>>> import requests
>>> requests.get('https://boston.craigslist.org')
<Response [200]>
>>> requests.get('https://boston.craigslist.org/search')
<Response [403]>
>>> requests.get('https://boston.craigslist.org/search',headers={'User-Agent': 'XYZ/3.0'})
<Response [403]>
I tried it on a couple of computers, so I don't think it's IP related. I guess it has to do with how the servers see the 'requests' library versus a regular library.
Thanks!
Hey everyone! Sorry for the inactivity. I've released a new version (1.1.1) adding a User-Agent to requests.get. Hopefully that will solve the issue; please report back if it does or doesn't. If it doesn't, I'll have to change libraries to urllib.
Thanks!
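For anyone following along, here is a minimal sketch of what a urllib-based fallback with a default User-Agent might look like. It is not the library's actual code: the names DEFAULT_USER_AGENT, build_request and fetch are hypothetical, and whether this avoids the 403 in any given environment is not established here.

```python
from urllib.request import Request, urlopen

# Hypothetical default value; any descriptive string could be used instead.
DEFAULT_USER_AGENT = "python-craigslist/1.1.1"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Create a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch `url` and return the raw response body as bytes.

    urlopen raises urllib.error.HTTPError for error responses such as 403,
    so callers can catch that exception to inspect the status code.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling fetch('https://www.craigslist.org/about/sites') would mirror the urllib snippet posted earlier in the thread, with the header set in one place.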
I am still getting the 403 error with the updated utils.py.
+1 Having the same behavior - 403s on /search paths through just a general requests.get() call, so the library/class is also not functioning.
Also note I tried taking the headers object from the cURL to /search (which loads in a regular browser) and used that for the requests call, which they also blocked.
I used a selenium driver I had, with some mods I've used in the past, and I was able to load /search just fine, so I don't suspect they are doing something super sophisticated to block the request.
Okay I've dug into it a bit more - I don't think this has anything to do with user agents or anything they are blocking like that. I recommend upgrading both the requests and urllib3 libraries:
pip install urllib3 --upgrade
pip install requests --upgrade
Once I did that things started working again. So I'm not sure what the actual issue is, as older versions of those libraries were working, but with the updates it looks fine to me.
After I did that, I tested that the request function (which is effectively requests.get()) works:
>>> import requests
>>> import urllib3
>>> from craigslist import utils
>>> requests.__version__
'2.25.1'
>>> urllib3.__version__
'1.26.3'
>>> utils.requests_get('https://boston.craigslist.org/search')
<Response [200]>
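A quick way to check an environment against the versions reported above might look like the sketch below. It assumes Python 3.8+ for importlib.metadata (the Colab setup mentioned earlier was on 3.6), and it treats the versions from this comment as assumed minimums rather than confirmed requirements. The helper names are hypothetical.

```python
import re
from importlib import metadata  # standard library in Python 3.8+

# Versions reported as working in this thread; assumed minimums, not official ones.
ASSUMED_MINIMUMS = {"requests": "2.25.1", "urllib3": "1.26.3"}


def version_tuple(version):
    """Turn '2.25.1' into (2, 25, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.26' compares as '2.26.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=ASSUMED_MINIMUMS):
    """Map each package name to (installed_version_or_None, meets_minimum_flag)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "upgrade suggested"
        print(f"{name}: {installed} ({status})")
```

Pre-release suffixes are ignored by this comparison, so it is only a rough check.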
Thanks @KeeonTabrizi! That's a very good point.
I've updated the requirements to set minimum versions for the dependencies (requests and beautifulsoup4).
Can anyone having issues try updating their library (pip install python-craigslist --upgrade) and let me know if this fixes the issue?
Thanks again!
Hey guys.
I am not a power user, but I have found that the latest idna version is incompatible with requests. If you installed the latest idna, just upgrade requests and it will revert the idna version. I don't know whether this is the cause of your trouble, but it could be a factor.
Hope this helps.
Hey y'all, thanks so much for taking the time to fix this! So, it could just be how my packages were managed, but when I ran pip install python-craigslist --upgrade it updated requests but not urllib3. I guess urllib3 is used by requests. So, it did not work with just upgrading python-craigslist. But after updating both requests and urllib3 to the latest, I'm back up and running! Maybe consider adding urllib3 to the requirements? Thanks again!!
These versions are what got my code working:
>>> requests.__version__
'2.25.1'
>>> urllib3.__version__
'1.26.3'
PS. great module, it's helped me get some great deals on Craigslist!
+1 this fixed everything. Good catch!
@cwittwer, @jraVette, @usctzen, @KeeonTabrizi, @luisandrecunha If you guys are interested in a new Craigslist API format, check out pycraigslist. I enjoy python-craigslist, but there were some features I wanted to implement immediately. Some additional features are in the works.
Thanks, I'll check it out.
Ira,
Just gave it a quick try and I am getting an error. The script finds forsale.mca but does not recognize forsale.mcy (mca is all motorcycles and mcy is motorcycles by owner).
Traceback (most recent call last): File
"C:/Users/mgpd/PycharmProjects/molivo/py_clist.py", line 3, in
Marc @usctzen
Hey @usctzen, I always appreciate your feedback. Could you post the same issue in pycraigslist issues? I’ll address it there :)
Sure thing!
Hey everyone! Sorry for the delay. I've updated the requirements in 88a6b73 and pushed a new version to PyPI. Could anyone confirm whether the issue is fixed with this? Thanks for all the patience!
I am still having this issue