grimoirelab-perceval
[pipermail] 403 error when getting certain mailing lists
When retrieving certain mailing lists, I get an error, apparently caused by a 403 response when accessing them. However, the same page that raises the 403 can be retrieved with wget. I also tried wget on some of the archive files for the same mailing list, and that worked too:
perceval pipermail --category message https://lists.linuxfoundation.org/pipermail/oss-health-metrics
[2018-05-21 18:04:03,451] - Sir Perceval is on his quest.
[2018-05-21 18:04:03,453] - Looking for messages from 'https://lists.linuxfoundation.org/pipermail/oss-health-metrics' since 1970-01-01 00:00:00+00:00
[2018-05-21 18:04:03,453] - Downloading mboxes from 'https://lists.linuxfoundation.org/pipermail/oss-health-metrics' to since 1970-01-01 00:00:00+00:00
Traceback (most recent call last):
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 380, in run
for item in items:
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 484, in fetch
raise e
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 478, in fetch
for item in items:
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 127, in fetch
for item in self.fetch_items(category, **kwargs):
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backends/core/pipermail.py", line 103, in fetch_items
mailing_list.fetch(from_date=from_date)
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backends/core/pipermail.py", line 203, in fetch
r.raise_for_status()
File "/tmp/perceval/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://lists.linuxfoundation.org/pipermail/oss-health-metrics
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/perceval/bin/perceval", line 194, in <module>
main()
File "/tmp/perceval/bin/perceval", line 112, in main
cmd.run()
File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 385, in run
raise RuntimeError(str(e))
RuntimeError: 403 Client Error: Forbidden for url: https://lists.linuxfoundation.org/pipermail/oss-health-metrics
wget https://lists.linuxfoundation.org/pipermail/oss-health-metrics
--2018-05-21 18:07:25-- https://lists.linuxfoundation.org/pipermail/oss-health-metrics
Resolving lists.linuxfoundation.org (lists.linuxfoundation.org)... 151.101.66.49, 151.101.130.49, 151.101.194.49, ...
Connecting to lists.linuxfoundation.org (lists.linuxfoundation.org)|151.101.66.49|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://lists.linuxfoundation.org/pipermail/oss-health-metrics/ [following]
--2018-05-21 18:07:25-- https://lists.linuxfoundation.org/pipermail/oss-health-metrics/
Reusing existing connection to lists.linuxfoundation.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 7042 (6.9K) [text/html]
Saving to: ‘oss-health-metrics’
oss-health-metrics 100%[==========================================================================>] 6.88K --.-KB/s in 0s
2018-05-21 18:07:25 (82.1 MB/s) - ‘oss-health-metrics’ saved [7042/7042]
wget https://lists.linuxfoundation.org/pipermail/oss-health-metrics/2018-April.txt.gz
--2018-05-21 18:07:53-- https://lists.linuxfoundation.org/pipermail/oss-health-metrics/2018-April.txt.gz
Resolving lists.linuxfoundation.org (lists.linuxfoundation.org)... 151.101.130.49, 151.101.194.49, 151.101.2.49, ...
Connecting to lists.linuxfoundation.org (lists.linuxfoundation.org)|151.101.130.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19640 (19K) [application/x-gzip]
Saving to: ‘2018-April.txt.gz’
2018-April.txt.gz 100%[==========================================================================>] 19.18K --.-KB/s in 0.005s
2018-05-21 18:07:54 (3.56 MB/s) - ‘2018-April.txt.gz’ saved [19640/19640]
All of this is with the latest pip packages (18.05-02).
I wonder if this could be somehow related to #358
@jgbarah I don't think it's related to that bug.
I get the same error. With wget and curl it works. In the latter case, I had to change the URL to https://lists.linuxfoundation.org/pipermail/oss-health-metrics/ (with a trailing slash). Otherwise, I get a 303 error.
Using the requests library, I get a 403 error.
$ ipython
In [1]: import requests
In [2]: r = requests.get("https://lists.linuxfoundation.org/pipermail/oss-health-metrics/")
In [3]: r
Out[3]: <Response [403]>
It works when I change the User-Agent header.
In [16]: headers = {'User-Agent': "Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/13.0 Firefox/13.0"}
In [17]: r = requests.get("https://lists.linuxfoundation.org/pipermail/oss-health-metrics/", headers=headers)
In [18]: r
Out[18]: <Response [200]>
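To make the comparison above repeatable, here is a minimal sketch that requests the same URL with several User-Agent values and records the HTTP status for each. It uses the standard library's urllib rather than requests, so it is not the exact call from the session above; the function name probe and the opener parameter are hypothetical, introduced only for this example.

```python
import urllib.error
import urllib.request


def probe(url, user_agents, opener=urllib.request.urlopen):
    """Return a dict mapping each User-Agent string to the HTTP status it gets.

    `opener` is a hypothetical hook (defaults to urllib.request.urlopen) so
    the function can be exercised without touching the network.
    """
    results = {}
    for agent in user_agents:
        request = urllib.request.Request(url, headers={"User-Agent": agent})
        try:
            with opener(request, timeout=10) as response:
                results[agent] = response.status
        except urllib.error.HTTPError as error:
            # 4xx/5xx responses are raised as HTTPError; keep the status code.
            results[agent] = error.code
    return results


# Example call (performs real HTTP requests, so it is left commented out):
# probe(
#     "https://lists.linuxfoundation.org/pipermail/oss-health-metrics/",
#     ["Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/13.0 Firefox/13.0"],
# )
```

If the server filters on this header, the returned dict should show different status codes for different agents against the same URL.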
In Perceval we use 'Perceval/' + __version__. See https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/client.py#L74
Could it be selective banning of Perceval on the server side? If you think it could be, I may contact LF to see what's going on...
I think we should contact them. In any case, the requests package uses python-requests/1.2.0 as the default value for this header, and that one is failing too.
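For reference, a small sketch of how to check which User-Agent a requests session would actually send, without making any network call. The helper name effective_user_agent and the version string Perceval/0.0.0 are placeholders for illustration, not values taken from Perceval itself.

```python
import requests


def effective_user_agent(session, url):
    """Return the User-Agent header the session would send to `url`.

    The request is only prepared, never sent.
    """
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers["User-Agent"]


URL = "https://lists.linuxfoundation.org/pipermail/oss-health-metrics/"

# A plain session sends the library default, "python-requests/<version>".
default_session = requests.Session()

# A session-wide override, similar in spirit to 'Perceval/' + __version__.
# "Perceval/0.0.0" is a placeholder, not a real release number.
custom_session = requests.Session()
custom_session.headers.update({"User-Agent": "Perceval/0.0.0"})

print(effective_user_agent(default_session, URL))
print(effective_user_agent(custom_session, URL))
```

Setting the header on the session rather than per request means every call made through that session carries the same value.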
Message sent to the LF helpdesk.
The access to this mailing list works again.