grimoirelab-perceval

[pipermail] 403 error when getting certain mailing lists

Status: Open. jgbarah opened this issue 6 years ago (6 comments).

When retrieving certain mailing lists, I'm getting an error, apparently because the server answers with a 403 when Perceval accesses them. However, the same page that returns this 403 can be retrieved with wget. I also tried downloading some of the archive files for the same mailing list with wget, and that worked too:

perceval pipermail --category message https://lists.linuxfoundation.org/pipermail/oss-health-metrics
[2018-05-21 18:04:03,451] - Sir Perceval is on his quest.
[2018-05-21 18:04:03,453] - Looking for messages from 'https://lists.linuxfoundation.org/pipermail/oss-health-metrics' since 1970-01-01 00:00:00+00:00
[2018-05-21 18:04:03,453] - Downloading mboxes from 'https://lists.linuxfoundation.org/pipermail/oss-health-metrics' to since 1970-01-01 00:00:00+00:00
Traceback (most recent call last):
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 380, in run
    for item in items:
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 484, in fetch
    raise e
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 478, in fetch
    for item in items:
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 127, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backends/core/pipermail.py", line 103, in fetch_items
    mailing_list.fetch(from_date=from_date)
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backends/core/pipermail.py", line 203, in fetch
    r.raise_for_status()
  File "/tmp/perceval/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://lists.linuxfoundation.org/pipermail/oss-health-metrics

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/perceval/bin/perceval", line 194, in <module>
    main()
  File "/tmp/perceval/bin/perceval", line 112, in main
    cmd.run()
  File "/tmp/perceval/lib/python3.6/site-packages/perceval/backend.py", line 385, in run
    raise RuntimeError(str(e))
RuntimeError: 403 Client Error: Forbidden for url: https://lists.linuxfoundation.org/pipermail/oss-health-metrics

wget https://lists.linuxfoundation.org/pipermail/oss-health-metrics
--2018-05-21 18:07:25--  https://lists.linuxfoundation.org/pipermail/oss-health-metrics
Resolving lists.linuxfoundation.org (lists.linuxfoundation.org)... 151.101.66.49, 151.101.130.49, 151.101.194.49, ...
Connecting to lists.linuxfoundation.org (lists.linuxfoundation.org)|151.101.66.49|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://lists.linuxfoundation.org/pipermail/oss-health-metrics/ [following]
--2018-05-21 18:07:25--  https://lists.linuxfoundation.org/pipermail/oss-health-metrics/
Reusing existing connection to lists.linuxfoundation.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 7042 (6.9K) [text/html]
Saving to: ‘oss-health-metrics’

oss-health-metrics                     100%[==========================================================================>]   6.88K  --.-KB/s    in 0s      

2018-05-21 18:07:25 (82.1 MB/s) - ‘oss-health-metrics’ saved [7042/7042]

wget https://lists.linuxfoundation.org/pipermail/oss-health-metrics/2018-April.txt.gz
--2018-05-21 18:07:53--  https://lists.linuxfoundation.org/pipermail/oss-health-metrics/2018-April.txt.gz
Resolving lists.linuxfoundation.org (lists.linuxfoundation.org)... 151.101.130.49, 151.101.194.49, 151.101.2.49, ...
Connecting to lists.linuxfoundation.org (lists.linuxfoundation.org)|151.101.130.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19640 (19K) [application/x-gzip]
Saving to: ‘2018-April.txt.gz’

2018-April.txt.gz                      100%[==========================================================================>]  19.18K  --.-KB/s    in 0.005s  

2018-05-21 18:07:54 (3.56 MB/s) - ‘2018-April.txt.gz’ saved [19640/19640]

All of this was done with the latest pip packages (18.05-02).

jgbarah, May 21 '18 16:05

I wonder whether this could be related to #358.

jgbarah, May 21 '18 16:05

@jgbarah I don't think it's related to that bug.

I get the same error. With wget and curl it works, although with curl only after changing the URL to https://lists.linuxfoundation.org/pipermail/oss-health-metrics/ (with a trailing slash); otherwise, I get a 303 response.
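With curl, the slash-less URL returned a redirect instead of the page. This is likely because curl, unlike wget, does not follow redirects unless it is run with `-L`. A minimal sketch, assuming a client wants to skip that redirect altogether: normalize the archive URL so its path always ends in a slash. `ensure_trailing_slash` is a hypothetical helper, not part of Perceval.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Return the URL with a trailing slash on its path (hypothetical helper).

    Pointing directly at the slash-terminated archive index avoids the
    redirect round-trip; query strings and fragments are preserved.
    """
    parts = urlsplit(url)
    # Only append a slash when the path does not already end with one.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Applied to the URL from the original report, `ensure_trailing_slash("https://lists.linuxfoundation.org/pipermail/oss-health-metrics")` returns the same URL with `/` appended, and an already slash-terminated URL is returned unchanged.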

Using the requests library, I get a 403 error.

$ ipython
In [1]: import requests

In [2]: r = requests.get("https://lists.linuxfoundation.org/pipermail/oss-health-metrics/")

In [3]: r
Out[3]: <Response [403]>

sduenas, May 21 '18 16:05

It works if I change the User-Agent header.

In [16]: headers = {'User-Agent': "Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/13.0 Firefox/13.0"}

In [17]: r = requests.get("https://lists.linuxfoundation.org/pipermail/oss-health-metrics/", headers=headers)

In [18]: r
Out[18]: <Response [200]>
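To repeat this comparison outside an interactive session, here is a minimal sketch that sends the same GET with several User-Agent strings and records the status code returned for each. It uses the standard-library `urllib` instead of `requests`, and `fetch_status` and `probe_user_agents` are hypothetical helpers, not part of Perceval. The `fetch` parameter is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Signature of a function that takes (url, user_agent) and returns a status code.
Fetch = Callable[[str, str], int]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a GET with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is still available.
        return err.code


def probe_user_agents(url: str, agents: Iterable[str], fetch: Fetch = fetch_status) -> Dict[str, int]:
    """Map each User-Agent string to the status code the server returned for it."""
    return {ua: fetch(url, ua) for ua in agents}
```

For example, `probe_user_agents("https://lists.linuxfoundation.org/pipermail/oss-health-metrics/", ["python-requests/1.2.0", "Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/13.0 Firefox/13.0"])` performs two real requests. The results depend on how the server filters clients at the time it is run.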

In Perceval, we use 'Perceval/' + __version__ as the User-Agent. See https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/client.py#L74
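A minimal sketch, assuming one wanted this header to be configurable instead of fixed. It keeps a `'Perceval/' + version` default but accepts an override per client. `ArchiveClient`, `DEFAULT_USER_AGENT` and `CLIENT_VERSION` are hypothetical names, not Perceval's actual API, and the version string is a placeholder.

```python
import urllib.request
from typing import Optional

# Placeholder version; Perceval itself builds its value from its __version__.
CLIENT_VERSION = "0.0.0"
DEFAULT_USER_AGENT = "Perceval/" + CLIENT_VERSION


class ArchiveClient:
    """Hypothetical HTTP client whose User-Agent can be overridden per instance."""

    def __init__(self, user_agent: Optional[str] = None) -> None:
        # Fall back to the tool's own identifier when no override is given.
        self.user_agent = user_agent or DEFAULT_USER_AGENT

    def build_request(self, url: str) -> urllib.request.Request:
        """Build (without sending) a GET request carrying the configured User-Agent."""
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})
```

Keeping the tool's own identifier as the default lets server operators recognize the client, while the override leaves room to work around filtering like the one reported here.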

sduenas, May 21 '18 16:05

Could the server be selectively banning Perceval? If you think so, I can contact the LF to see what's going on.

jgbarah, May 22 '18 21:05

I think we should contact them. In any case, the requests package sends python-requests/1.2.0 as the default value of this header, and that one fails too.

sduenas, May 22 '18 22:05

Message sent to the LF helpdesk.

jgbarah, May 23 '18 23:05

Access to this mailing list works again.

sduenas, Oct 11 '23 15:10