PixivUtil2

HTTP Error 500: Internal Server Error for larger API calls, doesn't seem to be a server issue.

Open AgentThirteen opened this issue 3 years ago • 10 comments

Prerequisites

  • [x] Did you read FAQ section in readme.md?
  • [x] Did you test with the latest releases or commit ?
  • [x] Did you search for existing issues in Issues?

Description

Large API calls return HTTP 500 which then delays all further calls for a few minutes and will happen again after resuming a problematic query. The same errors happen on the website telling the user to try again later. Doesn't seem to be a temporary outage as the site is not impacted when trying from another connection.

Steps to Reproduce

  1. Search by tags (large enough query, 5000+ results)
  2. Wait until the process fails (may happen in the middle of a page)
  3. Abort, as the process won't fetch any more pages until the flag is gone, even with a new, shorter search

Expected behavior: Successful API calls from page 1 to the last page

Actual behavior: Fails and subsequently gives zero results on what seems to be too many requests (429?)

Log file:

Server did not return images, expected to have more (looping then retrying)

ERROR - Traceback (most recent call last):
  File "PixivBrowserFactory.py", line 252, in getPixivPage
    temp = self.open_with_retry(req)
  File "PixivBrowserFactory.py", line 210, in open_with_retry
    res = self.open(url, data, timeout)
  File "_mechanize.py", line 257, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "_mechanize.py", line 313, in _mech_open
    raise response
mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 500: Internal Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "PixivImageHandler.py", line 76, in process_image
    (image, parse_medium_page) = PixivBrowserFactory.getBrowser().getImagePage(image_id=image_id,
  File "PixivBrowserFactory.py", line 612, in getImagePage
    response = self.getPixivPage(url, returnParsed=False, enable_cache=False)
  File "PixivBrowserFactory.py", line 265, in getPixivPage
    raise PixivException(f"Failed to get page: {url}", errorCode=PixivException.SERVER_ERROR)

Versions

20220311-beta3 20220804

AgentThirteen, Aug 11 '22 14:08

Looks like pixiv temp-banned your IP?

Nandaka, Aug 11 '22 14:08

Sorry for the late reply and thanks for the quick response, it looks like a throttling issue that lasted for about a week and didn't affect fanbox, only pixiv (and intermittently, just like a 429).

Everything seems to work normally for now, though when this happens there is no message from the administration about saturating the servers (some users were contacted in the past for this). Thank you again for your time!

Edit: Well that was fast. It's happening again, so I guess users should really limit their queries to no more than 5000 results to be on the safe side.

AgentThirteen, Aug 13 '22 11:08

Just to clarify, do you mean 5000 results in a single query? Normally the script will only get 60 posts per query for search by tags.

Nandaka, Aug 13 '22 12:08

I'm experiencing this as soon as I start up a second instance of PixivUtil. It seems like they may be recognizing and cracking down on running multiple connections now.

Syampuuh, Aug 14 '22 01:08

Yes Nandaka, such queries (using 60 posts at a time of course) weren't a problem even with multiple instances up until now.

However, as Syampuuh said, multiple connections now automatically get rejected as a 500 internal error after a while, sometimes instantly. Not sure there's anything that can be done if it's a server-side crackdown.

AgentThirteen, Aug 15 '22 00:08

Multiple instances don't seem possible anymore, unless you attempt to route some traffic through a VPN.

In any case, I tried even with a single instance, and it just said no midway through and denied all further requests. I'm not able to determine the exact reason. It's quite upsetting.

pxssy, Aug 17 '22 14:08

Maybe try creating a separate account to test?

Nandaka, Aug 26 '22 01:08

Even fresh, brand new accounts or unused old ones (on devices that didn't perform any other action and use different IPs) are affected, so it doesn't seem to be a long-term flagging issue.

The only real workaround seems to be something like delaying each 60-post API call manually for searches that yield more than 5000 results (about 85 pages with 59 posts per page). Also, if that helps, I got a "failed to connect to Google reCAPTCHA service" error that prevented all queries for about 15 minutes while browsing, which may indicate the failing server is a CDN attempting to throw a captcha?

I don't think this one is Cloudflare related, at least.
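As a rough illustration of the manual-delay workaround described above, here is a minimal Python sketch. It is not PixivUtil2's code: `fetch_page` is a hypothetical callable standing in for whatever performs one search-page request, and the delay value is a guess to be tuned, since the thread does not establish what interval the server tolerates.

```python
import math
import time
from typing import Callable, Iterator, List


def pages_needed(total_results: int, per_page: int) -> int:
    """Number of result pages required to cover total_results."""
    return math.ceil(total_results / per_page)


def fetch_all_pages(
    fetch_page: Callable[[int], List[int]],  # hypothetical: returns the post IDs on one page
    total_pages: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[int]]:
    """Yield each page in turn, pausing delay_seconds between calls."""
    for page in range(1, total_pages + 1):
        yield fetch_page(page)
        # No need to wait after the final page.
        if page < total_pages:
            sleep(delay_seconds)
```

With the figures from this comment, `pages_needed(5000, 59)` gives 85 pages, so a 10-second delay (an arbitrary example value) would add roughly 14 minutes to the whole search.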

AgentThirteen, Aug 26 '22 16:08

I'm not getting 500 errors but I am being heavily throttled by Pixiv lately. This has never happened before (though I have always assumed it could) in the decade I have been using this app, but I paid for a VPN to connect to Japan and browse Pixiv from there. It's only happening on the content hosting servers (i.pximg.net), not the web (pixiv.net).

biggestsonicfan, Aug 29 '22 05:08

I remember the heavy throttling thing. It's basically a flag that lasts for about a week: once your IP, account, or anything else that has "seen" you through pixiv or fanbox uses a certain amount of bandwidth (5-10 gigabytes?) within a week, the connection is throttled down to about 10-15KB/s. I have yet to encounter this one again since I hardly download 50MB a week these days. The difference, though, is that it also affected the web for me (both pixiv and fanbox were so slow they'd hardly load at all). In this case, and if it happens again without warning, a VPN may be required.

As for news on this one, this now seems to be more of a 429 (too many requests) on large requests or too many tabs (when on web), though this only happened with fanbox before. I have yet to run into another 500 but 429s are daily occurrences even with small queries. I'm also using a large delay between API calls (default is 120s when there is content detected by pixivutil but the server doesn't return anything, I think?) and this doesn't really help; only completely killing the process for about 15 minutes seems to fix it temporarily.

That seems rather long for a 429 threshold, and unless there were heavy changes, I don't think a 429 is supposed to trigger every 5 result pages or so. If someone with a premium account (be careful!) could give feedback on this, that'd be greatly appreciated, thanks.
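For anyone scripting around this, the "back off, then retry" behaviour discussed above could be sketched as follows. This is an assumption-laden example, not what PixivUtil2 does: `fetch` is a hypothetical callable that raises `urllib.error.HTTPError` on failure, the retryable codes are the two reported in this thread, and the 15-minute cap only mirrors the cooldown that seemed to help here. Nothing in the thread confirms the server's actual thresholds.

```python
import time
import urllib.error
from typing import Callable

# Status codes reported in this thread as throttling symptoms (assumption).
RETRYABLE = {429, 500}


def fetch_with_cooldown(
    fetch: Callable[[str], str],  # hypothetical: performs one request, raises HTTPError
    url: str,
    max_attempts: int = 4,
    base_delay: float = 60.0,
    max_delay: float = 900.0,  # 15 minutes, the cooldown mentioned above
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), doubling the wait after each retryable failure."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Give up on non-throttling errors or when attempts are exhausted.
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, the waits are 60, 120, and 240 seconds before the fourth and final attempt. If the flag really lasts 15 minutes regardless of traffic, a single long pause may work as well as a growing one.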

AgentThirteen, Aug 31 '22 10:08