
JSONDecodeErrors


Occurring sporadically. This does not break the execution of twitterscraper, but appears as an error in our log.

ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCwLSBg42rtSEWgsC8udGv56QiEjUAFQAlAFUAFQAA&q=museumbarberini%20since%3A2019-12-01%20until%3A2020-03-05&l="
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twitterscraper/query.py", line 99, in query_single_page
    json_resp = response.json()
  File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO: Got 358 tweets for museumbarberini%20since%3A2017-01-26%20until%3A2017-04-30.

LinqLover avatar Mar 05 '20 13:03 LinqLover

What is the original request? Have you set any limit for tweets to be retrieved?

gabri985 avatar Mar 09 '20 08:03 gabri985

ts.query_tweets("museumbarberini", begindate=dt.date(2015, 1, 1))

LinqLover avatar Mar 09 '20 09:03 LinqLover

OK, I was experiencing a similar issue. In my case, I'm requesting all tweets from the past two weeks for a set of 10 different hashtags, and after a while the server starts responding with 429 TOO_MANY_REQUESTS. For those responses, the body Twitter sends is a page basically telling you "oops, we're slowing you down!", which has a format unknown to the scraper, and this causes the decoding error. Since the scraper does not check HTTP response statuses, I could not immediately see the root cause of the problem.

I don't know whether this is the same issue; in any case, I had to debug the code to confirm it.
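To make the failure mode concrete, here is a minimal, self-contained sketch (the helper name `parse_timeline_body` is mine, not from twitterscraper): the scraper calls `response.json()` without checking the status code first, and a 429 body is HTML, which cannot be decoded as JSON.

```python
import json

RATE_LIMITED = 429

def parse_timeline_body(status_code, body):
    """Return the decoded JSON payload, or None if Twitter rate-limited us."""
    if status_code == RATE_LIMITED:
        # Twitter sent an HTML "slow down" page, not JSON -- back off
        # instead of trying to decode it.
        return None
    return json.loads(body)

# Decoding a 429 body directly reproduces the error from the log:
try:
    json.loads("<html>Too Many Requests</html>")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 1 (char 0)
```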

gabri985 avatar Mar 09 '20 09:03 gabri985

Same here; the very first line of the response is ERROR: <Response [429]>.

To check that, I added logger.exception("{}".format(response)) right after the line logger.exception('Failed to parse JSON "{}" while requesting "{}"'.format(e, url)) in query.py.

A dirty workaround for me was to add

    if retry == 45:
        logger.info("RETRY: {}".format(str(retry)))
        logger.info("SLEEPING")
        time.sleep(360)

to the beginning of the query_single_page function.

But this is a very ugly, dirty, temporary not-a-fix.

Shuffling the proxies with random.shuffle(proxies) also seems to help with this error (or I am being delusional here) ;)
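A slightly cleaner sketch of the same sleep-on-rate-limit idea: instead of a fixed 360-second sleep at a magic retry count, back off exponentially. The function name and the `base`/`cap` values are illustrative assumptions, not anything from twitterscraper itself.

```python
import time

def backoff_seconds(attempt, base=5, cap=360):
    """Seconds to sleep before retry number `attempt` (0-based),
    doubling each time and capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def sleep_before_retry(attempt):
    # Called whenever the server answers 429.
    time.sleep(backoff_seconds(attempt))

# backoff_seconds(0) -> 5, backoff_seconds(3) -> 40, capped at 360
```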

SpaceCadetSkywalker avatar Mar 18 '20 05:03 SpaceCadetSkywalker

Wow, it seems we are missing a response.raise_for_status() call right here. If I understand the code correctly (which appears to be "slightly" abusive in its use of recursion), the right way could be:

        response = requests.get(url, headers=HEADER, proxies={"http": proxy}, timeout=timeout)
        if response.status_code == 429:
            return query_single_page(query, lang, pos, retry - 1, from_user)
        response.raise_for_status()
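The retry logic above can be sketched stdlib-only, without hitting Twitter: retry on 429 by recursing with `retry - 1`, and fail loudly on any other HTTP error (the equivalent of raise_for_status). The `fetch` callable and `query_with_retry` name are stand-ins of mine for requests.get and query_single_page.

```python
def query_with_retry(fetch, retry=3):
    """fetch() returns (status_code, body); retry on 429, error on 4xx/5xx."""
    status, body = fetch()
    if status == 429:
        if retry <= 0:
            raise RuntimeError("rate limited: retries exhausted")
        return query_with_retry(fetch, retry - 1)
    if status >= 400:
        raise RuntimeError(f"HTTP {status}")  # raise_for_status equivalent
    return body

# Example: the first two requests are rate-limited, the third succeeds.
responses = iter([(429, ""), (429, ""), (200, '{"ok": true}')])
print(query_with_retry(lambda: next(responses)))  # {"ok": true}
```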

LinqLover avatar Mar 19 '20 08:03 LinqLover

Are there any plans to fix this? I'm running into it an awful lot recently.

EthanZeigler avatar May 04 '20 20:05 EthanZeigler

Can someone share their code? I am new to Python and do tool-based scraping, but I want Twitter data for research work.

asif-faizan avatar May 08 '20 20:05 asif-faizan

I have a working fork. It sleeps for 5 minutes once the rate limit errors start and resumes afterwards. Worked perfectly for me.

https://github.com/EthanZeigler/twitterscraper

To install it with pip, search for something like "pip install from git"; you should find a solution.

EthanZeigler avatar May 08 '20 21:05 EthanZeigler

@EthanZeigler could you make a PR?

lapp0 avatar Jun 07 '20 21:06 lapp0

Yup. I just didn't want to change the default behavior without a CLI option to control it. I got sidetracked and forgot about it.

EthanZeigler avatar Jun 07 '20 21:06 EthanZeigler