FanFicFare icon indicating copy to clipboard operation
FanFicFare copied to clipboard

crash in proxy_request (using nsapa proxy)

Open mcepl opened this issue 1 year ago • 5 comments

stitny~/K/f/tmp$ fanficfare -u The\ Moment\ It\ Began-ffnet_3735743.epub 
Updating The Moment It Began-ffnet_3735743.epub, URL: https://www.fanfiction.net/s/3735743/1/
Traceback (most recent call last):
  File "/usr/bin/fanficfare", line 33, in <module>
    sys.exit(load_entry_point('FanFicFare==4.14.0', 'console_scripts', 'fanficfare')())
  File "/usr/lib/python3.10/site-packages/fanficfare/cli.py", line 344, in main
    dispatch(options, urls, passed_defaultsini, passed_personalini, warn, fail)
  File "/usr/lib/python3.10/site-packages/fanficfare/cli.py", line 320, in dispatch
    do_download(url,
  File "/usr/lib/python3.10/site-packages/fanficfare/cli.py", line 435, in do_download
    adapter.getStoryMetadataOnly()
  File "/usr/lib/python3.10/site-packages/fanficfare/adapters/base_adapter.py", line 313, in getStoryMetadataOnly
    self.doExtractChapterUrlsAndMetadata(get_cover=get_cover)
  File "/usr/lib/python3.10/site-packages/fanficfare/adapters/adapter_fanfictionnet.py", line 114, in doExtractChapterUrlsAndMetadata
    data = self.get_request(url)
  File "/usr/lib/python3.10/site-packages/fanficfare/requestable.py", line 114, in get_request
    return self.get_request_redirected(url,
  File "/usr/lib/python3.10/site-packages/fanficfare/requestable.py", line 106, in get_request_redirected
    (data,rurl) = self.configuration.get_fetcher().get_request_redirected(
  File "/usr/lib/python3.10/site-packages/fanficfare/fetcher.py", line 398, in get_request_redirected
    fetchresp = self.do_request('GET',
  File "/usr/lib/python3.10/site-packages/fanficfare/fetcher.py", line 102, in fetcher_do_request
    fetchresp = chainfn(
  File "/usr/lib/python3.10/site-packages/fanficfare/fetcher.py", line 240, in fetcher_do_request
    fetchresp = chainfn(
  File "/usr/lib/python3.10/site-packages/fanficfare/fetcher.py", line 136, in fetcher_do_request
    fetchresp = chainfn(
  File "/usr/lib/python3.10/site-packages/fanficfare/fetcher.py", line 371, in do_request
    fetchresp = self.request(method,url,
  File "/usr/lib/python3.10/site-packages/fanficfare/nsapa_proxy.py", line 154, in request
    (type_expected, content) = self.proxy_request(url, timeout)
  File "/usr/lib/python3.10/site-packages/fanficfare/nsapa_proxy.py", line 63, in proxy_request
    size_expected = int(header.split('||')[0])
ValueError: invalid literal for int() with base 10: ''

mcepl avatar Jul 15 '22 11:07 mcepl

@nsapa - Do you have time (and interest) in looking into this?

JimmXinu avatar Jul 15 '22 13:07 JimmXinu

@JimmXinu I am taking this.

@mcepl Did chrome_content.py log any exception? The requested fanfic have 122 chapters so testing is going to be slow.

A ValueError here mean that the header is missing the content size. The proxy have is a simple header : type_of_content||size_in_bytes$END_OF_HEADER$.

I currently do not see where the proxy could fail sending a size...

nsapa avatar Jul 15 '22 17:07 nsapa

fanfiction_proxy_log.txt

Well, it is true, that the proxy run was rather wild as well. Multiple crashes, sometimes even multiple ones on one page.

mcepl avatar Jul 15 '22 20:07 mcepl

2022-07-15 22:54:17.837 CEST INFO selenium_recovery Look like we are operational again. Retrying https://www.fanfiction.net/s/10339852/5/The-Many-Harry-Potters-of-Little-Hangleton
2022-07-15 22:54:19.607 CEST CRITICAL mainloop Unrecoverable error from Selenium: unknown error: unexpected command response
  (Session info: chrome=103.0.5060.114). Killing this instance...
2022-07-15 22:54:19.607 CEST ERROR selenium_recovery Selenium error while getting https://www.fanfiction.net/s/10339852/5/The-Many-Harry-Potters-of-Little-Hangleton, recovery started.
2022-07-15 22:54:19.608 CEST INFO ProxiedBrowser(suicide) Killing Chrome with pid 24994 as last ressort cleanup.
2022-07-15 22:54:19.608 CEST INFO ProxiedBrowser(suicide) Killing chrome driver with pid 24995 as last ressort cleanup.
...
2022-07-15 22:54:22.020 CEST INFO ProxiedBrowser(init) chromedriver version 103.0.5060.53 (a1711811edd74ff1cf2150f36ffa3b0dae40b17f-refs/branch-heads/5060@{#853}) running as pid 25503 driving Chrome version 103.0.5060.114 running as pid 25502
2022-07-15 22:54:22.199 CEST INFO ProxiedBrowser(init) Trying to load existing cookie...
2022-07-15 22:54:26.827 CEST WARNING main Exception WebDriverException in the main loop (Message: unknown error: unexpected command response
  (Session info: chrome=103.0.5060.114)

At this point, Selenium is totally broken and I don't have a recovery path in the main Exception handler (because I didn't think this could happen). Look like it is a known issue in Chromium 103 and the fix is merged for 104.

Anyway, I will send a PR to make nsapa_proxy.py handle a broken proxy response (some sanity check that raise exceptions.FailedToDownload("nsapa_proxy: invalid header")).

nsapa avatar Jul 15 '22 22:07 nsapa

@nsapa Just that I have collected a lovely log when trying to update https://www.fanfiction.net/s/14001550/1/Harry-Potter-and-the-Expectations-of-the-Future

fanfiction_proxy_log.txt

mcepl avatar Jul 17 '22 05:07 mcepl

@nsapa - I don't think I got a PR for this?

JimmXinu avatar Aug 21 '22 15:08 JimmXinu

This should make fanficfare detect proxy issue. I am also updating the proxy to alert the user if the Chrome version is incompatible.

nsapa avatar Aug 21 '22 22:08 nsapa

Great, thanks!

JimmXinu avatar Aug 21 '22 23:08 JimmXinu