Mikhail Korobov
Yeah, I agree. There is a QWebSettings::FrameFlatteningEnabled option (http://qutebrowser.org/tmp/qtdoc-linktitle/qwebsettings.html); maybe it could work for Splash. Alternatively, there is an API to go into iframes in QtWebKit (but not in the upcoming QtWebEngine);...
The frame flattening option doesn't seem to work: I've tried it here https://github.com/scrapinghub/splash/commit/8eb45d89d1d798695d2f17e92f69647c15dca27a, and splash:html() doesn't include the HTML content of iframes.
> Is it possible to merge this PR and release a new version? @kmike ?

Apparently it's not, @costika1234; unfortunately I've lost commit access to this repo :( I still...
What happens? Is it because the timeout is not large enough to download a file, or is it a problem because Splash doesn't handle non-HTML responses in splash:go?
Splash doesn't handle unsupported content right now (http://doc.qt.io/archives/qt-5.5/qwebpage.html#forwardUnsupportedContent-prop); to fix it we need to add an API for that to Splash.
It makes sense to add this feature to scrapy-splash (handle network.. error codes in addition to http.. codes when `http_status_from_error_code` is True). But I'm not sure what we should set...
network... status codes: http://doc.qt.io/qt-5/qnetworkreply.html#NetworkError-enum
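A minimal sketch of the parsing step discussed above, assuming Splash reports errors as strings made of a prefix and a numeric code, such as `http404` or `network5`. The helper name `parse_splash_error` is hypothetical and not part of scrapy-splash, and it deliberately leaves open which response status a `network` code should map to.

```python
import re

# Hypothetical helper, not part of scrapy-splash: it only splits the error
# string and does not decide which HTTP status a network error should get.
_ERROR_RE = re.compile(r"^(http|network)(\d+)$")


def parse_splash_error(error):
    """Split an error string like 'http404' or 'network5' into (kind, code).

    Returns None when the string matches neither pattern.
    """
    match = _ERROR_RE.match(error)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

With something like this, the `http` branch could keep its current behaviour while the `network` branch is handled separately once the target status is agreed on.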
It means resource_timeout was applied to the first request, and it timed out; this is a bit different from regular Splash timeouts. But yeah, it makes sense to handle network5...
It also could make sense to apply a larger timeout for the first request (see Example 6 here: http://splash.readthedocs.org/en/stable/scripting-ref.html#splash-on-request) - what do you think?
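A rough sketch of the "larger timeout for the first request" idea, assuming the script is sent to Splash's `/execute` endpoint as `lua_source`. The timeout values (5 and 15 seconds), the helper name `build_execute_payload` and the example URL are illustrative assumptions; the Lua part relies on `splash.resource_timeout`, `splash:on_request` and `request:set_timeout` from the scripting reference.

```python
import json

# Lua script for Splash: a short default resource timeout, with a longer
# per-request timeout applied to the first request only.
# The numbers are illustrative, not recommended values.
LUA_SOURCE = """
function main(splash)
  local first = true
  splash.resource_timeout = 5
  splash:on_request(function(request)
    if first then
      request:set_timeout(15.0)
      first = false
    end
  end)
  assert(splash:go(splash.args.url))
  return splash:html()
end
"""


def build_execute_payload(url):
    """Return a JSON body for a Splash /execute request (hypothetical helper)."""
    return json.dumps({"lua_source": LUA_SOURCE, "url": url})


payload = build_execute_payload("http://example.com")
```

The payload is only built here, not sent; posting it to a running Splash instance is left out so the sketch stays self-contained.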
@oasis789 crfsuite implements vectorization itself; that's why dicts are currently exposed. I wonder why you prefer DictVectorizer - sklearn-crfsuite data format is largely compatible, with a few extra features...
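For reference, a small sketch of the dict-based input mentioned above: each sequence is a list of per-token feature dicts, and a dataset is a list of such sequences. The feature names and the helper functions `token_features` and `sentence_features` are made up for illustration.

```python
def token_features(sentence, i):
    """Build a feature dict for the i-th token of a tokenized sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,                     # numeric feature
        "word.lower": word.lower(),      # string-valued feature
        "word.istitle": word.istitle(),  # boolean feature
        "suffix3": word[-3:],
    }
    if i == 0:
        features["BOS"] = True           # marks the beginning of the sequence
    return features


def sentence_features(sentence):
    """Convert a list of tokens into a list of feature dicts."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# One training sequence; a real dataset would contain many of these.
X = [sentence_features(["Splash", "renders", "pages"])]
```

No separate vectorizer step appears here because, as noted above, crfsuite does the vectorization itself.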