Peng-Yu Chen
It would be helpful if a command-line interface were added for `parsel`, as existing tools (e.g. [W3C's `html-xml-utils`](https://www.w3.org/Tools/HTML-XML-utils/README) and [`pup` in Golang](https://github.com/ericchiang/pup)) are not handy enough. Expected usage...
Collects not only spider exceptions, but also other kinds of errors that appear in logs. E.g.: ``` { "status": "ok", "errors": [ "Ignoring response : HTTP status code is not...
The document (version 0.9.1) says: > `:not()` accepts a sequence of _simple selectors_, not just single _simple selector_. For example, `:not(a.important[rel])` is allowed, even though the negation contains 3 _simple...
Helps resolve the issue in such cases, which do occur on several websites: ``` Python >>> from w3lib import html >>> html.get_base_url("""""") 'http://example.com/' ``` Fixes #70 (since the original #70...
In [`w3lib.html`](https://github.com/scrapy/w3lib/blob/master/w3lib/html.py) regular expressions are used to parse HTML texts: ``` python _ent_re = re.compile(r'&((?P[a-z\d]+)|#(?P\d+)|#x(?P[a-f\d]+))(?P;?)', re.IGNORECASE) _tag_re = re.compile(r'', re.DOTALL) _baseurl_re = re.compile(six.u(r']*href\s*=\s*[\"\']\s*([^\"\'\s]+)\s*[\"\']'), re.I) _meta_refresh_re = re.compile(six.u(r']*http-equiv[^>]*refresh[^>]*content\s*=\s*(?P["\'])(?P(\d*\.)?\d+)\s*;\s*url=\s*(?P.*?)(?P=quote)'), re.DOTALL | re.IGNORECASE)...
Sometimes one may want to store a request key for future skipping even when there's no item generated. Currently I handle such cases by yielding a pseudo item from such...
This repo looks derived from the `expiringdict` project: https://github.com/rfyiamcool/expiredict/blob/05900e6411015c177a82e1323a8e0aff7da717a8/expiredict/__init__.py The same project has also been submitted to PyPI: https://pypi.python.org/pypi/expiredict Potentially violated clauses include (but are not limited to): > 4 ....
So that non-ASCII characters may be entered via IMEs. Resolves #37. Submitting the changes here for now for review. Further documentation work would be needed on [`README.md`](https://github.com/Genymobile/scrcpy/blob/56d237f152fc4d454a65fe594412b4bd9409fd64/README.md#input-control) and [`FAQ.md`](https://github.com/Genymobile/scrcpy/blob/56d237f152fc4d454a65fe594412b4bd9409fd64/FAQ.md#special-characters-do-not-work) either...
### Bug expectation # Overview Upon calling the History API (e.g. [the `pushState` method](https://developer.mozilla.org/en-US/docs/Web/API/History/pushState)), a page's URL may be updated. Puppeteer is expected to catch such changes since it listens...
The current design of `HttpCacheMiddleware`: - Checks whether a request hits the cache in `process_request` - Stores a response to the cache storage in `process_response` That makes it possible to...
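The two-hook design described above can be illustrated with a minimal, dependency-free sketch. It is not Scrapy's implementation: `Request`, `Response`, and `SketchCacheMiddleware` are hypothetical stand-ins, the in-memory dict replaces a real cache storage backend, and the `from_cache` flag is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical request type; frozen so it can be used as a dict key."""
    url: str
    method: str = "GET"


@dataclass
class Response:
    """Hypothetical response type; from_cache marks responses served from storage."""
    url: str
    body: bytes
    from_cache: bool = False


class SketchCacheMiddleware:
    """Illustrative stand-in only, not Scrapy's HttpCacheMiddleware."""

    def __init__(self) -> None:
        # Assumption: a plain dict stands in for the cache storage.
        self._storage: Dict[Request, Response] = {}

    def process_request(self, request: Request) -> Optional[Response]:
        # Hook 1: check whether the request hits the cache.
        cached = self._storage.get(request)
        if cached is None:
            return None  # cache miss: the request proceeds to the downloader
        return Response(cached.url, cached.body, from_cache=True)

    def process_response(self, request: Request, response: Response) -> Response:
        # Hook 2: store freshly downloaded responses in the cache storage.
        if not response.from_cache:
            self._storage[request] = response
        return response
```

In this sketch a request misses on first sight, is stored once its response passes through `process_response`, and is answered from storage on the next `process_request` call.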