Peng-Yu Chen issues

Results: 16 issues of Peng-Yu Chen

[FeatureRequest] Adding the command line interface.

It would be helpful if a command-line interface were added for `parsel`, as existing tools (e.g. [W3's `html-xml-tools`](https://www.w3.org/Tools/HTML-XML-utils/README) and [`pup` in Golang](https://github.com/ericchiang/pup)) are not handy enough. Expected usage...

enhancement

discuss

Added: Enhanced spider log/error handling

Collects not only spider exceptions, but also other kinds of errors that appear in logs. E.g.: ``` { "status": "ok", "errors": [ "Ignoring response : HTTP status code is not...
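As a rough illustration of the idea, the sketch below collects both raised exceptions and messages that only show up in logs. It uses only Python's standard `logging` module. The names `ErrorCollector` and `run_with_collected_errors`, the `"spider"` logger name, and the `WARNING` threshold are all assumptions made for this example, not the pull request's actual code.

```python
import logging


class ErrorCollector(logging.Handler):
    """Hypothetical handler that keeps messages logged at or above `level`."""

    def __init__(self, level=logging.WARNING):
        super().__init__(level=level)
        self.errors = []

    def emit(self, record):
        # Store the fully interpolated log message.
        self.errors.append(record.getMessage())


def run_with_collected_errors(job, logger_name="spider"):
    """Run `job(logger)` and report raised exceptions plus logged problems."""
    logger = logging.getLogger(logger_name)
    collector = ErrorCollector()
    logger.addHandler(collector)
    try:
        job(logger)
        status = "ok"
    except Exception as exc:  # an exception raised by the job itself
        collector.errors.append(repr(exc))
        status = "error"
    finally:
        logger.removeHandler(collector)
    return {"status": status, "errors": collector.errors}
```

A job that finishes normally but logs a warning would then report a status of `"ok"` together with a non-empty `errors` list, which is the shape of the result shown in the snippet above.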

enhancement

[feature-request] `:not()` to support generic selectors (not only "simple" ones)

The document (version 0.9.1) says: > `:not()` accepts a sequence of _simple selectors_, not just single _simple selector_. For example, `:not(a.important[rel])` is allowed, even though the negation contains 3 _simple...

CSS Selectors Level 4

[MRG+1] Added: Removing comments before extracting base URLs. Not a solution to #70, but does help in some cases.

Helps resolve the issue in cases like the following, which occur on several websites: ``` Python >>> from w3lib import html >>> html.get_base_url("""""") 'http://example.com/' ``` Fixes #70 (since the original #70...

It's not a good idea to parse HTML text using regular expressions

In [`w3lib.html`](https://github.com/scrapy/w3lib/blob/master/w3lib/html.py) regular expressions are used to parse HTML texts: ``` python _ent_re = re.compile(r'&((?P[a-z\d]+)|#(?P\d+)|#x(?P[a-f\d]+))(?P;?)', re.IGNORECASE) _tag_re = re.compile(r'', re.DOTALL) _baseurl_re = re.compile(six.u(r']*href\s*=\s*[\"\']\s*([^\"\'\s]+)\s*[\"\']'), re.I) _meta_refresh_re = re.compile(six.u(r']*http-equiv[^>]*refresh[^>]*content\s*=\s*(?P["\'])(?P(\d*\.)?\d+)\s*;\s*url=\s*(?P.*?)(?P=quote)'), re.DOTALL | re.IGNORECASE)...
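For comparison, here is a minimal sketch of extracting a base URL with a real HTML parser instead of a regular expression. It relies only on the standard library's `html.parser`. The names `BaseHrefParser` and `find_base_url` are hypothetical and are not part of `w3lib`. Because the parser routes comments to a separate callback, a `<base>` tag that sits inside an HTML comment is never mistaken for a real one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefParser(HTMLParser):
    """Hypothetical helper: remembers the href of the first <base> tag."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # Comments are dispatched to handle_comment, so a <base> tag
        # inside <!-- ... --> never reaches this method.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href.strip()


def find_base_url(text, document_url=""):
    """Return the base URL declared in `text`, else `document_url`."""
    parser = BaseHrefParser()
    parser.feed(text)
    parser.close()
    if parser.base_href is None:
        return document_url
    # Resolve a relative href against the URL of the document itself.
    return urljoin(document_url, parser.base_href)
```

The trade-off is speed: a full parser pass is usually slower than a single regular-expression search, which may matter when processing many responses.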

bug

Added: A DeltaFetchPseudoItem for storing requests with no items yielded

Sometimes one may want to store a request key for future skipping even when no item is generated. Currently I handle such cases by yielding a pseudo item from such...

Just saw another project that might have violated Apache License v2.0

This repo looks derived from the `expiringdict` project: https://github.com/rfyiamcool/expiredict/blob/05900e6411015c177a82e1323a8e0aff7da717a8/expiredict/__init__.py The same project has also been submitted to PyPI: https://pypi.python.org/pypi/expiredict Potentially violated clauses include (but are not limited to): > 4 ....

Adding new option for falling back to clipboard pasting on character injection failures.

So that non-ASCII characters may be entered via IMEs. Resolves #37. Submitting the changes here for now for review. Further documentation work would be needed on [`README.md`](https://github.com/Genymobile/scrcpy/blob/56d237f152fc4d454a65fe594412b4bd9409fd64/README.md#input-control) and [`FAQ.md`](https://github.com/Genymobile/scrcpy/blob/56d237f152fc4d454a65fe594412b4bd9409fd64/FAQ.md#special-characters-do-not-work) either...

[Bug]: page.url() may occasionally miss URL changes via the History API

### Bug expectation # Overview Upon calling the History API (e.g. [the `pushState` method](https://developer.mozilla.org/en-US/docs/Web/API/History/pushState)), a page's URL may be updated. Puppeteer is expected to catch such changes since it listens...

bug

confirmed

[idea] HttpCacheMiddleware could be further enhanced

The current design of `HttpCacheMiddleware`:

- Checks whether a request hits the cache in `process_request`
- Stores a response to the cache storage in `process_response`

That makes it possible to...
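The two-hook design described above can be sketched without any framework. This is a toy model only: the class `CachingMiddleware`, the in-memory dict storage, and the `fetch` driver are assumptions made for illustration, and Scrapy's real middleware works with request and response objects and pluggable storage backends.

```python
class CachingMiddleware:
    """Toy model of a cache middleware with the two hooks described above."""

    def __init__(self):
        self.storage = {}  # hypothetical in-memory cache storage

    def process_request(self, request_key):
        # Return the cached response on a hit, or None on a miss.
        return self.storage.get(request_key)

    def process_response(self, request_key, response):
        # Store the freshly downloaded response, then pass it along.
        self.storage[request_key] = response
        return response


def fetch(middleware, request_key, download):
    """Hypothetical driver: consult the cache first, download only on a miss."""
    cached = middleware.process_request(request_key)
    if cached is not None:
        return cached
    return middleware.process_response(request_key, download(request_key))
```

In this model, a second `fetch` for the same key never calls `download`, because `process_request` short-circuits the request.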

enhancement

performance