Paul Tremberth

81 comments by Paul Tremberth

@frisi , I'm really not sure how version 0.9.1 did it, but to me, something like `> .child` alone, i.e. starting with `>`, is not a valid [CSS3 selector](https://www.w3.org/TR/selectors-3/#selectors). This...

parsel has `xpath_attr_functional_pseudo_element`, which calls `function.argument_types()`. I haven't tested whether this removal affects it. See https://github.com/scrapy/parsel/blob/bcfa589e5f15b8f4230a699a0d64a3205732a317/parsel/csstranslator.py#L85

For the record, here's a tentative implementation using an XPath extension function with lxml: https://github.com/scrapy/parsel/pull/73

`:nth-child(An+B [of S]? )` seems to have replaced `:nth-match(An+B of )` [found in earlier versions of CSS4](https://www.w3.org/TR/2013/WD-selectors4-20130502/#the-nth-match-pseudo).
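To make the `An+B [of S]` notation concrete, here is a minimal sketch in plain Python. It is not how parsel or cssselect implement it; `matches_an_plus_b` and `nth_child_of` are hypothetical names, and siblings are modelled as a simple list with `S` as a predicate function.

```python
def matches_an_plus_b(a, b, index):
    """Return True if index == a*n + b for some integer n >= 0 (index is 1-based)."""
    if a == 0:
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


def nth_child_of(siblings, a, b, selector=None):
    """Pick siblings the way `:nth-child(An+B of S)` would.

    `selector` stands in for S (a predicate, assumed for illustration).
    When it is given, positions are counted only among siblings matching it.
    """
    selected = []
    position = 0
    for element in siblings:
        if selector is not None and not selector(element):
            continue
        position += 1
        if matches_an_plus_b(a, b, position):
            selected.append(element)
    return selected


# Each sibling is a (name, class) pair, purely for the example.
items = [("one", "x"), ("two", ""), ("three", "x"), ("four", "x")]

# :nth-child(2) counts every sibling.
print(nth_child_of(items, 0, 2))
# :nth-child(2 of .x) counts only siblings with class "x".
print(nth_child_of(items, 0, 2, lambda el: el[1] == "x"))
```

With the `of S` part, the second call picks `("three", "x")` instead of `("two", "")`, because positions are counted among the matching siblings only.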

@dchaplinsky , @eLRuLL : I believe this can be done at middleware level indeed, perhaps with the same design as [the revamped robotstxt middleware](https://github.com/scrapy/scrapy/blob/129421c7e31b89b9b0f9c5f7d8ae59e47df36091/scrapy/downloadermiddlewares/robotstxt.py#L20), i.e. returning a deferred on `process_request`...

Good catch, @pp-qq ! This also affects `safe_url_string()` (and would be fixed too with your change to `_safe_ParseResult()`). This change needs an update to the tests. Could you add...

@pp-qq , the issue with pypy on Travis is unrelated. I'm trying to fix it in #99

From my unscientific tests, with this page,
```
No title
"/%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%1A,%1B,%1C,%1D,%1E,%1F", relative to base http://www.example.com/
"/%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%2A,%2B,%2C,%2D,%2E,%2F", relative to base http://www.example.com/
"/%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%3A,%3B,%3C,%3D,%3E,%3F", relative to base http://www.example.com/
"/%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%4A,%4B,%4C,%4D,%4E,%4F", relative to base http://www.example.com/...
```

Summary for Chrome vs. [canonicalize_url](url):
```
>>> from w3lib.url import canonicalize_url
>>>
>>> chrome_normalized = '''%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%1A,%1B,%1C,%1D,%1E,%1F
... %20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%2A,%2B,%2C,-,.,%2F
... 0,1,2,3,4,5,6,7,8,9,%3A,%3B,%3C,%3D,%3E,%3F
... %40,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O
... P,Q,R,S,T,U,V,W,X,Y,Z,%5B,%5C,%5D,%5E,_
... %60,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o
... p,q,r,s,t,u,v,w,x,y,z,%7B,%7C,%7D,~,%7F'''
>>>
>>>...
```

For Firefox (48.0 Mozilla Firefox for Ubuntu) it's a bit different:

"on the wire" as copied from the network panel:
```
http://www.example.com/%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%1A,%1B,%1C,%1D,%1E,%1F
http://www.example.com/%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%2A,%2B,%2C,%2D,.,%2F
http://www.example.com/%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%3A,%3B,%3C,%3D,%3E,%3F
http://www.example.com/%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%4A,%4B,%4C,%4D,%4E,%4F
http://www.example.com/%50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%5A,%5B,%5C,%5D,%5E,%5F
http://www.example.com/%60,%61,%62,%63,%64,%65,%66,%67,%68,%69,%6A,%6B,%6C,%6D,%6E,%6F
http://www.example.com/%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%7A,%7B,%7C,%7D,%7E,%7F
```
as...