
URLs not listed in the filter still pass through?

Open · coodoo opened this issue 8 years ago · 6 comments

First of all, thanks for making all these great things happen. Big kudos!

I just did a quick run, and it seems URLs not listed in the filter still get passed to the callback.

See below: edgesuite.net is not listed in the filter, so I would assume it shouldn't be passed into the callback at all. Am I doing something wrong here?

this.browser
  .filter({
    urls: ['https://*.github.com/*', '*://electron.github.io']
  }, function (details, cb) {
    // A request to http://img.edgesuite.net/foo.png was passed in and blocked, which shouldn't happen
    return cb({ cancel: details.url.indexOf('edgesuite.net') !== -1 });
  })
  .goto(url)
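The expectation here (only URLs matching one of the `urls` patterns should ever reach the callback) can be illustrated with a deliberately simplified matcher. `globToRegExp` and `matchesAny` are hypothetical helpers, not part of nightmare-load-filter or Electron, and they treat `*` as "any run of characters", which is looser than real match-pattern rules.

```javascript
// Hypothetical helpers (not nightmare-load-filter or Electron code):
// a simplified glob where "*" matches any run of characters.
function globToRegExp(pattern) {
  // Escape regex metacharacters except "*", then turn each "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// True if the URL matches at least one of the patterns.
function matchesAny(url, patterns) {
  return patterns.some((p) => globToRegExp(p).test(url));
}

const patterns = ['https://*.github.com/*', '*://electron.github.io'];

console.log(matchesAny('http://img.edgesuite.net/foo.png', patterns)); // false
console.log(matchesAny('https://api.github.com/repos', patterns)); // true
```

Under this reading, the edgesuite.net image request matches neither pattern, so it should never have been handed to the callback.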

coodoo · May 29 '16 03:05

@coodoo What version of Nightmare and nightmare-load-filter are you using, out of curiosity?

I tried your example, and if you add logging in the filter callback, it doesn't look like it gets called. How are you asserting that the image is getting blocked? (The URL provided returns a 502.)

rosshinkley · May 30 '16 15:05

@rosshinkley I'll provide a detailed report soon. Quick question: how do I log from the filter callback? I tried the standard console.log('foo') to no avail.

coodoo · May 30 '16 23:05

Here's a short code sample to reproduce the issue. edgesuite.net is not listed in the rules, yet all images from that website were blocked. I would expect that not to happen, right?

const rules = [
    'google.com',
    // 'edgesuite.net'
]

this.browser
    .filter( { urls: rules }, ( details, cb ) => cb({ cancel: details.url.indexOf('edgesuite.net') != -1 }) )
    .goto( 'http://www.appledailytw.com/realtimenews/article/nextmag/20160531/874328/' )

Using:

"nightmare": "^2.5.0",
"nightmare-load-filter": "0.2.0",

coodoo · May 30 '16 23:05

> I tried the standard console.log('foo') to no avail.

Output will be part of Electron's stdout. Run your script with the DEBUG environment variable set and you'll have better luck.
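As a sketch of that logging advice, here is a callback with the same `(details, cb)` shape used in this thread that logs every URL it receives. `blockEdgesuite` is a hypothetical name; the sketch assumes a runtime where `URL` is a global (Node 10+ or a comparably recent Electron), and when it runs inside Nightmare the `console.log` output should land in Electron's stdout as described above.

```javascript
// Hypothetical filter callback with the (details, cb) shape used in this thread.
function blockEdgesuite(details, cb) {
  // Log every URL the filter hands us, to see what actually reaches the callback.
  console.log('filter saw:', details.url);

  let hostname = '';
  try {
    hostname = new URL(details.url).hostname;
  } catch (err) {
    // Unparseable URL: leave hostname empty so the request is not cancelled.
  }

  // Match edgesuite.net and its subdomains only, rather than any URL that
  // merely contains the string "edgesuite.net" (e.g. in a query string).
  const blocked = hostname === 'edgesuite.net' || hostname.endsWith('.edgesuite.net');
  return cb({ cancel: blocked });
}

// Local check with a stand-in for Electron's callback:
// logs the URL, then { cancel: true }
blockEdgesuite({ url: 'http://img.edgesuite.net/foo.png' }, (res) => console.log(res));
```

Checking the parsed hostname instead of calling `indexOf` on the whole URL avoids accidentally cancelling requests such as a search URL that happens to mention edgesuite.net.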

> ...yet all images from that website were blocked, I would expect that should not happen?

That's odd. Maybe this is a quirk of later versions of Electron or Chromium. I would expect a pattern without wildcards (e.g., http://www.google.com) to match only that address, but it looks like that filter is ignored completely. In fact, I'd expect it to behave the way WebRequest match patterns work. I'll dig into this as time permits.

It looks like it works as expected if you are willing to use wildcards. Your example, slightly modified:

const rules = [
    'http://google.com/*',
    // 'edgesuite.net'
]

this.browser
    .filter( { urls: rules }, ( details, cb ) => cb({ cancel: details.url.indexOf('edgesuite.net') != -1 }) )
    .goto( 'http://www.appledailytw.com/realtimenews/article/nextmag/20160531/874328/' )
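Both observations (bare google.com being ignored, http://google.com/* working) are consistent with Chrome's match-pattern grammar, where a pattern needs a scheme, `://`, a host, and a path starting with `/`. Below is a minimal sketch of a validator for that grammar. `parseMatchPattern` is hypothetical, it supports only a subset of schemes, and it assumes Electron's `urls` filter follows the same rules, which this thread suggests but does not confirm.

```javascript
// Hypothetical validator modeled on Chrome's match-pattern grammar:
//   <scheme>://<host><path>
//   scheme: "*" | http | https | file | ftp   (subset)
//   host:   "*" | "*." followed by a hostname | a hostname (no "/" or "*")
//   path:   "/" followed by anything
const PATTERN_RE = /^(\*|https?|file|ftp):\/\/(\*|(?:\*\.)?[^/*]+)?(\/.*)$/;

function parseMatchPattern(pattern) {
  if (pattern === '<all_urls>') return { scheme: '*', host: '*', path: '/*' };
  const m = PATTERN_RE.exec(pattern);
  if (!m) return null; // missing scheme, "://", or a path starting with "/"
  const [, scheme, host = '', path] = m;
  // Only file:// patterns may omit the host.
  if (host === '' && scheme !== 'file') return null;
  return { scheme, host, path };
}

const rules = ['google.com', 'http://www.google.com', 'http://google.com/*', '*://electron.github.io'];
for (const rule of rules) {
  console.log(rule, parseMatchPattern(rule) ? 'valid' : 'invalid');
}
// google.com invalid
// http://www.google.com invalid
// http://google.com/* valid
// *://electron.github.io invalid
```

Note that http://www.google.com fails only because it lacks a path; under this grammar, http://www.google.com/ (with the trailing slash) would be valid and match just the root page.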

rosshinkley · May 31 '16 02:05

Very interesting findings! After playing with it a bit more, I found that the URL must contain :// and a / after the domain: something like *://google.com/* works, while any other form doesn't.

coodoo · May 31 '16 02:05

That doesn't surprise me as much: google.com is ambiguous and should be "fully qualified" (even if it's qualified with wildcards, you have to be explicit about what you expect). I can kind of understand why that wouldn't work.
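If you'd rather keep a list of bare domains, one option is to expand them into fully qualified patterns of the `*://google.com/*` form that worked above before passing them to `.filter()`. `qualifyDomain` and `qualifyRules` are hypothetical helpers; the `*://*.domain/*` entry is meant to cover subdomains, and assumes `*` in the scheme means http or https as in Chrome's match patterns.

```javascript
// Hypothetical helper: turn a bare domain such as "google.com" into
// fully qualified patterns like "*://google.com/*".
function qualifyDomain(entry) {
  // Leave entries that already look like full patterns untouched.
  if (entry.includes('://')) return [entry];
  // Drop a leading "*." and any trailing slashes from the bare domain.
  const domain = entry.replace(/^\*\./, '').replace(/\/+$/, '');
  // First pattern: the domain itself; second: its subdomains.
  return [`*://${domain}/*`, `*://*.${domain}/*`];
}

// Expand a whole rule list (reduce/concat instead of flatMap for older Node).
function qualifyRules(rules) {
  return rules.reduce((acc, rule) => acc.concat(qualifyDomain(rule)), []);
}

const qualified = qualifyRules(['google.com', 'https://*.github.com/*']);
// returns ['*://google.com/*', '*://*.google.com/*', 'https://*.github.com/*']
console.log(qualified);
```

Under Chrome's grammar, `*.google.com` may already match google.com itself, which would make the first entry redundant; listing both is harmless and doesn't depend on that detail.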

rosshinkley · May 31 '16 02:05