
Explanation of why a site is blocked

hrj opened this issue 10 years ago • 21 comments

When I browse to getsatisfaction.com, I see this message: "Blocked by HTTPSB". The entire frame/window seems to have been blocked. I tried searching for an explanation but couldn't find one.

Can you tell me why this is blocked? I assume it is a default -- I don't remember setting it myself.

Down the road, it would be nicer if an explanation were provided in the browser itself.

Thanks!

hrj, Apr 28 '14 14:04

I see the site is blacklisted by http://hosts-file.net/default.asp?s=getsatisfaction.com as "Ad/tracking servers".

These are 3rd-party lists outside HTTPSB's control, used by HTTPSB for convenience, so there is no way for me to explain why one particular site is blocked by one particular list. Only the maintainer of the list can explain why he/she put the entry in there.

http://hosts-file.net is a good tool for users who want to find out about a site, but I can't provide a hook in HTTPSB to this particular site or any other: I do not want HTTPSB to depend on any specific external site during its operation.

gorhill, Apr 28 '14 14:04

Thanks for the info.

I appreciate your decision not to link to a specific site. Though, if HTTPSB is using a third-party list, I think it kind of makes sense to link to explanations from a third party as well.

If that is not possible, and in the meanwhile, could the UI be changed to at least indicate the source of the decision to block: user-requested, blacklisted by a 3rd party, etc.? When I first saw the "blocked by HTTPSB" message, I was not sure if I had blocked it myself (accidentally or otherwise).

hrj, Apr 28 '14 14:04

If that is not possible, and in the meanwhile, could the UI be changed to at least indicate the source of the decision to block

I will have to think hard about this one. Many issues to consider.

For one, a single hostname can be present in multiple lists. Also, blacklisted entries are encoded in memory in a highly memory- and CPU-efficient way, and keeping a back-reference to where each entry comes from would come at a huge price in memory footprint. And by huge I mean huge: it would defeat in one swoop all the long hours I put into coming up with efficient algorithms, and remove one of the main benefits to users of HTTPSB over alternative solutions.
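To make the trade-off concrete, here is a minimal Python sketch. It is not HTTPSB's actual data structure (HTTPSB is a JavaScript extension with a far more compact encoding); the list names and hostnames are hypothetical. A merged set only answers "is it blocked?", while keeping provenance means storing, for every hostname, the names of every list that contains it:

```python
from collections import defaultdict

# Hypothetical preset lists: list name -> hostnames it blocks.
PRESET_LISTS = {
    "list-a": ["ads.example", "tracker.example"],
    "list-b": ["tracker.example", "metrics.example"],
}

def build_merged(lists):
    """Merge all lists into one set: fast membership test, no provenance."""
    merged = set()
    for hostnames in lists.values():
        merged.update(hostnames)
    return merged

def build_provenance(lists):
    """Map each hostname to the set of lists that contain it.

    Every entry now carries an extra container of list names, which is
    the per-entry memory cost of keeping a back-reference.
    """
    origins = defaultdict(set)
    for name, hostnames in lists.items():
        for hostname in hostnames:
            origins[hostname].add(name)
    return dict(origins)

merged = build_merged(PRESET_LISTS)
origins = build_provenance(PRESET_LISTS)

print("tracker.example" in merged)         # True
print(sorted(origins["tracker.example"]))  # ['list-a', 'list-b']
```

Note that "tracker.example" appears in both lists: the merged set stores it once, while the provenance map must remember both sources.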

One possible solution could be to search for an entry in the original list data, but obviously that cannot happen in real time: it would be an expensive operation.

gorhill, Apr 28 '14 14:04

Just penning some ideas (with zero knowledge of the code):

  1. Could this be treated as a special case: window.location itself getting blocked.
  2. If modifying the real-time blocking algorithm is not feasible, then a separate algorithm just for this case could be used.

hrj, Apr 28 '14 15:04

What do you mean by "this case"?

gorhill, Apr 28 '14 15:04

I meant the case of the entire site getting blocked, that is window.location itself getting blocked, not the related requests (images, AJAX, etc.).

Not sure if this makes sense code-wise, but from the user's perspective, a case like getsatisfaction.com getting blocked completely (with no text rendered at all) looks very different from, and more disorienting than, just scripts being disabled.

hrj, Apr 28 '14 16:04

a case like getsatisfaction.com getting blocked completely (with no text rendered at all) looks very different from, and more disorienting than, just scripts being disabled.

Ok, I think I get what you are saying. You suggest that all hosts appearing in ubiquitous blacklists still have their main page load properly, thus essentially adopting ABP's view that the main doc of a blacklisted site is allowed to load regardless (there is a dated blog entry somewhere on the ABP site explaining the rationale, but I can't readily find it).

I disagree.

These external blacklists are primarily meant to be used in a computer's hosts file, whose purpose is to block requests of any kind to the blacklisted hosts. Allowing the main page to load means allowing at least one request to reach the blacklisted host, thus disregarding the original purpose of hosts files. I say "at least one" because the response to that one request may be a redirect to another host, etc.
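A minimal sketch of those hosts-file semantics, assuming the usual hosts layout (a redirect address such as 127.0.0.1 or 0.0.0.0 followed by one or more hostnames, with `#` starting a comment). The function names and sample hostnames are hypothetical; the point is that the decision ignores the request type, so the main document is blocked like everything else:

```python
def parse_hosts(text):
    """Extract blocked hostnames from hosts-file formatted text."""
    hostnames = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        # First field is the redirect address; the rest are hostnames.
        for hostname in fields[1:]:
            if hostname.lower() != "localhost":
                hostnames.add(hostname.lower())
    return hostnames

def is_blocked(hostname, request_type, blacklist):
    """Hosts-file semantics: request_type is deliberately ignored, so a
    blacklisted host gets no request at all, main document included."""
    return hostname.lower() in blacklist

SAMPLE_HOSTS = """\
# Sample entries (hypothetical hostnames)
127.0.0.1 localhost
127.0.0.1 ads.example
0.0.0.0 tracker.example metrics.example  # several names on one line
"""

blacklist = parse_hosts(SAMPLE_HOSTS)
print(sorted(blacklist))  # ['ads.example', 'metrics.example', 'tracker.example']
print(is_blocked("ads.example", "main_frame", blacklist))  # True
```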

Note that I have seen comments here and there from users wishing for the opposite of ABP's default behavior, i.e. that all requests to a blocked hostname should be blocked, including the main page (the current behavior of HTTPSB).

I know I am often stubborn, but really it is because I want the best for users, and in the current case my view is that if the hostname is blacklisted, no request whatsoever should reach the blacklisted server.

Now I suppose I could provide yet another setting to soften that behavior and allow at least the main page while blocking everything else, including whitelisted types (img and css, out of the box). But for the time being I will wait for a lot more feedback before any core changes (~~that would demand a bit of code change of course and added overhead~~ Edit: no, that would be easy).
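The softer setting described above could look roughly like this. It is a sketch under stated assumptions: the setting name `allow_main_frame_of_blacklisted` is made up (no such option existed at the time of this thread), request types use Chrome's webRequest names (`main_frame`, `image`, `stylesheet`, ...), and non-blacklisted hosts simply return `"defer"` because the user's matrix rules are not modeled here:

```python
from dataclasses import dataclass

@dataclass
class Settings:
    # Hypothetical option corresponding to the proposed softer behavior.
    allow_main_frame_of_blacklisted: bool = False

def decide(hostname, request_type, blacklist, settings):
    """Return "block", "allow", or "defer" (defer to the user's matrix rules)."""
    if hostname in blacklist:
        if request_type == "main_frame" and settings.allow_main_frame_of_blacklisted:
            return "allow"
        # Every other type is blocked, even types allowed by default
        # such as "image" and "stylesheet".
        return "block"
    return "defer"

BLACKLIST = {"ads.example"}
strict = Settings()
soft = Settings(allow_main_frame_of_blacklisted=True)

print(decide("ads.example", "main_frame", BLACKLIST, strict))  # block
print(decide("ads.example", "main_frame", BLACKLIST, soft))    # allow
print(decide("ads.example", "image", BLACKLIST, soft))         # block
```

The default (`strict`) keeps the current HTTPSB behavior; only an explicit opt-in lets the main document through.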

gorhill, Apr 28 '14 17:04

You suggest that all hosts appearing in ubiquitous blacklists still have their main page load properly

No! I was not suggesting any change of functionality. Just suggesting that the UI could indicate the reason for blocking the main doc.

hrj, Apr 28 '14 17:04

Then I am back to not understanding what you have in mind with the suggestions "window.location itself getting blocked" and "separate algorithm just for this case". Can you elaborate?

gorhill, Apr 28 '14 17:04

I think the special case he wants is an explanation when a rootFrameReplacement in traffic.js is used: why the main_frame was blocked.

my-password-is-password, Apr 28 '14 18:04

I think the special case he wants is an explanation when a rootFrameReplacement in traffic.js is used.

We need to define very precisely "explanation".

So far, what I can say is: only the author of the list holds the real explanation of why a particular hostname is on his list.

As for HTTPSB, as discussed, I could theoretically report which list(s) contain(s) the offending hostname, but that feature would come at a huge memory cost if done in real time, or a CPU cost if done asynchronously (each list would need to be reloaded and searched for a match).
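The asynchronous option can be sketched as follows: nothing extra is kept in memory, but every lookup re-reads every raw list. This assumes the raw list texts are available in hosts format; the list names, contents, and function names are hypothetical, and `asyncio.sleep(0)` merely yields between lists so a single-threaded event loop stays responsive:

```python
import asyncio

def list_contains(raw_text, hostname):
    """Linear scan of one raw hosts-format list; cost grows with list size."""
    for line in raw_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if hostname in fields[1:]:
            return True
    return False

async def find_sources(hostname, raw_lists):
    """Return the names of the lists that contain hostname."""
    sources = []
    for name, raw_text in raw_lists.items():
        if list_contains(raw_text, hostname):
            sources.append(name)
        await asyncio.sleep(0)  # yield to other tasks between lists
    return sources

RAW_LISTS = {
    "list-a": "127.0.0.1 ads.example\n127.0.0.1 tracker.example\n",
    "list-b": "# comment\n0.0.0.0 tracker.example\n",
}

print(asyncio.run(find_sources("tracker.example", RAW_LISTS)))  # ['list-a', 'list-b']
```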

Then the remaining question is: is this really worth the price in memory and/or CPU, plus the added code complexity, compared with the current approach of redirecting a blocked main doc to a custom (but safe) main doc? (Keeping in mind that I don't think this occurs very often.)

It's easier for me to deal with concrete, specific solutions (better still if they take the consequences into account); I feel I am spending too much time second-(mis-)guessing what exactly is wanted.

gorhill, Apr 28 '14 18:04

Thanks for your patience @gorhill

I am commenting purely from a user-perspective. I am not familiar with code or design of HTTPSB.

What @my-password-is-password said sounds about right. When the main frame itself is blocked and replaced, could an explanation be provided?

About the content of the explanation, anything you can provide to give some context to the user is fine, IMO.

About performance, again I am not sure how HTTPSB works, but can't it lazily show an explanation or a "show more information" button when a frame is replaced? If that is possible then I suppose there is no need to modify the real-time filter algorithm.

If this sounds vague, never mind. Like you said, so far I haven't encountered this frequently enough to be a major concern.

hrj, Apr 28 '14 19:04

Given that most preset block lists are mere lists of hostnames, I can't programmatically explain why something is blocked.

gorhill, May 11 '14 04:05

@gorhill I don't want to drag this too much and thanks for your time.

Just a final attempt to see if I have been understood: I don't want to know "why" something is blocked at a fine-grained level. I am not looking for something like this: "xyz.com is blocked because of ad-tracking"

I just would like to see something like this: "xyz.com is blocked from a preset block list"

Otherwise, I wouldn't know if I had blocked it by my own past actions.

hrj, May 11 '14 05:05

I just would like to see something like this: "xyz.com is blocked from a preset block list"

Ok. In the past I did prototype some visuals for hosts blocked by preset lists (darker red), but I scrapped the prototype. I don't remember whether I found issues with the idea; I am not sure exactly why I scrapped it.

I think it would help to have some help testing that kind of prototype: the project has grown beyond what I can handle test-wise, as witnessed by how I broke the session cookies and the ABP filtering this week.

gorhill, May 11 '14 05:05

Could you still block/replace the main_frame and show a matrix in the popup? I would be 3 clicks away from an explanation if I really wanted to know. :)

[Screenshot: mainframeblocked2]

my-password-is-password, May 11 '14 09:05

@my-password-is-password Joking aside, for the main frame I could put a link to hpHosts for the offending hostname in the replacement frame. There is no guarantee hpHosts will have something about the entry, but I believe it may be the most comprehensive DB out there.
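A minimal sketch of such a replacement page. The lookup URL pattern is taken from the hpHosts link quoted earlier in this thread; the markup and function names are hypothetical and not HTTPSB's actual replacement document. Both the hostname and the link are HTML-escaped, since the hostname comes from the blocked request:

```python
from html import escape
from urllib.parse import urlencode

# Lookup URL pattern from the hpHosts link quoted earlier in the thread.
HPHOSTS_LOOKUP = "http://hosts-file.net/default.asp?"

def hphosts_link(hostname):
    """Build the hpHosts lookup URL for one hostname."""
    return HPHOSTS_LOOKUP + urlencode({"s": hostname})

def replacement_page(hostname):
    """Minimal stand-in document for a blocked main frame (hypothetical markup)."""
    link = escape(hphosts_link(hostname), quote=True)
    host = escape(hostname)
    return (
        "<!DOCTYPE html><html><body>"
        f"<p>{host} was blocked by HTTPSB.</p>"
        f'<p><a href="{link}" target="_blank">Look up {host} on hpHosts</a></p>'
        "</body></html>"
    )

print(hphosts_link("getsatisfaction.com"))
# http://hosts-file.net/default.asp?s=getsatisfaction.com
```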

gorhill, May 11 '14 14:05

How about this:

[Screenshot: to-forum]

This doesn't address the "xyz.com is blocked from a preset block list" issue. That one I need to investigate more, as it would require a core function of HTTPSB to return a new value, which means I have to track down all the callers of that function to be sure the new return value does not break them.
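One common way to add such a return value without touching existing callers is to introduce a new function that returns a reason, and keep the old boolean function as a thin wrapper around it. The sketch below is hypothetical (names, data, and the rule that a user rule takes precedence over a preset list are all assumptions), not HTTPSB's actual core function:

```python
from enum import Enum

class BlockReason(Enum):
    NONE = "not blocked"
    USER_RULE = "blocked by your own rule"
    PRESET_LIST = "blocked by a preset block list"

# Hypothetical data: hostnames the user blocked vs. hostnames from preset lists.
USER_BLOCKED = {"annoying.example"}
PRESET_BLOCKED = {"getsatisfaction.com", "tracker.example"}

def block_reason(hostname):
    """New function: same decision as before, plus where it came from.
    Assumes a user rule takes precedence over a preset list."""
    if hostname in USER_BLOCKED:
        return BlockReason.USER_RULE
    if hostname in PRESET_BLOCKED:
        return BlockReason.PRESET_LIST
    return BlockReason.NONE

def is_blocked(hostname):
    """Old boolean interface kept unchanged, so existing callers are unaffected."""
    return block_reason(hostname) is not BlockReason.NONE

print(block_reason("getsatisfaction.com").value)  # blocked by a preset block list
```

Only the code that renders the blocked page would need to call `block_reason`; everything else keeps calling `is_blocked`.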

gorhill, May 11 '14 14:05

@gorhill, I didn't know that it already showed a matrix for a blocked main_frame in stable Chrome. I really need to stop using the dev build of Chromium. Sorry.

my-password-is-password, May 11 '14 18:05

I didn't know that it already showed a matrix for a blocked main_frame in stable Chrome. I really need to stop using the dev build

Ok, I thought you were joking. Which version of the dev build are you using? I ask because there may be changes in the pipeline that break the matrix when a data URI is used, and that may be what you are witnessing.
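For context, a data URI (RFC 2397) embeds a whole document in the URL itself, so a replacement main frame needs no network request at all. A minimal sketch of encoding a page that way, using percent-encoding rather than base64; how HTTPSB actually builds its URI is not shown in this thread, so the function name and media type parameters are assumptions:

```python
from urllib.parse import quote

def to_data_uri(html):
    """Encode an HTML document as a percent-encoded data: URI (RFC 2397)."""
    return "data:text/html;charset=utf-8," + quote(html, safe="")

print(to_data_uri("<p>hi</p>"))
# data:text/html;charset=utf-8,%3Cp%3Ehi%3C%2Fp%3E
```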

gorhill, May 11 '14 18:05

Chromium 37.0.1987.0 (269701)

my-password-is-password, May 11 '14 18:05