safebrowsing
sbserver differs from online/browser lookup?
I have noticed that sbserver returns an empty response for some URLs, while the Chrome browser and the online lookup tool ( https://www.google.com/transparencyreport/safebrowsing/diagnostic/ ) do return a correct danger response. I have checked, and the server is updating its list. Does anyone know what is happening?
A sample URL for which this happens:
http://www.precision-mouldings.com/.ls/.https:/.www.paypal.co.uk/uk.web.apps.mpp.home.sign.in.country.a.GB.locale.a.en.GB-6546refhs8ehgf8-890b7fefut9546954543ds867hgf9-1egey3ds4820435t546ggc-u4ydstgu5438gjksssGB/plmgeo.php
Thanks for the bug report. We'll look into it shortly.
Anything new on this?
Bumping this... Anything new?
I am having this exact same issue with the following URL:
https://www.google.com/transparencyreport/safebrowsing/diagnostic/#url=https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc
Just wanted to confirm that sblookup also reports this URL as safe:
| => echo "https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D" | sblookup -apikey '<redacted>'
safebrowsing: 2017/05/02 16:18:26 database.go:106: no database file specified
safebrowsing: 2017/05/02 16:18:30 database.go:336: database is now healthy
safebrowsing: 2017/05/02 16:18:30 safebrowser.go:504: Next update in 30m11s
Safe URL: https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D
Plus, this is the output of the test as indicated in the README file:
| => go test github.com/google/safebrowsing -v -run TestSafeBrowser -apikey '<redacted>'
=== RUN TestSafeBrowser
--- PASS: TestSafeBrowser (0.78s)
PASS
ok github.com/google/safebrowsing 0.933s
Finally, I can confirm that there is no problem with my API key, since I can successfully query this URL using https://github.com/afilipovich/gglsbl on the same machine:
| => python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gglsbl import SafeBrowsingList
>>> sbl = SafeBrowsingList('<redacted>')
>>> sbl.update_hash_prefix_cache()
>>> sbl.lookup_url('https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc')
[SOCIAL_ENGINEERING/OSX/URL]
With further testing I noticed that when I specified a database file for sbserver and sblookup, the created file is only 6 megabytes. In comparison, the gglsbl Python module creates a local sqlite database that is over 1.4 gigs in size.
So maybe what's happening here is that the go client is silently failing to download and/or save the hash database locally.
Just wanted to share that the lack of feedback on this issue has led me to file this repository under "abandonware".
I am using https://github.com/afilipovich/gglsbl instead. It works great, is fast and the author is very very responsive to reported issues. Would recommend that @serpiente, @gliwka and @Heavenwalker take a look at this alternative too if they haven't found another already.
@asieira Thanks for the hint! Unfortunately I need the REST API, although it should be possible to combine gglsbl with Flask to get there.
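A rough sketch of what such a wrapper could look like. It uses a plain stdlib WSGI app as a stand-in for Flask, and the `make_app` name, the `/lookup` route, and the JSON shape are all hypothetical choices, not anything gglsbl or sbserver defines. The lookup function is passed in, so any callable that takes a URL and returns a list of matches (or `None`) would work:

```python
import json
from urllib.parse import parse_qs


def make_app(lookup):
    """Build a WSGI app exposing GET /lookup?url=... around a lookup callable.

    `lookup` is any function taking a URL string and returning a list of
    matches, or None/empty when nothing matched (hypothetical contract).
    """
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        urls = query.get("url")
        if environ.get("PATH_INFO") != "/lookup" or not urls:
            start_response("400 Bad Request",
                           [("Content-Type", "application/json")])
            return [b'{"error": "use /lookup?url=..."}']
        # Treat a None result as "no matches" so the response is never silent.
        matches = lookup(urls[0]) or []
        body = json.dumps({"url": urls[0],
                           "matches": [str(m) for m in matches]})
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body.encode("utf-8")]
    return app
```

Assuming gglsbl behaves as in the session posted earlier in this thread, wiring it up would be roughly `sbl = SafeBrowsingList(key)`, `sbl.update_hash_prefix_cache()`, then `make_app(sbl.lookup_url)`, served by any WSGI server such as gunicorn.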
@dsnet @colonelxc Any progress on this? sbserver isn't working correctly at this point, and the worst part is that it's failing silently! This could leave applications that depend on it, and their users, vulnerable!
/cc: @alexwoz
I have actually built a Flask + gunicorn dockerized REST server on top of gglsbl and was planning on open sourcing it. Would that help?
@asieira Sure, that would be amazing :-)
I do not work in this team anymore, but I can assure you that this project is not abandonware.
Hi everyone,
Thank you for all of your contributions to this repo and your patience while we investigated -- based on your reports/comments we've been able to clarify the issue.
As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome. Upon investigating the bugs filed in this repo, we realized that there was a different problem afoot - a bug on the server-side - that will be patched in the coming weeks.
Thanks, Alex
@asieira any updates?
Finally published the repo I had talked about before, you can find it at https://github.com/mlsecproject/gglsbl-rest if you want to try it out. Any comments and suggestions are most welcome.
@alexwoz @colonelxc Any update on this issue? It's been a year since this issue was created.
@gliwka This issue should be resolved. Please update this bug if you continue to experience any inconsistencies.
I'm running into the same issues described by other users who commented earlier in this issue thread. Notably, if I use https://transparencyreport.google.com/safe-browsing/search to search for a known malware URL such as 999fitness.com I'm correctly told "Some pages on this site are unsafe".
Yet when I use Postman/cURL/sblookup to classify 999fitness.com I receive an "empty" 200 response, indicating there is nothing wrong with the URL.
When I use the Google API Explorer (https://developers.google.com/apis-explorer/?hl=en_US#p/safebrowsing/v4/safebrowsing.threatMatches.find) to classify the same URL, it just "spins" endlessly. As of right now the explorer has been running for 23 minutes without actually returning a response.
Reviewing the Google Cloud Platform API monitor, I'm told everything is just fine, and every one of my queries returned a 200.
I was going to post a question on the Google Safe Browsing API forum (https://groups.google.com/forum/#!forum/google-safe-browsing-api) but ironically it is full of spam.
Not complaining; just trying to figure out what exactly is going on with this service.
Jason
@wjgilmore
- I see the same problem as you with the API Explorer. I have created an internal bug with the applicable team.
- Regarding the transparency report, as compared to the safebrowsing lookup, there are some slight differences in utility and function. It is best explained with an example.
| URL | API lookup | Transparency Report |
|---|---|---|
| foo.com | Safe | Some pages unsafe |
| foo.com/bad/ | Malware | This page unsafe/Malware |
| foo.com/bad/baz/ | Malware | This page unsafe/Malware |
| foo.com/good/ | Safe | Safe |
Essentially the API is focused on answering the question, "Do we think it is safe to go to this site right now?". For foo.com, it is. The malware was on a different (more specific) path (or subdomain). This often happens when a site has been hacked. The attacker will add their own content and redirect users from other sites to the specific path/subdomain. This sometimes has no impact on the rightful content of the site, and so we try to minimize the scope of what is blocked to only the paths that will actually try to infect you.
The transparency report does API-style checks, but it also checks if there are more specific paths/subdomains that are known to be bad. So for the second and third URLs, it is responding the same as the API does. For the first URL, it knows that there are more specific paths that are known to be bad. So it says some pages are unsafe, even though foo.com is fine to visit on its own.
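The table above can be mimicked with a small sketch. This is only an illustration of the two behaviours as described here, not how either service is implemented: the real lists hold hashes of URL expressions, subdomains are ignored, and plain string prefixes stand in for path matching. The names `BAD_PREFIXES`, `normalize`, `api_lookup`, and `transparency_report` are made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist mirroring the foo.com example above.
BAD_PREFIXES = {"foo.com/bad/"}


def normalize(url):
    """Reduce a URL to 'host/path' form, defaulting the path to '/'."""
    parts = urlsplit(url if "//" in url else "//" + url)
    return parts.netloc.lower() + (parts.path or "/")


def api_lookup(url):
    """API-style check: unsafe only if the URL is at or under a listed prefix."""
    target = normalize(url)
    hit = any(target.startswith(prefix) for prefix in BAD_PREFIXES)
    return "Malware" if hit else "Safe"


def transparency_report(url):
    """Report-style check: also flag URLs that have listed descendants."""
    target = normalize(url)
    if any(target.startswith(prefix) for prefix in BAD_PREFIXES):
        return "This page unsafe/Malware"
    if any(prefix.startswith(target) for prefix in BAD_PREFIXES):
        return "Some pages unsafe"
    return "Safe"


for candidate in ["foo.com", "foo.com/bad/", "foo.com/bad/baz/", "foo.com/good/"]:
    print(candidate, "|", api_lookup(candidate), "|", transparency_report(candidate))
```

Running it prints one row per URL in the same order as the table, with `foo.com` coming out as "Safe" in the API-style column and "Some pages unsafe" in the report-style column.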
Does that help?
Hi @wjgilmore,
Thanks for your message, and apologies for the confusion. I can see why the Transparency Report wording and Safe Browsing API responses appear to contradict one another. The Transparency Report communicates the extent to which the provided site is bad; in this case, the site is only "partially" bad ("Some pages on this site..."). The Safe Browsing API, however, will only return a verdict when the provided URL is definitively bad; i.e. we have determined that all URLs (including the root domain) are unsafe for a user to access.
Hopefully that makes sense!
Alex
Hi @colonelxc and @alexwoz Thank you both for these detailed explanations. To summarize:
- The Transparency Report is useful for determining whether a URL (and its associated siblings/children/parents/grandparents) is "safe".
- The Safe Browsing API is useful for determining whether a specific URL is safe.
Is my understanding correct? Our project attempts to determine whether any URLs found in an incoming text message contain potentially dangerous links (phishing, malware, etc). We were under the impression the Safe Browsing API would offer an ideal solution. However it is certainly possible the URL found in a text message would be "safe" yet ultimately lead the unsuspecting user to a subsequently dangerous endpoint. So it sounds like we're going to have to look for an alternative solution.
Thanks again, I really appreciate your time.
Jason
Hey @wjgilmore,
As @colonelxc mentioned, the Safe Browsing API answers the question of whether the provided URL is safe for a user to access at this time. Your use case sounds very well-suited for this check. The Safe Browsing lists are intended to contain URL expressions from various points of the navigation, including those that users receive links to (e.g. through an SMS). If the initial URL redirects a user to an unsafe endpoint, then there's a good chance that the initial URL and those of subsequent navigations are all on a Safe Browsing list.
Hopefully that addresses some of your concerns.
Alex
@alexwoz @colonelxc I'm finding differences between the Safe Browsing API (what's returned from running the sbserver) and what's on https://transparencyreport.google.com as well.
The transparency report is saying that the url is unsafe but sbserver is returning an empty response.

Found another:

Is it possible that results from the API are more up to date than https://transparencyreport.google.com or are they using the same api?
Thanks @summera
Yeah, I have seen such discrepancies in the past, but I cannot tell which source is more up to date as I am not affiliated with Google. Transparency report states "This info was last updated on Apr 1, 2018."
@afilipovich Thanks for the response! Very weird. So have you or anyone else been able to determine how accurate this is in a real world production environment? It seems to me, based on what's been reported in this issue and the google group and with my own simple tests, that there are a lot of false negatives being returned from the API. Since phishing and malware urls are constantly changing it's challenging to determine whether this is really going to catch much and how accurate it will be.
Due to data sharing restrictions, the set of URLs accessible via the Safe Browsing API, Transparency Report, and web browser integrations may differ. It is our goal to ensure these discrepancies are as rare as possible, but it's not guaranteed.
I think any detection technology will have false negatives, no solution can claim to catch everything. So that is something we should already expect.
In particular, it seems to me the Google Safe Browsing API must be removing malicious entries from their database, either through an aging process or by detecting when they are no longer active. In any case, I will take a solution that does that to minimize false positives over a very noisy one every time.
You can try to compare results from gglsbl with Google Safe Browsing Lookup API.
https://developers.google.com/safe-browsing/v4/lookup-api
It does not use local cache so it has performance limitations, but it excludes possible issues with gglsbl client code.
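For anyone who wants to run that comparison, here is a minimal sketch of a Lookup API call using only the standard library. The endpoint and field names follow the v4 `threatMatches.find` documentation as I understand it; the `example-client` ID, the version string, and the helper names `build_lookup_body`, `parse_matches`, and `lookup` are placeholders of my own, and the threat type list is just one reasonable selection.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_body(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body for threatMatches.find (client values are placeholders)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def parse_matches(response_text):
    """Return (url, threatType) pairs; an empty '{}' body means no match."""
    data = json.loads(response_text or "{}")
    return [(m["threat"]["url"], m["threatType"]) for m in data.get("matches", [])]


def lookup(urls, api_key):
    """POST the URLs to the Lookup API and return the parsed matches."""
    request = urllib.request.Request(
        LOOKUP_ENDPOINT + "?key=" + urllib.parse.quote(api_key),
        data=json.dumps(build_lookup_body(urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_matches(response.read().decode("utf-8"))
```

Calling `lookup([url], api_key)` and getting an empty list back corresponds to the "empty" 200 response discussed earlier in this thread, so the result can be compared directly against what gglsbl or sbserver reports for the same URL.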
Which database is specified on line 110 of database.go?