oniguruma
The results of a Google search for "oniguruma" are crazy! (in Japan)
If you do a Google search for the keyword "oniguruma" you'll see some very strange results. The first few links that appear on the first page are related to the keyword oniguruma, but the rest of the pages are mostly made up of completely unrelated links. I noticed this last August. However, this may be the case only in Japan. I don't know what is going on in other parts of the world.
The rest of this article is written below. https://kkos.fc2.net/blog-entry-1.html
The attack on Google search is still going on. I still don't know what it looks like outside of Japan. When I registered this Issue, the behavior changed a little, so I think the criminal is looking at this page.
From San Francisco:
Thank you. For the first time, I was able to learn how things look outside of Japan. At least there doesn't seem to be anything weird in the first page. In my environment, a dozen or so pages of mostly irrelevant stuff are displayed.
Portugal. Page count is under 100k instead of 1.14M
The number of results in Japan is close to that. It seems there is little impact outside of Japan.
This is how it looks today in Colombia.
Most of the unrelated pages I see here are in Japanese, so it seems to be fine for non-Japanese areas. If you don't see any unfamiliar or unusual characters (Japanese characters: kanji, hiragana, etc.) within the first few pages, you should be fine. This may be due to the fact that the culprits are in Japan, where they are mechanically manipulating clicks to increase their rankings.
In Canada the results on Google and DuckDuckGo are similar to those posted above. Looks fine to me.
The search results also look fine for me in Los Angeles, CA 👍
Works fine in Sydney, Australia
From Bulgaria:
Works fine for me in Tokyo, Japan.
I don't think so. Even here, the first six or seven results on the first page are relevant. But after that, more than ten pages are filled mostly with irrelevant links. In other words, most of what comes up in a search is irrelevant. In your image, "すること。7 記載容量6は、営業外収益の「その他」" and "70 花粉発生源対策推進事業" probably have nothing to do with Oniguruma.
Hi there~ Here's the result for me; it seems fine? I'm from China and use a global network _(:3
Have you tried changing Google's search settings? There are settings for the region and the languages of the search results ...
By the way, I used Singapore as the region setting (for a reason I'd rather not get into), and I set the languages for the search results to 简体中文, 繁體中文, English and 日本語.
Thank you. I am convinced that the Japanese search results are abnormal and that the non-Japanese search results are normal. I just checked the contents of the two links in the previous example by @mmizutani.
- https://www.pref.kochi.lg.jp/soshiki/170201/files/2020090400267/2013031500317_www_pref_kochi_lg_jp_uploaded_attachment_89201.pdf
- https://www.maff.go.jp/j/budget/attach/pdf/171222_2-65.pdf
Neither of them contains the strings "Oniguruma" or "鬼車", and neither of them has anything to do with Oniguruma. Moreover, this is the result of the first page, and the next pages are full of irrelevant links. Although @mmizutani hasn't produced a second page, I'm convinced of that from my own results. I have no idea about the impact of where you search.
Confirmed. After I changed the region to Japan, the search results showed these completely unrelated items ... I'm trying to find the reason.
It seems that changing the search option to 完全一致 (exact match) under Tools helps. Also, did you notice that most of those results are PDF documents?
I have a guess about it... I wonder whether Google converted all of that content into its romanization and then split it up for matching, which led to this...
You're right, most of the irrelevant links are PDFs. But not all of them, maybe 60%. When I set it to exact match, the irrelevant links disappeared. That doesn't mean that the cause isn't an attack.
It's fine from Vietnam.
Zip file of screenshots ( canada, france, indonesia, Taiwan ) 9.47MB https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/234ver3.zip
I looked at your search results. I used to think that the results only depended on the language, but now I know that it depends on the language and the location. In other words, the results are terrible when you search in Japan specifying Japanese, and not so terrible otherwise. However, your search also showed that the effects of this attack are not entirely absent outside of Japan. Some examples are shown below. These have nothing to do with Oniguruma. And these are links that I have seen many times.
france_ja_p6, indonesia_ja_p7, Taiwan_ja_p6
円行東自治会 - FC2
https://engyouhigashi.web.fc2.com/inout-hiritu.html
canada_ja_p6, france_ja_p4, indonesia_ja_p4, Taiwan_ja_p3
持 続 可 能 な 医 療 保 険 制 度 を 構 築 す る た め の 国 民 健
https://www.sangiin.go.jp/japanese/gianjoho/ketsugi/189/f069_052601.pdf
france_ja_p6, indonesia_ja_p5, Taiwan_ja_p4
食品流通合理化促進事業
https://www.maff.go.jp/j/shokusan/sijyo/info/attach/attach/pdf/sijyou_yosan2-9.pdf
canada_ja_p7, france_ja_p7, indonesia_ja_p6
お 困 り の 方 へ 騒 音 や 悪 臭 な ど で
https://www.city.tochigi-sakura.lg.jp/manage/contents/upload/61bb4ab4dbec7.pdf
In canada_ja, the links are mostly irrelevant from p7 onward; in indonesia_ja, from p6 onward; in Taiwan_ja, from p4 onward.
@tonco-miyazawa,
I would like to know what happens to the "other keywords" that appear below the results when I search for oniguruma and specify a time period of 24 hours or less.
Here are the results I just ran (in Japan, in Japanese)
These bullshit words have been showing up at a high rate for nearly two years.
24 hours "other keywords" SS https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/24hOnlyJP_20220409at23h58m.png https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/24hOnlyJP_20220410at11h17m.png
SS zip part2 (2.39MB) Main: Thailand_CN https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip
@tonco-miyazawa I did not notice the April 15 addendum until today.
I wrote a rebuttal on my blog (in both English and Japanese). https://kkos.fc2.net/blog-entry-2.html
After a re-investigation, I found that my idea was wrong.
I deleted the previous remarks.
I'm sorry about that remark, and for the trouble it caused.
Google tries to show you the most relevant information in your language, based on your IP address / location or your preferences (if you are signed in).
You can ask google to show results found in other language or multiple languages:
https://www.google.com/search?q=Oniguruma&lr=lang_ja|lang_en
Search English and Japanese pages (private mode)
japanese + english:
lr=lang_ja|lang_en
english:
lr=lang_en
lang_XX (language collection values for lr)
https://developers.google.com/custom-search/docs/xml_results_appendices#languageCollections
lr (language restriction)
https://developers.google.com/custom-search/docs/xml_results#lrsp
hl (interface language)
https://developers.google.com/custom-search/docs/xml_results#hlsp
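For anyone who wants to compare results across language settings, the URL form above can be generated with a few lines of Python. This is only a sketch: the function name `build_search_url` and its arguments are made up for illustration, and it assumes the `q`, `lr` and `hl` query parameters behave on the normal search page the way the links above describe.

```python
from urllib.parse import urlencode


def build_search_url(query, languages=(), interface_language=None):
    """Build a Google search URL restricted to the given result languages.

    Hypothetical helper: only the q / lr / hl parameter names come from the
    links above; everything else is an assumption for illustration.
    """
    params = {"q": query}
    if languages:
        # lr takes lang_XX values joined with "|", e.g. lang_ja|lang_en
        params["lr"] = "|".join(f"lang_{code}" for code in languages)
    if interface_language:
        # hl sets the interface language
        params["hl"] = interface_language
    # urlencode percent-encodes "|" as %7C, which is equivalent in a URL
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("Oniguruma", ["ja", "en"]))
    print(build_search_url("Oniguruma", ["en"], interface_language="en"))
```

The first call prints the same search as the `lr=lang_ja|lang_en` link above, with the `|` percent-encoded.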
IMHO it is not that you are being "attacked"; it is simply that the keyword is less popular/cited than its Japanese counterpart.
It is often desirable to search for English results only, especially in programming...
You can add a new search engine and make it the default.
@Befzz If you read and understand the following two, I don't think you would make such a claim. https://kkos.fc2.net/blog-entry-1.html https://kkos.fc2.net/blog-entry-2.html
I didn't want to write the same thing twice, so I wrote another article. https://kkos.fc2.net/blog-entry-3.html
Have you taken into account the effect of the personalized search algorithms used by Google?
Did you not read my first entry? https://kkos.fc2.net/blog-entry-1.html
I don't think this is because Google is displaying customized results for users. The reason is that searching in Chrome's incognito mode did not make any difference.
I've heard that in incognito mode (or secret mode?), the search results will not be personalized.
And by @tonco-miyazawa https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip The following two files in this archive show the same strange related keywords I saw.
Japan_aichi_JP_24h.png Japan_hokkaido_JP_24h.png
(* But since it is in Japanese, I don't think you would know what it is when you see it.)
I randomly found this issue when looking up regex engines from Wikipedia. Guess advertising on front page works :)
I was able to reproduce in Japan. More importantly my friend at Google could too. It looks like a search bug so hope it gets fixed.
My hypothesis is that the romaji gets converted to the kanji 鬼車, and then maybe split into two tokens, 鬼 and 車. The latter especially will retrieve a lot of unrelated pages; a page just needs something like 車でお越しの方 ("for those arriving by car") somewhere.
The issue doesn't reproduce outside Japan because the conversion from romaji to kanji is probably disabled elsewhere.
I suspect the attacker is a software bug and hope it gets squashed! I think we all know how hard CJK can be to get right ;)
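To make the hypothesis above concrete, here is a toy sketch in Python. It is not how Google works; the conversion table, the sample documents and the `search` function are all invented for illustration. It only shows why splitting 鬼車 into single-character tokens would pull in pages that merely mention 車.

```python
# Hypothetical romaji-to-kanji table; the real conversion (if any) is unknown.
ROMAJI_TO_KANJI = {"oniguruma": "鬼車"}

# Made-up sample documents standing in for indexed pages.
DOCUMENTS = {
    "regex-library": "鬼車 (Oniguruma) は正規表現ライブラリです",
    "access-guide": "車でお越しの方は駐車場をご利用ください",
    "budget-report": "花粉発生源対策推進事業",
}


def search(query, split_tokens):
    """Return ids of documents that contain any token derived from the query."""
    kanji = ROMAJI_TO_KANJI.get(query.lower(), query)
    # With split_tokens=True, each kanji becomes its own token (鬼 and 車).
    tokens = list(kanji) if split_tokens else [kanji]
    return [
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(token in text for token in tokens)
    ]


if __name__ == "__main__":
    print(search("oniguruma", split_tokens=False))  # ['regex-library']
    print(search("oniguruma", split_tokens=True))   # ['regex-library', 'access-guide']
```

With the whole word as one token only the relevant page matches, which would also fit the earlier observation that exact match removes the irrelevant links.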