The results of a Google search for "oniguruma" are crazy! (in Japan)

Open kkos opened this issue 3 years ago • 40 comments

If you do a Google search for the keyword "oniguruma" you'll see some very strange results. The first few links that appear on the first page are related to the keyword oniguruma, but the rest of the pages are mostly made up of completely unrelated links. I noticed this last August. However, this may be the case only in Japan. I don't know what is going on in other parts of the world.

The rest of this article is written below. https://kkos.fc2.net/blog-entry-1.html

kkos avatar Apr 30 '21 14:04 kkos

The attack on Google search is still going on. I still don't know what it looks like outside of Japan. When I registered this Issue, the behavior changed a little, so I think the criminal is looking at this page.

kkos avatar May 27 '21 14:05 kkos

From San Francisco: Screen Shot 2021-06-03 at 19 23 22

dbqpdb avatar Jun 04 '21 02:06 dbqpdb

Thank you. For the first time, I was able to learn how things look outside of Japan. At least there doesn't seem to be anything weird on the first page. In my environment, a dozen or so pages of mostly irrelevant stuff are displayed.

kkos avatar Jun 04 '21 04:06 kkos

Portugal. Page count is under 100k instead of 1.14M. (screenshot: oniguruma)

ruigazio avatar Jun 04 '21 14:06 ruigazio

The number of search results in Japan is close to that. It seems there is little impact outside of Japan.

kkos avatar Jun 05 '21 02:06 kkos

This is how it looks today in Colombia.

oniguruma - Google Search_Página_1

andreseduardop avatar Jun 07 '21 20:06 andreseduardop

Most of the unrelated pages I see here are in Japanese, so it seems to be fine for non-Japanese areas. If you don't see any unfamiliar or unusual characters (Japanese characters: kanji, hiragana, etc.) within the first few pages, you should be fine. This may be due to the fact that the culprits are in Japan, where they are mechanically manipulating clicks to increase their rankings.

kkos avatar Jun 08 '21 12:06 kkos

In Canada the results on Google and DuckDuckGo are similar to those posted above. Looks fine to me. image

SergioInToronto avatar Jun 08 '21 15:06 SergioInToronto

The search results also look fine for me in Los Angeles, CA 👍

Gerst20051 avatar Jun 09 '21 01:06 Gerst20051

Works fine in Sydney, Australia

SamuelMarks avatar Jul 30 '21 09:07 SamuelMarks

From Bulgaria: image

kanevbg avatar Aug 09 '21 07:08 kanevbg

Works fine for me in Tokyo, Japan. Screen Shot 2021-09-12 at 21 58 41

mmizutani avatar Sep 12 '21 12:09 mmizutani

I don't think so. Even here, the first six or seven results on the first page are relevant. But after that, the results are mostly irrelevant pages for more than ten pages. In other words, most of what comes up in a search is irrelevant links. In your image, "すること。7 記載容量6は、営業外収益の「その他」" and "70 花粉発生源対策推進事業" probably have nothing to do with Oniguruma.

kkos avatar Sep 12 '21 13:09 kkos

image

Hi there~ Here's the result for me; it seems fine? I'm from China and use a global network _(:3

I'm not sure if you've tried changing Google's search settings? There are some settings for the region and languages of the search results ...

By the way, I used Singapore as the region setting (for a reason I'd rather not get into), and I set the search-result languages to Simplified Chinese, Traditional Chinese, English and Japanese.

HeveraletLaidCenx avatar Sep 23 '21 09:09 HeveraletLaidCenx

Thank you. I am convinced that the Japanese search results are abnormal and that the non-Japanese search results are normal. I just checked the contents of the two links in the previous example by @mmizutani.

  • https://www.pref.kochi.lg.jp/soshiki/170201/files/2020090400267/2013031500317_www_pref_kochi_lg_jp_uploaded_attachment_89201.pdf
  • https://www.maff.go.jp/j/budget/attach/pdf/171222_2-65.pdf

Neither of them contains the strings "Oniguruma" or "鬼車", and neither of them has anything to do with Oniguruma. Moreover, this is the result of the first page, and the next pages are full of irrelevant links. Although @mmizutani hasn't produced a second page, I'm convinced of that from my own results. I have no idea about the impact of where you search.
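A check like this is easy to script. The following is a minimal sketch, assuming the text of each linked page or PDF has already been extracted into a string (the extraction step is not shown). The function name and the sample strings are illustrative only and are not the actual contents of the linked files.

```python
def mentions_oniguruma(text, keywords=("oniguruma", "鬼車")):
    """Return True if any keyword occurs in text, ignoring case."""
    folded = text.casefold()
    return any(keyword.casefold() in folded for keyword in keywords)


# Hypothetical sample strings, not the real contents of the linked PDFs.
related = "Oniguruma (鬼車) is a regular expression library."
unrelated = "花粉発生源対策推進事業"

print(mentions_oniguruma(related))    # True
print(mentions_oniguruma(unrelated))  # False
```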

kkos avatar Sep 23 '21 11:09 kkos

Confirmed. After I changed the region to Japan, the search results showed these completely unrelated items ... I'm trying to find the reason.

HeveraletLaidCenx avatar Sep 23 '21 11:09 HeveraletLaidCenx

image

It seems that changing the search option to 完全一致 (exact match) under Tools helps. Also note that most of those results are PDF documents.

HeveraletLaidCenx avatar Sep 23 '21 11:09 HeveraletLaidCenx

I have a guess about it... Is it possible that Google converted all that content to romanized form and then split it for matching, which led to this?

HeveraletLaidCenx avatar Sep 23 '21 11:09 HeveraletLaidCenx

You're right, most of the irrelevant links are PDFs. But not all of them, maybe 60%. When I set it to exact match, the irrelevant links disappeared. That doesn't mean that the cause isn't an attack.

kkos avatar Sep 23 '21 12:09 kkos

It looks fine from Vietnam.

image

kocoten1992 avatar Mar 19 '22 15:03 kocoten1992

Zip file of screenshots ( canada, france, indonesia, Taiwan ) 9.47MB https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/234ver3.zip

tonco-miyazawa avatar Apr 06 '22 04:04 tonco-miyazawa

I looked at your search results. I used to think that the results only depended on the language, but now I know that it depends on the language and the location. In other words, the results are terrible when you search in Japan specifying Japanese, and not so terrible otherwise. However, your search also showed that the effects of this attack are not entirely absent outside of Japan. Some examples are shown below. These have nothing to do with Oniguruma. And these are links that I have seen many times.

france_ja_p6, indonesia_ja_p7, Taiwan_ja_p6
円行東自治会 - FC2
https://engyouhigashi.web.fc2.com/inout-hiritu.html

canada_ja_p6, france_ja_p4, indonesia_ja_p4, Taiwan_ja_p3
持 続 可 能 な 医 療 保 険 制 度 を 構 築 す る た め の 国 民 健
https://www.sangiin.go.jp/japanese/gianjoho/ketsugi/189/f069_052601.pdf

france_ja_p6, indonesia_ja_p5, Taiwan_ja_p4
食品流通合理化促進事業
https://www.maff.go.jp/j/shokusan/sijyo/info/attach/attach/pdf/sijyou_yosan2-9.pdf

canada_ja_p7, france_ja_p7, indonesia_ja_p6
お 困 り の 方 へ 騒 音 や 悪 臭 な ど で
https://www.city.tochigi-sakura.lg.jp/manage/contents/upload/61bb4ab4dbec7.pdf

In canada_ja, p7 is mostly irrelevant links. In indonesia_ja, the results are mostly irrelevant links from p6 onward, and in Taiwan_ja from p4 onward.

@tonco-miyazawa, I would like to know what happens to the "other keywords" that appear below the results when I search for oniguruma and specify a time period of 24 hours or less. Here are the results I just got (in Japan, in Japanese): Screen shot 2022-04-09 22 43 13. These bullshit words have been showing up at a high rate for nearly two years.

kkos avatar Apr 09 '22 13:04 kkos

Screenshots of the 24-hour "other keywords":
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/24hOnlyJP_20220409at23h58m.png
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/24hOnlyJP_20220410at11h17m.png

Screenshot zip, part 2 (2.39MB). Main: Thailand_CN
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip

tonco-miyazawa avatar Apr 10 '22 05:04 tonco-miyazawa

@tonco-miyazawa I did not notice the April 15 addendum until today.

I wrote a rebuttal on my blog (in both English and Japanese). https://kkos.fc2.net/blog-entry-2.html

kkos avatar May 01 '22 03:05 kkos

After re-investigating, I found that my idea was wrong.

I deleted my previous remarks.

I'm sorry for the trouble those remarks caused.

tonco-miyazawa avatar May 02 '22 13:05 tonco-miyazawa

Google tries to show you the most relevant information in your language, based on your IP address/location or your preferences (if you are signed in).

You can ask Google to show results in another language or in multiple languages:

https://www.google.com/search?q=Oniguruma&lr=lang_ja|lang_en

Searching pages in English and Japanese (private mode):

image

Japanese + English:
lr=lang_ja|lang_en

English:
lr=lang_en

?) lang_XX (supported values for the language (lr) parameter)
https://developers.google.com/custom-search/docs/xml_results_appendices#languageCollections

?) lr (language restriction)
https://developers.google.com/custom-search/docs/xml_results#lrsp

?) hl (interface language)
https://developers.google.com/custom-search/docs/xml_results#hlsp
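The `lr` and `hl` parameters can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper name `google_search_url` is hypothetical. Note that `urlencode` percent-encodes the `|` separator as `%7C`, which is an equivalent spelling of the same URL.

```python
from urllib.parse import urlencode


def google_search_url(query, languages=(), interface_lang=None):
    """Build a Google search URL with optional lr and hl parameters.

    languages:      language codes such as "ja" or "en"; joined as
                    lang_ja|lang_en for the lr (language restriction) parameter.
    interface_lang: value for the hl (interface language) parameter.
    """
    params = {"q": query}
    if languages:
        params["lr"] = "|".join(f"lang_{code}" for code in languages)
    if interface_lang:
        params["hl"] = interface_lang
    return "https://www.google.com/search?" + urlencode(params)


url_ja_en = google_search_url("Oniguruma", languages=("ja", "en"))
url_en = google_search_url("Oniguruma", languages=("en",), interface_lang="en")

print(url_ja_en)  # https://www.google.com/search?q=Oniguruma&lr=lang_ja%7Clang_en
print(url_en)     # https://www.google.com/search?q=Oniguruma&lr=lang_en&hl=en
```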

IMHO, it is not that you are being "attacked"; it is simply that the keyword is less popular/cited than its Japanese counterpart.

It is often desirable to search for English results only, especially in programming...

You can add a new search engine and make it the default:

image

Befzz avatar Nov 25 '22 20:11 Befzz

@Befzz If you read and understand the following two, I don't think you would make such a claim. https://kkos.fc2.net/blog-entry-1.html https://kkos.fc2.net/blog-entry-2.html

I didn't want to write the same thing twice, so I wrote another article. https://kkos.fc2.net/blog-entry-3.html

kkos avatar Nov 27 '22 03:11 kkos

Have you taken into account the effect of the personalized search algorithms used by Google?

Adirelle avatar Dec 09 '22 12:12 Adirelle

Did you not read my first entry? https://kkos.fc2.net/blog-entry-1.html

I don't think this is because Google is displaying customized results for users. The reason is that searching in Chrome's incognito mode did not make any difference.

I've heard that in incognito mode (or secret mode?), the search results will not be personalized.

Also, in the archive posted by @tonco-miyazawa (https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip), the following two files show the same strange related keywords I saw.

  • Japan_aichi_JP_24h.png
  • Japan_hokkaido_JP_24h.png

(* But since it is in Japanese, I don't think you would know what it is when you see it.)

kkos avatar Dec 09 '22 14:12 kkos

I randomly found this issue when looking up regex engines from Wikipedia. Guess advertising on front page works :)

I was able to reproduce in Japan. More importantly my friend at Google could too. It looks like a search bug so hope it gets fixed.

My hypothesis is that the romaji gets converted to the kanji 鬼車, but is then maybe split into two tokens, 鬼 and 車. The latter especially will retrieve a lot of unrelated pages; a page just needs something like 車でお越しの方 ("for those arriving by car") somewhere.

The issue doesn't reproduce outside Japan, probably because the conversion from romaji to kanji is disabled elsewhere.
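To illustrate why such a split would pull in unrelated pages, here is a toy sketch. It is not Google's algorithm, only a simplified model of the hypothesis above: the query 鬼車 is matched either as a whole phrase or as two single-character tokens, against three made-up documents.

```python
def char_tokens(term):
    """Split a term into single-character tokens (the hypothesised bad split)."""
    return list(term)


def matches_any_token(text, tokens):
    """True if at least one token occurs anywhere in the text."""
    return any(token in text for token in tokens)


# Made-up documents for illustration only.
docs = {
    "regex": "鬼車 (Oniguruma) は正規表現ライブラリです",
    "access": "車でお越しの方は駐車場をご利用ください",
    "weather": "本日の天気は晴れです",
}

query = "鬼車"

# Matching on the split tokens 鬼 and 車 also hits the car-access page.
split_hits = [name for name, text in docs.items()
              if matches_any_token(text, char_tokens(query))]

# Matching on the whole phrase hits only the page about the library.
exact_hits = [name for name, text in docs.items() if query in text]

print(split_hits)  # ['regex', 'access']
print(exact_hits)  # ['regex']
```

In this model the unrelated hit disappears once the query is matched as a whole phrase, which would be consistent with the exact-match observation earlier in the thread.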

I suspect the attacker is a software bug and hope it gets squashed! I think we all know how hard CJK can be to get right ;)

anuraaga avatar Dec 12 '22 09:12 anuraaga