Kazakhstan "banned sites" query form and archive
tl;dr: There is a Kazakhstan government web form that lets you report websites that should be blocked and also lets you check whether a site is already on the list. It's possible to make the query tool return many results at once, and the result lists as of 2025-03-17 are archived at:
https://archive.org/details/www_gov_kz_banned_sites_20250317 2025-03-17/dedup.jsonl (38 MB)
The dataset suggests a number of research questions.
Via issue tpo/anti-censorship/censorship-analysis#40059 at the Tor issue tracker, I learned of a web page, associated with the "Public Administration Committee in the field of Communication, Informatization and Mass Media under Ministry of Information and Public Development of the Republic of Kazakhstan", that lets you report sites to be added to a blocklist, or check to see if a site is already on the list. The page is available in three languages:
- Complain about the content (archive)
- Интернет контентке шағым айту (Kazakh: "Complain about internet content") (archive)
- Пожаловаться на интернет контент (Russian: "Complain about internet content") (archive)
I had not been aware of this web form previously. The Wayback Machine has records of it going back to 2024-02-15.
When you enter a query into the "Check site URL" / "Сайтты тексеру" / "Проверить сайт" search form, the browser makes an HTTP request to receive a set of results in the form of a JSON array. For example, a search query of "torproject.org" results in a request to the URL https://www.gov.kz/banned_sites?url=torproject.org, the response to which is a JSON array with 1 element:
```json
[
  {
    "id": "3771",
    "name": "torproject.org",
    "urladdress": "https://www.torproject.org/ru/download",
    "categoryname": "Нарушение норм Закона РК «О связи» (анонимайзеры, прокси-серверы типа TOR, VPN-серверы и др.)",
    "courtname": "",
    "courtdate": "27.11.2024",
    "documenttypename": "Предписание УО",
    "regdate": "",
    "regnumber": "",
    "courtnumber": " 26-04-26/5175",
    "blockdate": null,
    "ipaddress": null
  }
]
```
(The "categoryname" value translates as "Violation of the norms of the Law of the Republic of Kazakhstan "On Communications" (anonymizers, TOR-type proxy servers, VPN servers, etc.)".)
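A minimal Python sketch of the query described above, assuming only what the example shows: the endpoint https://www.gov.kz/banned_sites, a single `url` parameter, and a JSON array in the response. The helper names (`build_query_url`, `parse_records`, `fetch_records`) are hypothetical, and stripping whitespace from string fields is an assumption of this sketch, prompted by the leading space in "courtnumber".

```python
import json
import urllib.parse
import urllib.request

# Endpoint and parameter name taken from the example request; the rest is assumed.
BASE_URL = "https://www.gov.kz/banned_sites"


def build_query_url(term: str) -> str:
    """Return the search URL for one query term, percent-encoding it."""
    return BASE_URL + "?" + urllib.parse.urlencode({"url": term})


def parse_records(body: str) -> list[dict]:
    """Parse the JSON array returned by the endpoint.

    String values are stripped because some fields (e.g. "courtnumber")
    carry stray leading spaces in the observed responses.
    """
    records = json.loads(body)
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        for rec in records
    ]


def fetch_records(term: str, timeout: float = 60.0) -> list[dict]:
    """Fetch and parse the results for one query term (needs network access)."""
    with urllib.request.urlopen(build_query_url(term), timeout=timeout) as resp:
        return parse_records(resp.read().decode("utf-8"))
```

Calling `fetch_records("torproject.org")` requires network access and, when this was written, would have returned the single record shown above. If you are archiving, keep the raw response bytes too, since the stripping step alters them.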
I thought, that's interesting, you could take a big list of domain names and run them through the query form one by one, in order to find whether a record exists for each of them. But it's actually even easier than that. You can query the form with a single letter, and the results will include (apparently) every record that contains that letter. For example, https://www.gov.kz/banned_sites?url=j returns a JSON array with 13,444 elements that contain the letter 'j':
```json
[
  {
    "id": "10",
    "name": "etopers.click",
    "urladdress": "https://etopers.click/nastojashhij-seks",
    "categoryname": "Распространение порнографии",
    "courtname": "",
    "courtdate": "30.01.2025",
    "documenttypename": "Предписание УО",
    "regdate": "",
    "regnumber": "",
    "courtnumber": "26-04-26/420",
    "blockdate": null,
    "ipaddress": null
  },
  {
    "id": "58",
    "name": "crbjanibek.kz",
    "urladdress": "https://crbjanibek.kz",
    "categoryname": "Деятельность интернет-казино",
    "courtname": "",
    "courtdate": "29.01.2025",
    "documenttypename": "Предписание УО",
    "regdate": "",
    "regnumber": "",
    "courtnumber": " 26-04-26/364",
    "blockdate": null,
    "ipaddress": null
  },
  ...
]
```
("Распространение порнографии" means "Distribution of pornography"; "Деятельность интернет-казино" means "Internet casino activity".)
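The first record above matches "j" only through its "urladdress" (nastojashhij), not its "name", which suggests the search looks at more than the domain name. A small check can test that guess against a saved result file. It assumes, hypothetically, case-insensitive substring matching on the "name" and "urladdress" fields: an empty return value is consistent with that guess, and any record returned is a counterexample worth a closer look. The function name `non_matching` is hypothetical.

```python
def non_matching(records: list[dict], term: str,
                 fields: tuple[str, ...] = ("name", "urladdress")) -> list[dict]:
    """Return the records in which none of `fields` contains `term`.

    Matching here is a case-insensitive substring test, which is an
    assumption about how the server searches, not documented behavior.
    Missing or null fields are treated as empty strings.
    """
    needle = term.lower()
    return [
        rec for rec in records
        if not any(needle in str(rec.get(f) or "").lower() for f in fields)
    ]
```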
I ran queries for the single characters a–z, 0–9, _, and - and archived the results. Some characters result in an HTTP 500 error, but the others produce JSON files ranging from 5 MB (q) to 23 MB (l). The archive is here:
https://archive.org/details/www_gov_kz_banned_sites_20250317
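A sketch of that scrape in Python, assuming the same endpoint and `url` parameter as the single-site query. `http_fetch`, `scrape`, and `save` are hypothetical names. The one-second delay and the choice to skip (rather than retry) terms that return an error status are assumptions, not something described above. The fetch function is a parameter so the loop can be exercised without touching the network.

```python
import json
import string
import time
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# Query alphabet described above: a-z, 0-9, "_" and "-" (38 terms).
ALPHABET = string.ascii_lowercase + string.digits + "_-"
BASE_URL = "https://www.gov.kz/banned_sites"


def http_fetch(term: str, timeout: float = 300.0) -> tuple[int, bytes]:
    """Return (status_code, body) for one term; HTTP errors become a status code."""
    url = BASE_URL + "?" + urllib.parse.urlencode({"url": term})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # Some single characters produce HTTP 500; report it instead of crashing.
        return e.code, b""


def scrape(fetch: Callable[[str], tuple[int, bytes]] = http_fetch,
           delay: float = 1.0) -> tuple[dict[str, bytes], dict[str, int]]:
    """Query every term once; return ({term: body}, {term: status_code})."""
    bodies: dict[str, bytes] = {}
    statuses: dict[str, int] = {}
    for term in ALPHABET:
        code, body = fetch(term)
        statuses[term] = code
        if code == 200:
            json.loads(body)  # sanity check: raises if the body is not valid JSON
            bodies[term] = body
        time.sleep(delay)  # pause between requests to go easy on the server
    return bodies, statuses


def save(bodies: dict[str, bytes], out_dir: Path) -> None:
    """Write each successful response to <out_dir>/<term>.json, unmodified."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for term, body in bodies.items():
        (out_dir / f"{term}.json").write_bytes(body)
```

For example, `bodies, statuses = scrape()` followed by `save(bodies, Path("2025-03-17"))` writes one file per successful character and leaves `statuses` as a record of which characters failed.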
There's a lot of overlap across files, because a given URL will be matched in a query for lots of different letters. dedup.jsonl combines and deduplicates all the single-letter files to produce 102,644 records in total. The "id" field appears to be sequential, which means you can use it to look for gaps.
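The merge and the gap search can be sketched as follows. The method used to compare records during deduplication isn't stated; this sketch assumes full-record equality (so two different records sharing an "id" would both be kept) and assumes every "id" is a decimal integer string, as in the samples. The function names are hypothetical.

```python
import json
from typing import Iterable


def dedup(arrays: Iterable[list[dict]]) -> list[dict]:
    """Merge per-character result arrays, keeping one copy of each distinct record.

    Records are compared by full content, not only by "id", so conflicting
    records with the same id both survive and can be investigated.
    """
    seen: dict[str, dict] = {}
    for array in arrays:
        for rec in array:
            key = json.dumps(rec, sort_keys=True, ensure_ascii=False)
            seen.setdefault(key, rec)
    return sorted(seen.values(), key=lambda r: int(r["id"]))


def id_gaps(records: Iterable[dict]) -> list[range]:
    """Return the runs of integer ids missing between the smallest and largest id."""
    ids = sorted({int(r["id"]) for r in records})
    return [range(a + 1, b) for a, b in zip(ids, ids[1:]) if b - a > 1]


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line, keeping Cyrillic text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Load each per-character file with `json.loads`, pass the resulting arrays to `dedup`, write the result with `write_jsonl`, and run `id_gaps` on it. Comparing the number of records with the number of distinct ids shows whether any id appears with conflicting content.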
This looks like a data set that would support one or more research projects. If you've wanted to write a FOCI paper, this is a good opportunity. Some ideas for research questions:
- What is the range and distribution of dates on entries? What are the earliest and latest dates? Are there concentrations of records around certain times?
- What website categories are represented, both according to the list's internal categorization and any other categorization tool?
- Every record seems to have an associated court case number ("courtnumber"). How many records are affected by each court case?
- What are the changes to the list over time (if you re-archive it periodically)? How many records are added per month? Are records ever removed?
- If you submit a site to be blocked, what happens, and how long does it take to be acted on?
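As a starting point for the first three questions, here is a sketch that tallies records by month, by "courtnumber", and by "categoryname" from a JSONL file such as dedup.jsonl. It assumes "courtdate" is the date of interest and always uses the DD.MM.YYYY format seen in the samples ("regdate" and "blockdate" are empty or null there); anything else is counted under "unparsed". The helper names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON record per line, e.g. from dedup.jsonl."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def month_histogram(records: Iterable[dict]) -> Counter:
    """Count records per "YYYY-MM" of the "courtdate" field (DD.MM.YYYY)."""
    counts: Counter = Counter()
    for rec in records:
        raw = (rec.get("courtdate") or "").strip()
        try:
            d = datetime.strptime(raw, "%d.%m.%Y")
        except ValueError:
            counts["unparsed"] += 1  # empty or unexpected date format
            continue
        counts[f"{d.year:04d}-{d.month:02d}"] += 1
    return counts


def records_per_case(records: Iterable[dict]) -> Counter:
    """Count records per "courtnumber", ignoring stray leading/trailing spaces."""
    return Counter((rec.get("courtnumber") or "").strip() for rec in records)


def category_counts(records: Iterable[dict]) -> Counter:
    """Count records per internal "categoryname" (missing values count as "")."""
    return Counter(rec.get("categoryname") or "" for rec in records)
```

With `recs = load_jsonl("dedup.jsonl")`, `sorted(month_histogram(recs).items())` gives a month-by-month series and `records_per_case(recs).most_common(10)` lists the case numbers covering the most records.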
The checkboxes of reasons for blocking are also interesting. They may give an idea of what types of information the censor cares most about blocking.
- Suicide propaganda
- Propaganda of drugs, psychotropic substances, their analogues and precursors
- Propaganda or agitation of ferocity or violence cults as well as social, racial, national, religious, class and patrimonial superiority
- Demonstration of pornography or erotic movies and videos
- Election campaigning
- Propaganda and agitation for dismantlement of statehood
- Integrity violation of the Republic of Kazakhstan
- Extremism and terrorism propaganda
- Publication of materials and dissemination of information aimed at interethnic and sectarian hostility
- Online Casino
- Dishonest, unreliable advertising
- Others
In a quick search I found some other sites related to a Kazakhstan blocklist.
Kaz Blocking Tracker (kazbt.com) says it does an active HTTP HEAD request to check the current status of blocking. When I tried the form just now, it said "Service Temporarily Unavailable."
KAZBT.COM tracks whether ~~1M+~~ 100,000 most-visited websites in the world are accessible in Kazakhstan.
All query results are cached for 3 hours. The blocked websites counter is updated every day. The status of websites is checked every week. You can build a graph and see the historical data for all websites.
A website can be blocked in Kazakhstan ~~only by a court order~~ without a court order (changes to the law in 2015). However, often, there is a real lack of transparency.
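For reference, a single HTTP HEAD probe of the kind kazbt.com describes might look like this in Python. It is a hypothetical sketch, not kazbt.com's code, and it only reports whether an HTTP response came back. Getting a response does not mean a site is unblocked, since a block page can also answer with HTTP 200, so a real measurement needs a vantage point inside Kazakhstan and a comparison against an uncensored control.

```python
import urllib.error
import urllib.request


def head_check(url: str, timeout: float = 15.0) -> tuple[bool, str]:
    """Send one HTTP HEAD request; return (got_http_response, detail).

    urlopen follows redirects by default, so the status is that of the
    final response.
    """
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # The server answered, just with an error status (4xx/5xx).
        return True, f"HTTP {e.code}"
    except (urllib.error.URLError, OSError, ValueError) as e:
        # DNS failure, connection reset, timeout, or malformed URL:
        # no HTTP answer at all. Report the exception class name.
        return False, type(e).__name__
```

A call such as `head_check("https://www.torproject.org/")` returns `(True, "HTTP ...")` when any response arrives and `(False, ...)` with the exception's class name on connection-level failures.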
The site has a news post from 2017-08-23 (archive) with good information about blocking due to court cases and mismatches between official lists and what is actually blocked.
Analysis of the list of blocked sites
The purpose of this article is to assess the situation with website blocking in Kazakhstan.
"…The public has the right to receive such information openly. Therefore, going forward, we plan to create a database of blocked links indicating the reasons each resource was blocked"
Dauren Abaev, Minister of Information and Communications, August 12, 2016
Background
Sites have been blocked by court decision since 2011, but Kaznet has never had an official, up-to-date list of blocked sites. Therefore, since March 2016, the kazbt.com service has been tracking the availability of popular websites in the Republic of Kazakhstan and maintaining its own register of sites.
In the second half of 2016, the website of the Ministry of Information and Communications of the Republic of Kazakhstan (mic.gov.kz) ~~sloppily~~ launched a register of prohibited Internet resources in pilot mode. This lets us compare the data available on the ministry's website with data from our service. Unfortunately, the ministry's website has an incomplete list of blocked sites.
Internet Freedom Kazakhstan also has a check URL form: "Check your website in the State Register of Prohibited Internet Resources". I searched for torproject.org just now, and it said "No result".
This is similar to Russia's 'Unified registry of prohibited sites'; perhaps Kazakhstan borrowed the idea? The query response looks similar, too. There's also an unofficial mirror of the Russian registry: https://reestr.rublacklist.net/ by Roskomsvoboda.
It would be worth checking whether other censoring CIS countries, such as Azerbaijan, Belarus, and Turkmenistan, have a similar service.
I scraped the query form a few times in the past month. The results each time were identical. I take that as a sign that the list of banned sites has not changed in that time.
https://archive.org/details/www_gov_kz_banned_sites_20250317 (snapshots: 2025-03-17, 2025-03-27, 2025-04-07, 2025-04-18)
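To check whether two scrapes really are identical, and to answer the earlier question about additions and removals, snapshots can be compared by "id". A minimal sketch, assuming each snapshot is a JSONL file with one record per line, that ids are unique within a snapshot, and that ids are decimal integer strings; the function names are hypothetical.

```python
import json


def load_by_id(path: str) -> dict[str, dict]:
    """Load a JSONL snapshot into {id: record}; assumes ids are unique."""
    with open(path, encoding="utf-8") as f:
        return {rec["id"]: rec for rec in (json.loads(line) for line in f if line.strip())}


def diff_snapshots(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Return the ids added, removed, and changed between two snapshots."""
    added = sorted(new.keys() - old.keys(), key=int)
    removed = sorted(old.keys() - new.keys(), key=int)
    changed = sorted((k for k in old.keys() & new.keys() if old[k] != new[k]), key=int)
    return {"added": added, "removed": removed, "changed": changed}
```

For example, `diff_snapshots(load_by_id("2025-03-17/dedup.jsonl"), load_by_id("2025-04-18/dedup.jsonl"))` (hypothetical paths) would return three empty lists if the list really did not change between those dates.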