estatapi
Translation for non-Japanese
Hi, I've run your example data query after registering:
d1 <- estat_getStatsData(
appId = appId, statsDataId = "0003103532", cdCat01 = c("010800130","010800140"),
limit = 100
)
Using head(d1), the data looks like:
@tab @cat01 @cat02 @area @time @unit $ value tab_info cat01_info
(chr) (chr) (chr) (chr) (chr) (chr) (chr) (dbl) (chr) (chr)
1 01 010800130 03 00000 2016000505 <U+5186> 346 346 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
2 01 010800130 03 00000 2016000404 <U+5186> 403 403 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
3 01 010800130 03 00000 2016000303 <U+5186> 580 580 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
4 01 010800130 03 00000 2016000202 <U+5186> 1376 1376 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
5 01 010800130 03 00000 2016000101 <U+5186> 665 665 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
6 01 010800130 03 00000 2015001212 <U+5186> 585 585 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
Is there a way to make sense of this data as a non-Japanese speaker? I.e., is the package actually usable if you do not know the language and do not use a Japanese version of RStudio?
If it is still possible, how should I proceed in understanding the contents of the data?
Thanks again!
Thanks for using estatapi, but sorry, since this package is just a wrapper around the API, I have no plans to add functionality to translate the data :(
I will try to find some good workaround, but please don't expect too much...
I understand very well. However, instead of translation, I have a simpler question. Regarding the above table obtained by head(d1), what should I do to display the character variables in Japanese characters instead of Unicode escapes (e.g. variable tab_info)? I could then use a Japanese-English translator to understand the descriptions.
More broadly, could you link to a (Japanese) website that describes to what databases each statsDataId refers (and below that, to which dataseries each series code refers)? There must be such a webpage, but the only thing I found was: http://www.e-stat.go.jp/api/api-data/. And even there the database codes do not correspond to the statsDataId used by estatapi.
Thank you again for all your help!
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] estatapi_0.2 dplyr_0.4.3 tidyr_0.4.1
loaded via a namespace (and not attached):
[1] httr_1.1.0 lazyeval_0.1.10 magrittr_1.5 R6_2.1.2 assertthat_0.1 parallel_3.3.0 DBI_0.4
[8] tools_3.3.0 curl_0.9.7 Rcpp_0.12.4 jsonlite_0.9.19 purrr_0.2.1
Regarding the first issue (reading Japanese characters correctly), the answer can be found here: http://stackoverflow.com/questions/11228307/writing-data-isnt-preserving-encoding
I'm still interested in the second part of the question, about a metadata catalogue for statsDataId.
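For text where the escapes have already been written out literally (for example, console output pasted like the table above), a small base-R helper can turn them back into characters. This is only a sketch: decode_u_escapes is a hypothetical name, not part of estatapi, and it assumes the escapes have exactly the form <U+XXXX> with 4 to 6 hex digits.

```r
# Hypothetical helper (not part of estatapi): convert literal "<U+XXXX>"
# escapes in a character vector into the corresponding UTF-8 characters.
decode_u_escapes <- function(x) {
  vapply(x, function(s) {
    # Locate every "<U+XXXX>" escape in this string
    m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", s)
    hits <- regmatches(s, m)[[1]]
    if (length(hits) > 0) {
      # Strip the leading "<U+" and trailing ">" to keep the hex digits
      hex <- substr(hits, 4, nchar(hits) - 1)
      # Convert each code point to a one-character UTF-8 string
      chars <- intToUtf8(strtoi(hex, 16L), multiple = TRUE)
      regmatches(s, m) <- list(chars)
    }
    s
  }, character(1), USE.NAMES = FALSE)
}

# Example: the unit column from the table above
unit <- decode_u_escapes("<U+5186>")
```

Whether the decoded characters then display correctly still depends on the console locale and font, which is what the linked Stack Overflow answer addresses.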
I have good news and bad news!
The good news is that I forgot to document the lang parameter (I'm very sorry, but let me offer an excuse for this later...). You can specify lang="E" to get English data :) ("E" stands for English, "J" for Japanese)
d <- estat_getStatsData(
appId = appId, statsDataId = "0003036792", lang = "E"
)
#> There are more records; please rerun with startPosition=100001
head(d)
#> # A tibble: 6 x 10
#> @tab @cat01 @area @time $ value tab_info cat01_info area_info time_info
#> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 0001 13A01 2016000606 101.7 101.7 Index All items Ku-area Jun. 2016
#> 2 1 0001 13A01 2016000505 102 102.0 Index All items Ku-area May 2016
#> 3 1 0001 13A01 2016000404 102 102.0 Index All items Ku-area Apr. 2016
#> 4 1 0001 13A01 2016000303 102 102.0 Index All items Ku-area Mar. 2016
#> 5 1 0001 13A01 2016000202 101.7 101.7 Index All items Ku-area Feb. 2016
#> 6 1 0001 13A01 2016000101 101.3 101.3 Index All items Ku-area Jan. 2016
But the bad news is that not all data are translated. Maybe I saw this kind of error message first and jumped to the wrong conclusion that there is NO English data... Sorry for being so foolish.
d <- estat_getStatsData(
appId = appId, statsDataId = "0003103532", cdCat01 = c("010800130","010800140"),
limit = 100, lang = "E"
)
#> Error in estat_api("rest/2.0/app/json/getStatsData", appId = appId, statsDataId = statsDataId, :
#> The data that statistics data ID(statsDataId) is equal to [0003103532] does not exist. Please confirm ID.
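Since some datasets have no English version, one possible workaround is to try lang = "E" first and retry with lang = "J" when the request fails. The sketch below is an assumption, not a feature of estatapi: fetch_prefer_english is a hypothetical helper, and it treats any error from the English request as "no English data", which may hide unrelated failures such as a wrong appId.

```r
# Hypothetical helper: call `fetch` with lang = "E" and fall back to
# lang = "J" if the English request raises an error.
# `fetch` is any function accepting a `lang` argument, e.g.
# estatapi::estat_getStatsData.
fetch_prefer_english <- function(fetch, ...) {
  args <- list(...)
  tryCatch(
    do.call(fetch, c(args, list(lang = "E"))),
    error = function(e) {
      message("English request failed, retrying in Japanese: ",
              conditionMessage(e))
      do.call(fetch, c(args, list(lang = "J")))
    }
  )
}

# Usage (needs a registered appId, so it is left commented out):
# d <- fetch_prefer_english(
#   estatapi::estat_getStatsData,
#   appId = appId, statsDataId = "0003103532",
#   cdCat01 = c("010800130", "010800140"), limit = 100
# )
```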
Regarding the second question, I will try to answer in the next comment, but this is complicated...
Each survey has multiple datasets (e.g. the census results contain many datasets, such as population data and household data), and surveys and datasets use different kinds of codes.
The codes listed at http://www.e-stat.go.jp/api/api-data/ are survey codes, not dataset codes. statsDataId, on the other hand, identifies a dataset, not a survey.
"a (Japanese) website that describes to what databases each statsDataId refers (and below that, to which dataseries each series code refers)"
Sadly, I could not find any website like this. I think all we can do for now is search with estat_getStatsList(). I feel I should add a function to build the corresponding table locally.
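Until such a function exists, a local table could be assembled by running several searches and stacking the results. The following is a rough sketch under stated assumptions: build_stats_catalogue is a hypothetical helper, the search function is assumed to accept a searchWord argument (as estat_getStatsList() does) and to return a data frame with the same columns on every call, and the search words in the usage comment are placeholders.

```r
# Hypothetical helper: run `search` once per search word and stack the
# results into one data frame, tagging each row with the word that found it.
# `search` is any function taking `searchWord`, e.g.
# estatapi::estat_getStatsList.
build_stats_catalogue <- function(search, words, ...) {
  extra <- list(...)
  pieces <- lapply(words, function(w) {
    res <- do.call(search, c(list(searchWord = w), extra))
    res <- as.data.frame(res, stringsAsFactors = FALSE)
    if (nrow(res) > 0) res$search_word <- w
    res
  })
  # Drop searches that returned nothing before combining
  pieces <- Filter(function(p) nrow(p) > 0, pieces)
  if (length(pieces) == 0) return(data.frame())
  do.call(rbind, pieces)
}

# Usage (needs a registered appId, so it is left commented out;
# "CPI" and "census" are placeholder search words):
# catalogue <- build_stats_catalogue(
#   estatapi::estat_getStatsList, c("CPI", "census"), appId = appId
# )
```

The resulting table can then be saved locally and filtered to find the statsDataId of interest.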