Translation for non-Japanese

Open Hergie opened this issue 8 years ago • 5 comments

Hi, I ran your example data query after registering:

d1 <- estat_getStatsData(
    appId = appId, statsDataId = "0003103532", cdCat01 = c("010800130","010800140"),
    limit = 100
)

By using head(d1), the data looks like:

   @tab    @cat01 @cat02 @area      @time    @unit     $ value         tab_info                                            cat01_info
  (chr)     (chr)  (chr) (chr)      (chr)    (chr) (chr) (dbl)            (chr)                                                (chr)
1    01 010800130     03 00000 2016000505 <U+5186>   346   346 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
2    01 010800130     03 00000 2016000404 <U+5186>   403   403 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
3    01 010800130     03 00000 2016000303 <U+5186>   580   580 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
4    01 010800130     03 00000 2016000202 <U+5186>  1376  1376 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
5    01 010800130     03 00000 2016000101 <U+5186>   665   665 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>
6    01 010800130     03 00000 2015001212 <U+5186>   585   585 <U+91D1><U+984D> 352 <U+30C1><U+30E7><U+30B3><U+30EC><U+30FC><U+30C8>

Is there a way to make sense of this data as a non-Japanese speaker? I.e., is the package actually usable if you do not know the language and do not use a Japanese version of RStudio?

If it is possible, how should I go about understanding the contents of the data?

Thanks again!

Hergie, Jul 12 '16 17:07

Thanks for using estatapi, but sorry: since this package is just a wrapper around the API, I have no plans to add functionality for translating the data :(

I will try to find a good workaround, but please don't expect too much...

yutannihilation, Jul 12 '16 21:07

I understand completely. However, instead of translation, I have a simpler question. Regarding the table above, obtained with head(d1): what should I do to display the character variables (e.g. tab_info) as Japanese characters instead of Unicode escapes such as <U+5186>? I could then use a Japanese-English translator to understand the descriptions.

More broadly, could you link to a (Japanese) website that describes which database each statsDataId refers to (and, below that, which data series each series code refers to)? There must be such a webpage, but the only thing I found was: http://www.e-stat.go.jp/api/api-data/. And even there, the database codes do not correspond to the statsDataId values used by estatapi.

Thank you again for all your help!

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] estatapi_0.2 dplyr_0.4.3  tidyr_0.4.1 

loaded via a namespace (and not attached):
 [1] httr_1.1.0      lazyeval_0.1.10 magrittr_1.5    R6_2.1.2        assertthat_0.1  parallel_3.3.0  DBI_0.4        
 [8] tools_3.3.0     curl_0.9.7      Rcpp_0.12.4     jsonlite_0.9.19 purrr_0.2.1 

Hergie, Jul 14 '16 20:07

Regarding the first issue (reading Japanese characters correctly), the answer can be found here: http://stackoverflow.com/questions/11228307/writing-data-isnt-preserving-encoding
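For anyone hitting the same display problem, here is a minimal sketch of the kind of workaround discussed there, assuming the labels are already valid UTF-8 in memory and only the Windows-1252 console renders them as <U+XXXX>. The helper name write_utf8_lines and the temporary file are made up for illustration; they are not part of estatapi.

```r
# Hypothetical helper (not part of estatapi): write strings as raw UTF-8 bytes,
# bypassing re-encoding into the native (e.g. Windows-1252) locale.
write_utf8_lines <- function(x, path) {
  writeLines(enc2utf8(x), path, useBytes = TRUE)
  invisible(path)
}

# The two labels that the console printed as <U+5186> and <U+91D1><U+984D>
labels <- c("\u5186", "\u91d1\u984d")

out <- tempfile(fileext = ".txt")
write_utf8_lines(labels, out)

# Read the file back, telling R that the bytes are UTF-8
back <- readLines(out, encoding = "UTF-8")

# Compare code points rather than printed text, so the check does not depend
# on what the console is able to display
identical(utf8ToInt(back[1]), utf8ToInt(labels[1]))
```

The resulting file can be opened in any UTF-8 aware editor or pasted into a translator, even when the R console itself cannot show the characters.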

Still interested in the second part of the question related to metadata/catalogue concerning statsDataId.

Hergie, Jul 15 '16 14:07

I have good news and bad news!

The good news is that I forgot to document the lang parameter (I'm very sorry; let me explain why later...). You can specify lang="E" to get English data :)

("E" stands for English, "J" for Japanese)

d <- estat_getStatsData(
    appId = appId, statsDataId = "0003036792", lang = "E"
)
#> There are more records; please rerun with startPosition=100001

head(d)
#> # A tibble: 6 x 10
#>    @tab @cat01 @area      @time     $ value tab_info cat01_info area_info time_info
#>   <chr>  <chr> <chr>      <chr> <chr> <dbl>    <chr>      <chr>     <chr>     <chr>
#> 1     1   0001 13A01 2016000606 101.7 101.7    Index  All items   Ku-area Jun. 2016
#> 2     1   0001 13A01 2016000505   102 102.0    Index  All items   Ku-area  May 2016
#> 3     1   0001 13A01 2016000404   102 102.0    Index  All items   Ku-area Apr. 2016
#> 4     1   0001 13A01 2016000303   102 102.0    Index  All items   Ku-area Mar. 2016
#> 5     1   0001 13A01 2016000202 101.7 101.7    Index  All items   Ku-area Feb. 2016
#> 6     1   0001 13A01 2016000101 101.3 101.3    Index  All items   Ku-area Jan. 2016

The bad news is that not all data are translated. I probably saw an error message like the one below first and jumped to the wrong conclusion that there is NO English data at all... Sorry, that was foolish of me.

d <- estat_getStatsData(
    appId = appId, statsDataId = "0003103532", cdCat01 = c("010800130","010800140"),
    limit = 100, lang = "E"
)
#> Error in estat_api("rest/2.0/app/json/getStatsData", appId = appId, statsDataId = statsDataId,  : 
#>   The data that statistics data ID(statsDataId) is equal to [0003103532] does not exist. Please confirm ID. 
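Since an untranslated dataset fails with an error like the one above, one possible pattern is to try English first and fall back to Japanese. This is only a sketch: get_with_fallback is a hypothetical helper, not an estatapi function, and it treats every error (including network failures) as "not available in this language".

```r
# Hypothetical wrapper (not part of estatapi): call `fetch` once per language
# code, in order, and return the first result that does not raise an error.
get_with_fallback <- function(fetch, ..., langs = c("E", "J")) {
  for (lang in langs) {
    res <- tryCatch(fetch(..., lang = lang), error = function(e) NULL)
    if (!is.null(res)) {
      attr(res, "lang") <- lang  # record which language actually succeeded
      return(res)
    }
  }
  stop("No data returned for any of: ", paste(langs, collapse = ", "))
}

# Intended usage (needs network access and a registered appId):
# d <- get_with_fallback(
#     estat_getStatsData,
#     appId = appId, statsDataId = "0003103532",
#     cdCat01 = c("010800130","010800140"), limit = 100
# )
# attr(d, "lang")
```

Passing the fetching function in as an argument keeps the fallback logic independent of the API, so it can be tried out with a stub function before spending real requests.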

Regarding the second question, I will try to answer in the next comment, but this is complicated...

yutannihilation, Jul 16 '16 13:07

Each survey has multiple datasets (e.g. the census results contain many datasets, such as population data and household data), and surveys and datasets are identified by different kinds of codes.

The codes listed at http://www.e-stat.go.jp/api/api-data/ are survey codes, not dataset codes. statsDataId, on the other hand, identifies a dataset, not a survey.

You asked for "a (Japanese) website that describes to what databases each statsDataId refers (and below that, to which dataseries each series code refers)".

Sadly, I could not find any website like this. I think all we can do for now is search with estat_getStatsList(). I feel I should add a function that builds the corresponding table locally.
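Until such a function exists, a rough sketch of building that table by hand could look like the following. build_lookup is a hypothetical helper, and the column names "@id", "STAT_NAME" and "TITLE" are assumptions about what estat_getStatsList() returns; please check names() on a real result first.

```r
# Hypothetical helper (not part of estatapi): reduce a search result to a small
# lookup table of dataset IDs and their descriptions.
# The default column names are assumptions; only those actually present are kept.
build_lookup <- function(stats_list, cols = c("@id", "STAT_NAME", "TITLE")) {
  keep <- intersect(cols, names(stats_list))
  if (length(keep) == 0) {
    stop("None of the expected columns found; inspect names(stats_list)")
  }
  unique(as.data.frame(stats_list)[, keep, drop = FALSE])
}

# Intended usage (needs network access and a registered appId; the search word
# is only an example):
# lst <- estat_getStatsList(appId = appId, searchWord = "CPI")
# lookup <- build_lookup(lst)
# head(lookup)
```

The lookup only covers whatever the search returned, so it would need to be rebuilt or extended for each new search word.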

yutannihilation, Jul 16 '16 13:07