microdadosBrasil
bug in POF: file types and data_path not working
After the update that added POF 1987/88 and 1995/96, the package no longer works for POF 2002. I think POF_files_metadata_harmonization.csv needs review, and read_POF seems to have a bug when building data_path. See the reprex below.
library(microdadosBrasil)
get_available_datasets()
#> [1] "CAGED" "CensoEducacaoSuperior"
#> [3] "CensoEscolar" "CENSO"
#> [5] "ENEM" "PME"
#> [7] "PnadContinua" "PNAD"
#> [9] "PNS" "POF"
#> [11] "RAIS"
get_available_periods("POF")
#> [1] 2008 2002 1995 1987
# Show that there are more file types than there should be
file_types <- get_available_filetypes("POF",2002)
# POF 2002 should have only 14 file_types
file_types
#> [1] "aluguel_estimado" "caderneta_despesa"
#> [3] "condicoes_de_vida" "consumo"
#> [5] "despesa_12meses" "despesa_90dias"
#> [7] "despesa_individual" "despesa_veiculo"
#> [9] "domicilio" "inventario"
#> [11] "morador_imput" "morador"
#> [13] "outras_despesas" "outros_rendimentos"
#> [15] "rendimentos" "servico_domestico"
#> [17] "despesa_esp" "despesa_6meses"
#> [19] "despesas_bens_duraveis_credito"
# Show that there is no dictionary for non-existent ft
get_import_dictionary("POF",2002,ft="aluguel_estimado")
#> Error in get_import_dictionary("POF", 2002, ft = "aluguel_estimado"): There is no available dictionary for this year. You can help to expand the package creating the dictionary, see more information at https://github.com/lucasmation/microdadosBrasil
aluguel<-read_POF(2002,ft="aluguel_estimado",root_path = "~/Desktop/teste_microBrasil/")
#> You have specified the 'root_path' argument, in this case we will assume that data is in the directory specified and it is exactly as it have been downloaded from the source.
#> Error in read_data(dataset = "POF", ft = ft, i = i, root_path = root_path, : Data not found. Check if you have unziped the data
# Show that even with existent ft there is a bug in the data_path
get_import_dictionary("POF",2002,ft="domicilio")
#> int_pos var_name x label length decimal_places fin_pos col_type
#> 1 3 uf 2. NA 2 0 4 i
#> 2 5 seq 3. NA 3 0 7 i
#> 3 8 dv 1. NA 1 0 8 i
#> 4 3 controle 6. NA 6 0 8 i
#> 5 9 domcl 2. NA 2 0 10 i
#> 6 11 estrato 2. NA 2 0 12 i
#> 7 13 fator_set 11.5 NA 11 5 23 d
#> 8 24 fator 11.5 NA 11 5 34 d
#> 9 35 pt 2. NA 2 0 36 i
#> 10 37 pt_real 2. NA 2 0 38 i
#> 11 39 n_morador 2. NA 2 0 40 i
#> 12 41 tipo 1. NA 1 0 41 i
#> 13 42 n_comodos 2. NA 2 0 43 i
#> 14 44 n_dorm 2. NA 2 0 45 i
#> 15 46 n_banh 2. NA 2 0 47 i
#> 16 48 a_agua 1. NA 1 0 48 i
#> 17 49 esgoto 1. NA 1 0 49 i
#> 18 50 cond_ocup 1. NA 1 0 50 i
#> 19 51 e_eletrica 1. NA 1 0 51 i
#> 20 52 piso 1. NA 1 0 52 i
#> 21 53 pavrua 1. NA 1 0 53 i
#> 22 54 temp_mor 1. NA 1 0 54 i
#> 23 55 quant_uc 1. NA 1 0 55 i
#> 24 56 contrato 1. NA 1 0 56 i
#> 25 57 renda 12.4; NA 12 4 68 d
#> CHAR
#> 1 FALSE
#> 2 FALSE
#> 3 FALSE
#> 4 FALSE
#> 5 FALSE
#> 6 FALSE
#> 7 FALSE
#> 8 FALSE
#> 9 FALSE
#> 10 FALSE
#> 11 FALSE
#> 12 FALSE
#> 13 FALSE
#> 14 FALSE
#> 15 FALSE
#> 16 FALSE
#> 17 FALSE
#> 18 FALSE
#> 19 FALSE
#> 20 FALSE
#> 21 FALSE
#> 22 FALSE
#> 23 FALSE
#> 24 FALSE
#> 25 FALSE
domicilio<-read_POF(2002,ft="domicilio",root_path = "~/Desktop/teste_microBrasil")
#> You have specified the 'root_path' argument, in this case we will assume that data is in the directory specified and it is exactly as it have been downloaded from the source.
#> [1] 1 2
#> Time difference of 0.3933437 secs
#> 0 Gb
#> Error in paste0(data_path, names(out), ".txt"): object 'data_path' not found
Created on 2018-07-25 by the reprex package (v0.2.0).
I still think POF_files_metadata_harmonization.csv needs review, given what get_available_filetypes returns for 2002 (19 file types instead of 14).
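One way to see which listed file types lack a dictionary is to try each one and collect the failures. This is only a sketch: find_missing_dictionaries is a hypothetical helper, not part of the package, and it takes the lookup functions as arguments so it can be tried without the package installed.

```r
# Hypothetical helper (not part of microdadosBrasil): return the file types
# that are listed for a dataset/year but whose dictionary lookup errors out.
# `list_ft` and `get_dic` are passed in so the check can run with stand-ins.
find_missing_dictionaries <- function(dataset, year, list_ft, get_dic) {
  fts <- list_ft(dataset, year)
  has_dic <- vapply(fts, function(ft) {
    tryCatch({
      get_dic(dataset, year, ft = ft)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  fts[!has_dic]
}

# With the package loaded, the call might look like this (untested assumption):
# find_missing_dictionaries("POF", 2002,
#                           get_available_filetypes, get_import_dictionary)
```

Any file type returned here would be a candidate for removal from the 2002 entry in the harmonization CSV.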
The problem with read_POF, however, is simple. Line 206 of import_wrapper_functions.R is invisible(file.remove(paste0(data_path,names(out),".txt"))). The object out is defined inside a preceding if statement that runs only for the years 1987, 1995 and 1997 (I don't get why 1997 was included), so this line breaks read_POF for 2002 and 2008. I'm not sending this as a PR because I don't understand why the line is needed. I commented it out to test, and that does solve the problem.
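The alternative to deleting the line is to keep the cleanup inside the branch that creates the temporary files. The sketch below illustrates that pattern only: read_pof_sketch, its arguments, and the contents of out are hypothetical stand-ins, not the package's actual code.

```r
# Hypothetical stand-in for the relevant part of read_POF.
# Assumption: older surveys write temporary .txt files that must be removed
# after import; newer surveys never create them.
read_pof_sketch <- function(year, data_path = tempdir()) {
  if (year %in% c(1987, 1995)) {
    # `out` and the temporary files exist only inside this branch.
    out <- list(tmp_a = "1 2", tmp_b = "3 4")
    tmp_files <- file.path(data_path, paste0(names(out), ".txt"))
    for (k in seq_along(out)) writeLines(out[[k]], tmp_files[k])

    # ... the actual import would happen here ...

    # Cleanup stays in the same branch, so other years never reach it.
    invisible(file.remove(tmp_files))
  }
  paste("POF", year, "processed")
}
```

With this layout, calling the function for 2002 or 2008 skips the cleanup entirely, which is the behaviour obtained by commenting the line out.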
It seems there is a similar issue with the POF 2008-2009 data. I get this error when I try to read the data.
# Set working directory
setwd("R:/Dropbox/bases_de_dados/POF/POF_2008-2009")
# download POF data
download_sourceData("POF", 2008, unzip = T)
# read POF data layout [This part works fine]
pof_dic_moradores <- get_import_dictionary(dataset = "POF",i = 2008, ft = "morador")
# read data
df_moradores <- read_POF(ft = "morador", i = 2008)
> Error in read_data(dataset = "POF", ft = ft, i = i, root_path = root_path, :
> Data not found. Check if you have unziped the data
@rafapereirabr, have you checked whether the files were unzipped, as the error message suggests? In your case, the POF 2008 files are compressed as .7z and you need to extract them manually. As I said in my previous comment, if you remove line 206 of import_wrapper_functions.R, or move it inside the if statement, and rebuild the package, you should be able to use read_POF for 2002 and 2008.
Thank you for the heads-up!
I also had a small issue using the read_POF function with the POF 2008-2009 data. I believe it came from the download_sourceData function, which unzipped the microdata files into a folder named "Dadosyyyymmdd" instead of just "Dados", which (I think) is what read_POF needs to process the data properly. Manually renaming the folder fixed the issue.
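The manual rename can also be done from R. This is a minimal sketch using base R only; normalize_dados_folder is a hypothetical helper, and it assumes the dated folder sits directly under the download directory and that "Dados" is the name read_POF expects.

```r
# Hypothetical helper: rename a single dated "Dados<yyyymmdd>" folder under
# `root` to plain "Dados". Returns TRUE if a rename happened, FALSE otherwise.
normalize_dados_folder <- function(root) {
  dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  dated <- grep("^Dados[0-9]{8}$", dirs, value = TRUE)
  target <- file.path(root, "Dados")
  # Only act when the match is unambiguous and nothing would be overwritten.
  if (length(dated) == 1 && !dir.exists(target)) {
    file.rename(file.path(root, dated), target)
  } else {
    FALSE
  }
}
```

The function does nothing when several dated folders exist or when "Dados" is already present, so it is safe to run more than once.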