microdadosBrasil

Issue reading Censo Escolar data

Open alinemsm opened this issue 7 years ago • 2 comments

Hello all,

Thanks very much for the package. It is a great contribution to the research community.

I am trying to import the Censo Escolar data for escolas (schools) for the whole period available (1995-2015). The download function works well, but the read function does not seem to find the final txt or csv files. For example, for 1996 and 1998 I get the following error:

download_sourceData('CensoEscolar', 1998, unzip=T, replace=T)
d <- read_CensoEscolar('escola', 1998, harmonize_varnames=F)

Error in read_data("CensoEscolar", ft, i, var_translator = var_translator, : Data not found. Check if you have unziped the data.

In this case, I think it has to do with the backslash generated in the path when the file is unzipped. See the paths below. I do not know whether this is specific to the system I am using (OSX).

"./micro_censo_escolar1998/Dados\DADOS_CENSOESC/DADOS_CENSOESC.TXT"
"./micro_censo_escolar1996/Dados\DADOS_CENSO96/DADOS_CENSO96.txt"
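
If the backslashes do turn out to be the cause, one possible workaround on macOS or Linux is to move the affected files into properly nested folders before calling the read function. This is only a sketch in base R: `fix_backslash_paths` is a hypothetical helper, not part of microdadosBrasil, and it assumes the root folder path itself contains no backslashes.

```r
# Hypothetical helper (not part of microdadosBrasil). On macOS/Linux, zip
# entries written with Windows separators can be extracted under names that
# contain a literal backslash, e.g. "Dados\DADOS_CENSOESC/DADOS_CENSOESC.TXT".
# This moves each such file to the path obtained by treating "\" as "/".
# Assumption: `root` itself contains no backslashes.
fix_backslash_paths <- function(root) {
  # Regular files only; directories are not listed by default
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  bad <- files[grepl("\\", files, fixed = TRUE)]
  for (old in bad) {
    new <- gsub("\\", "/", old, fixed = TRUE)
    # Create the nested target folder if it does not exist yet
    dir.create(dirname(new), recursive = TRUE, showWarnings = FALSE)
    file.rename(old, new)
  }
  # Number of files that were moved
  invisible(length(bad))
}
```

It could be called as `fix_backslash_paths("./micro_censo_escolar1998")` after unzipping. The emptied folders with backslashes in their names are left in place.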

However, in 2005, for example, the path seems to be fine ("./microdados_censo_escolar_2005/DADOS/CENSOESC_2005/CENSOESC_2005.TXT") but the function does not work either. The error is different, though.

download_sourceData('CensoEscolar', 2005, unzip=T, replace=T)
d <- read_CensoEscolar('escola', 2005, harmonize_varnames=F)

integer(0)
|=======================================================| 100% 4095 MB
Warning: 2 parsing failures.
# A tibble: 2 x 5
    row col     expected     actual       file
  <int> <chr>   <chr>        <chr>        <chr>
1 98241 NEF12A9 12 chars     7            'microdados_censo_escolar_…
2 98241 NA      3805 columns 1219 columns 'microdados_censo_escolar_…
Time difference of 2.465648 mins
2.8 Gb

Could that be because the file extension is in uppercase? Your metadata_harmonization file registers it in lowercase.

To support my hypothesis, I am able to read the data for 1995, which has the path "./micro_censo_escolar1995/DADOS/DADOS_CENSOESC/DADOS_CENSOESC.txt".

From 2008 to 2015, the files are compressed as .rar and therefore cannot be decompressed natively from R. However, 1995:1996 and 1998:2006 are in .zip and should work fine. But my decompressed files for 1996 and 1998:2006 all have uppercase extensions (.TXT).

I managed to download the files for all years using the download function and unzip them recursively. Feeding the read function with the exact path to the TXT or CSV files makes it work, but only with the option harmonize_varnames=F. I would also like to know why harmonization does not work in this case.
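
The manual workaround described here (locating the extracted files and passing their exact paths) could be sketched roughly as follows. `find_data_files` is a hypothetical helper, not part of the package. It relies on the `ignore.case` argument of base R's `list.files`, so `.TXT` and `.txt` are matched alike. The commented-out call assumes the read function accepts a `file` argument, and the folder name is illustrative.

```r
# Hypothetical helper (not part of microdadosBrasil): list extracted data
# files under `root`, matching the extension regardless of case.
find_data_files <- function(root, ext = c("txt", "csv")) {
  # Builds a pattern such as "\\.(txt|csv)$"
  pattern <- paste0("\\.(", paste(ext, collapse = "|"), ")$")
  list.files(root, pattern = pattern, ignore.case = TRUE,
             recursive = TRUE, full.names = TRUE)
}

# Example (illustrative; inspect `f` before picking a file, since the
# folder may also contain documentation in .txt format):
# f <- find_data_files("./microdados_censo_escolar_2005")
# d <- read_CensoEscolar('escola', 2005, file = f[1], harmonize_varnames = F)
```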

alinemsm Feb 01 '18 12:02

Hello @alinemsm

Sorry for the late answer. I've tested both functions on Windows and everything worked properly.

About the first case (1998), did you test the function with a root_path or file argument?

I'm looking into the second case more closely to see what the problem is.

monteirogustavo May 02 '18 16:05

Hi @monteirogustavo / @lucasmation --

I'm encountering another error when importing the 1998 Censo Escolar data -- the same error occurs with both the root_path and the file arguments.

Here is my code:

[With root_path argument]

d1998 <- read_CensoEscolar('escola',1998,root_path =myrootpath,harmonize_varnames=F)

[With file argument]

d1998 <- read_CensoEscolar('escola',1998,
file="~/Dropbox/CensoEscolarBrasil/input/micro_censo_escolar1998/Dados/DADOS_CENSOESC.TXT",harmonize_varnames=F)

The error I get is "guess_max must be a positive integer" (I'm omitting my username in the error message). See below:

[1]  41  42 514 656 657
|======================================================================================================================| 100% 1703 MB
Warning: 1205 parsing failures.
  row     col               expected actual                                                                                                  file
 8293 TERRENO no trailing characters      1 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
 9033 TERRENO no trailing characters      1 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
14982 TERRENO no trailing characters      1 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
15023 TERRENO no trailing characters      2 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
15328 TERRENO no trailing characters      1 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
..... ....... ...................... ...... ......................................................... [... truncated]
Warning: 13 parsing failures.
   row      col               expected actual                                                                                                  file
 96481 AR_CONST no trailing characters      7 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
 96483 AR_CONST no trailing characters      9 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
126664 AR_CONST no trailing characters     33 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
147716 AR_CONST no trailing characters      4 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
194856 AR_CONST no trailing characters      1 '/Users/.../Dropbox/CensoEscolarBrasil/input//micro_censo_escolar1998/DADOS/DADOS_CENSOESC.TXT'
...... ........ ...................... ...... ............................................. [... truncated]
Error: `guess_max` must be a positive integer

fkup Jun 03 '19 17:06