geobr::read_municipality() in 2000 has duplicated municipality 3509908
This problem seems to lie with the source data rather than with the geobr package, but there is a very strange duplicate of municipality 3509908 in 2000. It appears twice:
- as Cananeia (no accent), a very small polygon of ~6 hectares
- as Cananéia (with accent), a more "normal-looking" polygon of 124,420 hectares
Ideally this would be repaired at the source, but if not, it might be good to add a warning in that case? It seems reasonable for the user to expect a unique municipality-year dataset.
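A minimal sketch of what such a warning could look like, assuming the data has a `code_muni` column as in the reprex below. The helper `warn_if_duplicated_muni()` is hypothetical and not part of geobr, and the toy data frame only mimics the two rows in question:

```r
# Hypothetical helper, not part of geobr: warn when a municipality code
# appears more than once in a single-year dataset.
warn_if_duplicated_muni <- function(dat, id_col = "code_muni") {
  ids <- dat[[id_col]]
  # Codes that occur at least twice
  dup_ids <- unique(ids[duplicated(ids)])
  if (length(dup_ids) > 0) {
    warning(
      "Duplicated ", id_col, " value(s): ",
      paste(dup_ids, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the duplicated codes invisibly so callers can act on them
  invisible(dup_ids)
}

# Toy data mimicking the two Cananéia rows plus one unrelated municipality
toy <- data.frame(
  code_muni = c(3509908, 3509908, 3550308),
  name_muni = c("Cananeia", "Cananéia", "São Paulo")
)
dup <- warn_if_duplicated_muni(toy)  # emits a warning and returns the code
```

Returning the duplicated codes alongside the warning would let a user decide which of the rows to keep, for example the one with the larger area.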
Thanks!
library(dplyr, warn.conflicts=FALSE)
dat_2000 <- geobr::read_municipality(year = 2000, simplified = TRUE, code_muni = 3509908)
#> Loading required namespace: sf
#> Using year 2000
dat_2000
#> Simple feature collection with 2 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -48.23922 ymin: -25.31183 xmax: -47.73522 ymax: -24.75571
#> Geodetic CRS: SIRGAS 2000
#>     code_muni name_muni code_state abbrev_state                           geom
#> 113   3509908  Cananeia         35           SP MULTIPOLYGON (((-48.0035 -2...
#> 114   3509908  Cananéia         35           SP MULTIPOLYGON (((-48.23922 -...
sf::st_area(dat_2000)
#> Units: [m^2]
#> [1] 6.231309e+04 1.244208e+09
plot(sf::st_geometry(dat_2000[1,]))

plot(sf::st_geometry(dat_2000[2,]))

Created on 2022-02-08 by the reprex package (v2.0.1)
Hi @MatthieuStigler. Thank you for the heads up. The original data provided by IBGE comes with various issues like this one. One of the benefits of geobr is precisely to get rid of these problems and make a clean version of the data easily available. So thanks for pointing us to this issue. I'll address it in the next round of updates / corrections.
great, glad to hear you can address this at the geobr level at least! Thanks for all the good work :-)
From what I could see, this was the only duplicate. I checked by running add_count(code_muni, year) %>% filter(n > 1) on a dataset that row-binds the output of mutate(data = map(year, ~geobr::read_municipality(year = ., simplified = TRUE))), for the years 2000 to 2020 (except 2001).
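The check described above could be sketched roughly as follows. This is a base R variant, not the exact dplyr/purrr pipeline. The helper `find_duplicated_munis()` and the `fake_reader()` stand-in are hypothetical, and the stand-in uses made-up codes so that the sketch runs without downloading anything:

```r
# Hypothetical helper: `reader` is any function that takes a year and returns
# a data frame (or sf object) with a `code_muni` column.
find_duplicated_munis <- function(years, reader) {
  per_year <- lapply(years, function(y) {
    dat <- reader(y)
    # Drop the geometry column if the reader returned an sf object
    if (inherits(dat, "sf")) dat <- sf::st_drop_geometry(dat)
    codes <- dat$code_muni
    dup_codes <- unique(codes[duplicated(codes)])
    data.frame(
      year = rep(y, length(dup_codes)),
      code_muni = dup_codes,
      # Number of rows sharing each duplicated code
      n = vapply(dup_codes, function(k) sum(codes == k), integer(1),
                 USE.NAMES = FALSE)
    )
  })
  # One row per duplicated code and year
  do.call(rbind, per_year)
}

# Stand-in reader with made-up codes, so the sketch runs offline
fake_reader <- function(y) {
  codes <- if (y == 2000) c(1, 1, 2) else c(1, 2)
  data.frame(code_muni = codes)
}
dups <- find_duplicated_munis(c(2000, 2010), fake_reader)
```

With geobr itself, the reader would presumably be something like `function(y) geobr::read_municipality(year = y, simplified = TRUE)` and the years `setdiff(2000:2020, 2001)`, which downloads every year and can take a while.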
Hi @MatthieuStigler. This issue has now been fixed. I'm sorry it took a long time, but I believe this is now fixed for good. Please let me know if the problem persists or if you find this or similar issues elsewhere in the package.
Oops, reopening this issue because there is now a column class incompatibility. I'll fix this tomorrow.
df <- geobr::read_municipality(year = 2000)
Error in data.table::rbindlist(files, fill = TRUE) : Class attribute on column 8 of item 5 does not match with column 8 of item 1.
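One way to locate this kind of mismatch before binding is to compare column classes across the list of tables. The sketch below is only illustrative: `find_class_mismatches()` is a hypothetical helper, the toy tables are not the actual geobr files, and the check flags any class difference, which may be stricter than what `data.table::rbindlist()` rejects:

```r
# Hypothetical helper: for each list item after the first, report columns
# whose class differs from the same-position column of the first item.
# Returns NULL when nothing differs.
find_class_mismatches <- function(files) {
  ref <- lapply(files[[1]], class)
  out <- list()
  for (i in seq_along(files)[-1]) {
    cur <- lapply(files[[i]], class)
    for (j in seq_len(min(length(ref), length(cur)))) {
      if (!identical(ref[[j]], cur[[j]])) {
        out[[length(out) + 1]] <- data.frame(
          item = i,
          column = j,
          class_first = paste(ref[[j]], collapse = "/"),
          class_item = paste(cur[[j]], collapse = "/"),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, out)
}

# Toy tables: the second column is a Date in one and a POSIXct in the other
files <- list(
  data.frame(id = 1L, when = as.Date("2000-01-01")),
  data.frame(id = 2L, when = as.POSIXct("2000-01-01", tz = "UTC"))
)
mismatches <- find_class_mismatches(files)
```

Once the offending column is known, coercing it to a common class in every item before binding is one possible remedy.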
fixed
Thank you very much for your great work, Rafael!