geobr::read_municipality() in 2000 has duplicated municipality 3509908

Open MatthieuStigler opened this issue 3 years ago • 2 comments

This problem seems to lie more with the source data than with the geobr package, but there is a very strange duplicate of municipality 3509908 in 2000. It appears twice:

  • as Cananeia (no accent), a very small polygon of ~6 hectares
  • as Cananéia (with accent), a more "normal-looking" polygon of 124,420 hectares

Ideally this would be repaired at the source, but if not, it might be good to add a warning in that case. It seems reasonable for users to expect a dataset with one row per municipality and year.

Thanks!

library(dplyr, warn.conflicts=FALSE)
dat_2000 <- geobr::read_municipality(year = 2000, simplified = TRUE, code_muni = 3509908)
#> Loading required namespace: sf
#> Using year 2000
dat_2000
#> Simple feature collection with 2 features and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -48.23922 ymin: -25.31183 xmax: -47.73522 ymax: -24.75571
#> Geodetic CRS:  SIRGAS 2000
#>     code_muni name_muni code_state abbrev_state                           geom
#> 113   3509908  Cananeia         35           SP MULTIPOLYGON (((-48.0035 -2...
#> 114   3509908  Cananéia         35           SP MULTIPOLYGON (((-48.23922 -...
sf::st_area(dat_2000)
#> Units: [m^2]
#> [1] 6.231309e+04 1.244208e+09

plot(sf::st_geometry(dat_2000[1,]))

plot(sf::st_geometry(dat_2000[2,]))

Created on 2022-02-08 by the reprex package (v2.0.1)
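
Until a fix lands upstream, one possible workaround is to keep only the largest polygon for each municipality code and warn about what was dropped. The sketch below is only an illustration: `keep_largest_per_code()` is a hypothetical helper, not part of geobr, and the toy data frame simply mirrors the two rows and areas printed above so that the example runs without downloading anything.

```r
# Hypothetical helper (not part of geobr): for each municipality code, keep
# only the row with the largest area and warn if any code was duplicated.
keep_largest_per_code <- function(df, area, code_col = "code_muni") {
  stopifnot(nrow(df) == length(area), code_col %in% names(df))
  codes <- df[[code_col]]
  dup_codes <- unique(codes[duplicated(codes)])
  if (length(dup_codes) > 0) {
    warning("Duplicated municipality codes: ",
            paste(dup_codes, collapse = ", "))
  }
  # Sort by code, then by decreasing area, and keep the first row per code
  ord <- order(codes, -as.numeric(area))
  keep <- ord[!duplicated(codes[ord])]
  df[sort(keep), , drop = FALSE]
}

# Toy data mirroring the two rows reported above (areas in m^2)
toy <- data.frame(
  code_muni = c(3509908, 3509908),
  name_muni = c("Cananeia", "Cananéia"),
  stringsAsFactors = FALSE
)
toy_area <- c(6.231309e+04, 1.244208e+09)

cleaned <- suppressWarnings(keep_largest_per_code(toy, toy_area))
print(cleaned)
```

With the real data, the same helper could presumably be called as `keep_largest_per_code(dat_2000, sf::st_area(dat_2000))`. Keeping the largest polygon is an assumption about which record is the correct one; it happens to match this case but may not hold in general.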

MatthieuStigler, Feb 08 '22 14:02

Hi @MatthieuStigler. Thank you for the heads up. The original data provided by IBGE comes with various issues like this one. One of the benefits of geobr is precisely to get rid of these problems and make a clean version of the data easily available. So thanks for pointing us to this issue. I'll address it in the next round of updates and corrections.

rafapereirabr, Feb 08 '22 16:02

great, glad to hear you can address this at the geobr level at least! Thanks for all the good work :-)

From what I could see, this was the only duplicate. I checked by running add_count(code_muni, year) %>% filter(n > 1) on a dataset that row-binds the output of mutate(data = map(year, ~geobr::read_municipality(year = ., simplified = TRUE))) for the years 2000 to 2020 (except 2001).
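
The duplicate check described above can be sketched in base R, without any extra packages, as follows. This is an illustrative equivalent of the add_count() and filter(n > 1) step, not the exact code that was run: `find_duplicated_keys()` is a hypothetical helper, the column names `code_muni` and `year` are assumptions about the combined dataset, and the toy data frame stands in for the row-bound output of `geobr::read_municipality()`.

```r
# Hypothetical helper: return every row whose key combination
# (by default code_muni + year) occurs more than once.
find_duplicated_keys <- function(df, keys = c("code_muni", "year")) {
  stopifnot(all(keys %in% names(df)))
  # Build one string key per row from the key columns
  key <- do.call(paste, c(lapply(keys, function(k) df[[k]]), sep = "_"))
  df[key %in% key[duplicated(key)], , drop = FALSE]
}

# Toy stand-in for the row-bound municipality data
toy_all <- data.frame(
  code_muni = c(3509908, 3509908, 3509908, 3550308),
  year      = c(2000, 2000, 2010, 2000)
)

dups <- find_duplicated_keys(toy_all)
print(dups)
```

An empty result means every municipality-year pair is unique. On the toy data, only the two 2000 rows for code 3509908 are returned.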

MatthieuStigler, Feb 08 '22 17:02

Hi @MatthieuStigler. This issue has now been fixed. I'm sorry it took so long, but I believe it is now fixed for good. Please let me know if the problem persists or if you find this or similar issues elsewhere in the package.

rafapereirabr, Apr 08 '24 20:04

Oops, reopening this issue because there is now a column class incompatibility. I'll fix this tomorrow.

df <- geobr::read_municipality(year = 2000)

Error in data.table::rbindlist(files, fill = TRUE) : Class attribute on column 8 of item 5 does not match with column 8 of item 1.
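
An error of this kind points to a column whose class differs between the pieces being bound. The thread does not say which column or classes were involved, so the following is only a generic diagnostic sketch: `find_class_mismatches()` is a hypothetical helper, and the toy list stands in for the `files` object passed to `rbindlist()`. It compares each item's column classes, by position, against the first item.

```r
# Hypothetical helper: list the columns (by position) whose class differs
# from the class of the same column in the first item.
find_class_mismatches <- function(items) {
  ref <- lapply(items[[1]], class)
  out <- list()
  for (i in seq_along(items)[-1]) {
    cls <- lapply(items[[i]], class)
    for (j in seq_len(min(length(ref), length(cls)))) {
      if (!identical(ref[[j]], cls[[j]])) {
        out[[length(out) + 1]] <- data.frame(
          item = i,
          column = j,
          name = names(items[[i]])[j],
          class_first = paste(ref[[j]], collapse = "/"),
          class_item = paste(cls[[j]], collapse = "/"),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(out) == 0) return(data.frame())
  do.call(rbind, out)
}

# Toy items: the second column is a Date in one item and character in the other
a <- data.frame(id = 1:2, when = as.Date(c("2000-01-01", "2000-01-02")))
b <- data.frame(id = 3:4, when = c("2000-01-03", "2000-01-04"),
                stringsAsFactors = FALSE)

mism <- find_class_mismatches(list(a, b))
print(mism)
```

Once the offending column is known, converting it to a common class in every item before binding is one way to avoid the error.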

rafapereirabr, Apr 08 '24 20:04

fixed

rafapereirabr, Apr 18 '24 13:04

Thank you very much for your great work, Rafael!

MatthieuStigler, Apr 18 '24 14:04