eurostat Problem with week time code

It seems eurostat (more specifically, eurotime2date) can't handle weekly data:

temp <- eurostat::get_eurostat("demo_r_mweek3")
#> Warning in eurotime2date(x, last = FALSE): Unknown time code, W. No date conversion was made.
#> 
#>             Please fill bug report at https://github.com/rOpenGov/eurostat/issues.
#> Table demo_r_mweek3 cached at C:\Users\FERENC~1\AppData\Local\Temp\RtmpsNaBz8/eurostat/demo_r_mweek3_date_code_FF.rds

Dec 28 '20 11:12 tamas-ferenci

No, it doesn't. I think weekly data is a relatively new addition to Eurostat.

I thought that it would be easily fixed, but

it seems as.Date does not support ISO weeks (%V), at least not on Windows (https://stackoverflow.com/questions/45549449/transform-year-week-to-date-object/45587644#45587644),
nor does lubridate (https://github.com/tidyverse/lubridate/issues/506)

However, there seems to be an ISOweek package: https://cran.r-project.org/web/packages/ISOweek/, which I guess gives the right dates. Or we could use the UK week definition (there is some difference in the starting week).

There also seems to be a week W99. How is that supposed to be treated?

Dec 28 '20 13:12 jhuovari

Yes, I personally decided to use the ISOweek package too in a similar situation. You definitely need the ISO 8601 standard; the metadata says - for my particular example - that "the definition of ‘week’ is given by ISO8601 week number" (https://ec.europa.eu/eurostat/cache/metadata/en/demomwk_esms.htm).

99 means that the week is not known (to cite the same source: "W99 means ‘unknown week’.").
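
For illustration, here is a minimal base-R sketch of the conversion discussed above. It maps an ISO 8601 week code to the Monday of that week and returns NA for W99. The function name week_code_to_date and the accepted code layouts ("2020W01" or "2020-W01") are assumptions made for this example; they are not part of {eurostat} or {ISOweek}, and week numbers other than 99 are not validated.

```r
# Hypothetical helper (not part of eurostat or ISOweek): convert ISO 8601
# week codes such as "2020W01" or "2020-W01" to the Monday of that week.
week_code_to_date <- function(code) {
  year <- as.integer(substr(code, 1, 4))
  # assumption: the week number is the last two characters of the code
  week <- as.integer(substr(code, nchar(code) - 1, nchar(code)))

  # ISO 8601: week 1 is the week that contains 4 January
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # POSIXlt weekdays run from 0 (Sunday) to 6 (Saturday); shift so Monday = 0
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  week1_monday <- jan4 - days_since_monday

  date <- week1_monday + 7 * (week - 1)
  # W99 means "unknown week", so no date can be assigned
  date[week %in% 99] <- as.Date(NA)
  date
}

week_code_to_date(c("2020W01", "2020-W53", "2021W01", "2020W99"))
#> [1] "2019-12-30" "2020-12-28" "2021-01-04" NA
```

Returning NA keeps the unknown-week rows from being pooled with a real week, at the cost of losing the year in the date column, so the original time code should be kept alongside it.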

Dec 28 '20 14:12 tamas-ferenci

Since the time code is converted to a Date, what date should W99 be converted to? The last day of the last week?

Dec 28 '20 14:12 jhuovari

Very good question. Definitely not the last week, as that would imply that everyone with an unknown death date died in the last week, i.e. they'd be pooled together with those who really did die in the last week. I don't know whether it would break any consistency within eurostat, but perhaps the clearest solution would be to set their date to NA...

Dec 28 '20 14:12 tamas-ferenci

But then we would lose year information.

I was thinking that the last week would carry information on two dates: the dated information on its first day, as normal, and the unknown-week information on its last day.

Dec 28 '20 14:12 jhuovari

Ah, I forgot that, you're completely correct.

I am no expert in designing such things, but what you outlined seems to be a possible solution. In that case, though, the user has to be informed very clearly what exactly those dates mean (and also, more generally, that although there is a concrete date, the data pertains to a week).

Dec 28 '20 14:12 tamas-ferenci

FWIW, {ISOweek} is now the correct solution, I think - I just ended up using it on the same data (national, not Eurostat, but produced to the same standard). Perhaps https://github.com/tidyverse/lubridate/issues/506#issuecomment-770310175 may also be helpful.

And thanks for {eurostat}, very helpful!

Jan 31 '21 01:01 petrbouchal

My solution is to filter out the W99 data, which is definitely not a clean solution, but given that it only affects Hungary, Latvia and Sweden, it's a workaround.

W99 values by geo and year:

geo/year	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
HU	5	4	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
LV	90	72	63	41	19	33	29	33	33	19	20	13	18	NA	NA	NA	NA	NA	NA	NA	NA	NA
SE	1493	1534	1520	1437	1137	822	749	650	515	538	402	428	439	464	486	960	1963	2230	2513	2616	2663	713

So right now I have this code working fine:

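library(dplyr)

df <- demo_r_mwk_ts %>%
  # extract the year and the week number from the time code
  mutate(year = substr(time, 1, 4),
         week = substr(time, 6, 7)) %>%
  # filter out week 99 (unknown week); week is a character column
  filter(week != "99") %>%
  # create the date of each week's Monday using the "ISOweek" package
  mutate(date = ISOweek::ISOweek2date(paste0(year, "-W", week, "-1")))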

The best way would be for Eurostat to divide the W99 values and assign them to each week of the year according to the "weights" of the known weekly values. If anybody works with countries that have W99 data, I would suggest doing this manually.

Apr 14 '21 19:04 justasmundeikis

If this is a common need, would it be feasible to have an additional enrichment function that could be run after data retrieval?

Apr 14 '21 19:04 antagomir

"The best way would be for Eurostat to divide the W99 values and assign them to each week of the year according to the "weights" of the known weekly values. If anybody works with countries that have W99 data, I would suggest doing this manually." I completely agree. As a minimum solution, proportionally increasing all values would work in my opinion. (At least if the proportion of values reported for W99 is small compared to the total.)
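
The proportional adjustment suggested here could be sketched roughly as follows. The function name redistribute_unknown and the example figures are made up for illustration; the sketch assumes it is applied to one country and one year at a time, and that the known weekly values sum to more than zero.

```r
# Hypothetical helper: spread the "unknown week" (W99) count over the known
# weeks in proportion to each week's share of the known total.
redistribute_unknown <- function(known, unknown) {
  total_known <- sum(known)
  stopifnot(total_known > 0, unknown >= 0)
  # each week receives unknown * (its share of the known total)
  known + unknown * known / total_known
}

# Made-up weekly counts for one country and year, plus a made-up W99 count
weekly  <- c(100, 120, 80, 100)
unknown <- 20

adjusted <- redistribute_unknown(weekly, unknown)
adjusted
#> [1] 105 126  84 105
```

The adjusted values add up to the known total plus the W99 count, so the yearly total is preserved. The results are generally not integers, which matters if the downstream analysis expects counts.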

Apr 15 '21 21:04 tamas-ferenci

I did some testing with the dataset mentioned here, and I have to say that fixing this weekly data issue was easier than figuring out how to efficiently handle this dataset with 110 million rows (after pivot_longer). Apparently 16 GB of RAM wasn't enough the way it was done before. The results are in commit cfdaf37 of the v4-dev branch (version 4.0.0.9002).

Based on the discussion here I couldn't figure out a sensible solution for W99 values. Drop them? Assign them to the last day of the year? Distribute the values evenly over the whole year? In my solution I coerced W99 to the first day of the first week of the year, and the function prints a warning message for the user, suggesting that they use get_eurostat(time_format = "raw") if they wish to wrangle the data manually. This might not be optimal and I'd love to hear your thoughts on the matter.
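
For anyone taking the manual route, a small sketch of separating the unknown-week rows from raw time codes might look like this. The data frame is a made-up stand-in for what a retrieval with time_format = "raw" returns; the column names, the "2020-W01" code layout and the values are assumptions for this example and may differ from the real dataset.

```r
# Made-up stand-in for data retrieved with time_format = "raw";
# column names, code layout and values are assumptions for illustration.
raw <- data.frame(
  geo    = c("SE", "SE", "SE"),
  time   = c("2020-W01", "2020-W02", "2020-W99"),
  values = c(1900, 1950, 2600)
)

# Flag the "unknown week" rows by the week part of the time code
is_unknown <- grepl("W99$", raw$time)

# Keep the two parts separate so that W99 is never mistaken for a real week
known_weeks   <- raw[!is_unknown, ]
unknown_weeks <- raw[is_unknown, ]
```

The known weeks can then be converted to dates, and the unknown part dropped, reported separately, or redistributed, depending on the analysis.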

Jun 28 '23 13:06 pitkant

Closed with the CRAN release of package version 4.0.0

Dec 20 '23 08:12 pitkant
