CRASH/COPA caveat
Hi:
This is great stuff - but I'm concerned someone unsure of the background of STATS19 data collection might come to incorrect conclusions, specifically around changes in Serious casualties over time, and in recent-year analysis of spatial differences in Serious casualties.
Alongside paper forms, the data now come from several newer collection methods:
- CRASH (a DfT promoted mobile app for police)
- COPA (a Met Police mobile system)
- Online public submissions
(http://roadsafetyanalysis.org/2017/09/2016-gb-casualty-data-released/)
Not all constabularies are on CRASH/COPA, but those that are will show rises in Serious casualties relative to previous years, to some degree because these apps force whoever enters the data to be more precise. (Many of those who ought to know do not know what constitutes a "Serious" injury.) This can easily be misread and encourage false conclusions.
I'd suggest some sort of warning either as the package loads, or for results including 2016+ data in the first instance.
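A rough sketch of what I have in mind, in case it helps. The function name `warn_severity_reporting()`, the `date` column name and the wording of the messages are all hypothetical (nothing here is existing stats19 code), and the toy data are invented purely for illustration.

```r
# Hypothetical helper: warn when the data include records from 2016 onwards,
# where severity recording may differ by force (CRASH/COPA vs paper).
warn_severity_reporting <- function(crashes, date_col = "date", from_year = 2016) {
  years <- as.integer(format(crashes[[date_col]], "%Y"))
  affected <- !is.na(years) & years >= from_year
  if (any(affected)) {
    warning(
      sum(affected), " record(s) from ", from_year, " onwards: ",
      "changes in Serious casualty counts may partly reflect new ",
      "data entry systems (CRASH/COPA) rather than real change.",
      call. = FALSE
    )
  }
  invisible(crashes)
}

# Hypothetical load-time note, using the standard .onAttach() hook
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "Note: Serious casualty counts from 2016 onwards are affected by ",
    "changes in police data entry systems."
  )
}

# Toy data, invented for illustration only
crashes <- data.frame(
  date = as.Date(c("2015-06-01", "2016-03-15", "2017-11-30")),
  accident_severity = c("Slight", "Serious", "Serious")
)
warn_severity_reporting(crashes)
```

The check is deliberately crude: it keys on the year only, so it would also fire for forces that were still on paper in 2016.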
I'd also be happy to hunt down a list of Police Forces and when/if they switched, so that you could add another field ("data_entry_type" or similar). One could then adjust serious totals as appropriate to make analysis across space or time more robust.
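To make the idea concrete, here is a minimal sketch of how such a field could be derived once the list exists. The lookup table, its column names, the force names and the switch dates are invented placeholders; the real values would have to come from the forces or DfT. The label "paper" for everything not yet switched is also my assumption, and online public submissions are not handled.

```r
# Placeholder lookup: one row per police force, with the date it moved to an
# app-based system (NA = no switch recorded). All values are invented.
switch_lookup <- data.frame(
  police_force = c("Force A", "Force B", "Force C"),
  system       = c("CRASH", "COPA", NA),
  switch_date  = as.Date(c("2016-01-01", "2016-11-01", NA)),
  stringsAsFactors = FALSE
)

# Toy crash records, also invented
crashes <- data.frame(
  police_force = c("Force A", "Force A", "Force B", "Force C"),
  date         = as.Date(c("2015-05-01", "2016-07-01", "2017-01-10", "2017-02-01")),
  stringsAsFactors = FALSE
)

add_data_entry_type <- function(crashes, lookup) {
  # Row of the lookup that matches each crash record's force
  idx <- match(crashes$police_force, lookup$police_force)
  # TRUE only where a switch date is known and the record is on or after it
  switched <- !is.na(lookup$switch_date[idx]) &
    crashes$date >= lookup$switch_date[idx]
  # Forces missing from the lookup fall through to "paper" as well
  crashes$data_entry_type <- ifelse(switched, lookup$system[idx], "paper")
  crashes
}

crashes <- add_data_entry_type(crashes, switch_lookup)
crashes$data_entry_type
```

With a field like this in place, Serious totals could be compared within one entry type, or adjusted, rather than across the switch.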
Ivo
Hello Ivo,
Thank you for opening the ticket. This is important and worth following up. I will break down your post into a few points, as I understand it:
1. A potential warning message, alongside the disclaimer currently in place, relating to data from 2016 onwards.
2. You kindly offer to contribute by hunting down which forces are already on CRASH/COPA.
3. An extra field in stats19::format to include data_entry_type.
To be clear: I am not 100% sure whether the DfT releases different datasets according to the method of collection. I think we need to clarify this with the DfT, right at the source. Otherwise (3) would be redundant.
The link in Ivo's post contains a link to this: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/744077/reported-road-casualties-annual-report-2017.pdf
RE (2): the full report does contain this

I can think of the following ways to address this in the near term:
- mentioning it in the docs
- adding a new variable to police_boundaries with the time of the switch
- potentially, adding something in the package load message
Sound like a plan? This should help reduce the chances of people arriving at false conclusions due to the different uptake times of CRASH. A PR adding an additional column to the existing police_boundaries data, building on the variable names shown below, would be greatly appreciated.
names(stats19::police_boundaries)
#> [1] "pfa16cd" "pfa16nm" "geometry"
Created on 2019-02-26 by the reprex package (v0.2.1)
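As a possible starting point for such a PR, here is a sketch of how the extra column might be joined on. A plain data frame stands in for `stats19::police_boundaries` (keeping only `pfa16cd` and `pfa16nm`) so that the example runs without the package. The codes, force names, dates and the column name `crash_switch_date` are all made up for illustration.

```r
# Stand-in for stats19::police_boundaries without the geometry column;
# codes and names are placeholders, not real police force areas
boundaries <- data.frame(
  pfa16cd = c("X01", "X02", "X03"),
  pfa16nm = c("Force A", "Force B", "Force C"),
  stringsAsFactors = FALSE
)

# Hypothetical list of forces and the date they switched (invented dates);
# forces that are not listed get NA
switch_dates <- data.frame(
  pfa16nm = c("Force A", "Force B"),
  crash_switch_date = as.Date(c("2016-01-01", "2016-11-01")),
  stringsAsFactors = FALSE
)

# Left join via match(), which keeps the original row order
boundaries$crash_switch_date <-
  switch_dates$crash_switch_date[match(boundaries$pfa16nm, switch_dates$pfa16nm)]

names(boundaries)
```

The same assignment should in principle work on the real sf object, though that is untested here.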
My understanding of the switch is that it affects the serious/slight proportion but not the fatalities data. Is that correct? And any ideas how others are dealing with this?
In summary: definitely in favour of adding something on this, had heard about it but knew little about it. Thanks for raising the issue.
While the app-based systems are markedly superior in principle, there are transition issues, and not everyone has taken them up, or taken up the same system, or even the same version of the same system. Serious casualties are now counted much more precisely, because the app asks about injury type, whereas the paper form required you to remember the definition. The app-based methods should also give much more accurate crash location data, but the processing so far hasn't been kind to the casualty home location and driver home location fields. This should improve and be backdated (the right data is there in the computer, it just isn't spitting it out at the moment).
Your plan sounds excellent, @Robinlovelace, and I can have a word on the side about it at the next STATS19 review meeting at DfT if that'll help (i) clarify any issues and/or (ii) drum up further interest.
My understanding is that the data entry method is planned to appear as an additional field, especially as new public-submitted data is likely to make this much more confusing soon.
Great to hear, Ivo. Note: we have talked to DfT about this package and it has been informally tested by them (see #5). Anything mentioning those issues, especially based on the expertise of the likes of Craig (do you know his GH handle? ;) ) and others in Agilysis, will go well beyond the mention of it in the current default open access system, I believe! Looking forward to seeing your input, and if we can help in any way (e.g. extracting data from an impenetrable PDF) just ping me here.
Are we closing this?
No, I think we need to get #176 and #178 done before closing this.
Cc @stholder3 FYI