Investigate datasets with multiples of the same filetype
Stemming from Issue 55.
Currently, file links are labelled by filetype rather than filename. This means that on a single page a user may see several links labelled "CSV CSV CSV", with no indication of what each one contains. The outcome of Issue 55 will probably change the label from filetype to filename.
This issue, however, is an investigation into how prevalent the problem actually is. It comes down to what different publishers consider to be a dataset. For example, some publishers may publish one year of data per dataset, while others may publish multiple years as separate files in the same dataset.
If a dataset contains only one file of a given filetype, it isn't such a problem, because we can quite clearly relate the file to the dataset description. Datasets with several files of the same filetype, however, are confusing to the user.
Investigate:
- how many datasets (and what %) have multiples of the same filetype?
- what's the range of multiples?
- what's the pattern across publishers? across years?
- what's behind these multiples?
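
As a starting point for the first three questions, here is a minimal Python sketch. It assumes the dataset metadata can be loaded as a list of records, each with a list of files and a filetype per file. The field names (`id`, `publisher`, `resources`, `format`) and the sample records are hypothetical, not taken from our actual data, so they will need adjusting to whatever the real export looks like.

```python
from collections import Counter

# Hypothetical sample records; field names and values are assumptions
# made for illustration, not the real metadata schema.
datasets = [
    {"id": "a", "publisher": "P1",
     "resources": [{"format": "CSV"}, {"format": "csv"}, {"format": "CSV"}]},
    {"id": "b", "publisher": "P1",
     "resources": [{"format": "CSV"}, {"format": "PDF"}]},
    {"id": "c", "publisher": "P2",
     "resources": [{"format": "XLS"}, {"format": "XLS"}]},
]


def repeated_filetypes(dataset):
    """Return {filetype: count} for filetypes appearing more than once."""
    # Normalise case and whitespace so "csv" and "CSV" count as the same type.
    counts = Counter(r["format"].strip().upper() for r in dataset["resources"])
    return {fmt: n for fmt, n in counts.items() if n > 1}


def summarise(datasets):
    """Count affected datasets, the range of multiples, and a publisher split."""
    repeats = {d["id"]: repeated_filetypes(d) for d in datasets}
    affected = [d for d in datasets if repeats[d["id"]]]
    # Every repeat count across all datasets, used for the range of multiples.
    multiples = [n for r in repeats.values() for n in r.values()]
    return {
        "total": len(datasets),
        "affected": len(affected),
        "percent": 100 * len(affected) / len(datasets) if datasets else 0.0,
        "min_multiple": min(multiples, default=0),
        "max_multiple": max(multiples, default=0),
        "by_publisher": dict(Counter(d["publisher"] for d in affected)),
    }


summary = summarise(datasets)
print(summary)
```

Grouping by year would follow the same pattern as `by_publisher`, assuming each record carries a usable date field. The last question (what's behind the multiples) probably still needs someone to look at the affected datasets by hand.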
Anything that helps us understand the situation better and prepare to accommodate future publishers will be useful.
@jw-obrien is going to take a look at this