csvlint.rb

Improve validation of URIs

Open ldodds opened this issue 11 years ago • 5 comments

ldodds avatar Feb 18 '14 12:02 ldodds

https://github.com/sporkmonger/addressable looks promising

pezholio avatar Feb 18 '14 12:02 pezholio

For https://github.com/theodi/shared/issues/160

Floppy avatar Feb 24 '14 14:02 Floppy

The goal here was to try and improve the validation around URIs.

Currently the code uses URI.parse. This will catch some errors but also lets through some values which probably shouldn't be treated as a URI. For example, it parses almost any string without spaces as a valid relative URI. Looking again at the definition of xsd:anyURI, that might be fine.

We also check to see whether it's an http or https URI. This was an attempt to improve things, but may be overly limiting.
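To make the two checks concrete, here is a minimal sketch of the behaviour described above. The helper names `parses_as_uri?` and `http_uri?` are hypothetical and are not taken from csvlint's code; only `URI.parse` and `URI::InvalidURIError` come from Ruby's standard library.

```ruby
require "uri"

# Hypothetical helper: true if URI.parse accepts the value at all.
# A bare word such as "hello" is accepted as a relative URI reference.
def parses_as_uri?(value)
  URI.parse(value)
  true
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: additionally require an http or https scheme,
# mirroring the stricter check discussed in this issue.
def http_uri?(value)
  %w[http https].include?(URI.parse(value).scheme)
rescue URI::InvalidURIError
  false
end

parses_as_uri?("hello")                     # accepted as a relative URI
parses_as_uri?("hello world")               # rejected, the space is invalid
http_uri?("hello")                          # rejected, no scheme
http_uri?("http://example.com/data.csv")    # accepted
http_uri?("ftp://example.com/data.csv")     # rejected, wrong scheme
```

The scheme check is what rules out relative references, and it is also what makes the validation stricter than xsd:anyURI requires.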

So the issue was to decide whether we wanted to keep what we are doing or improve things based on expected use cases for URIs in CSV data.

ldodds avatar Mar 25 '14 13:03 ldodds

Hmmm... Yeah, I see what you mean now. Looking at the spec, I think you're right: xsd:anyURI allows a URI to be relative or absolute, so I think what we have is actually fine. We should probably get rid of the checking for http or https too.

pezholio avatar Mar 25 '14 14:03 pezholio

I've been giving this a bit more thought, and I think we should leave it as is. In most (if not all) instances, people are going to be using absolute URIs. If we change the check to include relative URIs, it'll match pretty much everything as a URI, which means a CSV whose columns are mainly URIs, but with the odd line of (unspaced) text, will still validate correctly.
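A rough illustration of that concern, using a made-up column of values. The helper names and the sample data are assumptions for the sake of the example, not csvlint's implementation:

```ruby
require "uri"

# Hypothetical column: mostly absolute URIs plus one stray unspaced word.
column = ["http://example.com/a", "https://example.com/b", "unknown"]

# Lenient check (assumed): anything URI.parse accepts, relative or absolute.
def relative_or_absolute_uri?(value)
  URI.parse(value)
  true
rescue URI::InvalidURIError
  false
end

# Strict check (assumed): absolute http or https URIs only.
def absolute_http_uri?(value)
  %w[http https].include?(URI.parse(value).scheme)
rescue URI::InvalidURIError
  false
end

# With the lenient check the stray word slips through unnoticed.
lenient_ok = column.all? { |v| relative_or_absolute_uri?(v) }

# With the strict check the stray word is flagged.
strict_ok = column.all? { |v| absolute_http_uri?(v) }
```

Here `lenient_ok` ends up `true` and `strict_ok` ends up `false`, which is the trade-off being weighed: the lenient rule follows xsd:anyURI more closely but no longer catches the odd text value in a URI column.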

pezholio avatar Mar 26 '14 15:03 pezholio