"D:\foo" should be parsed as "file:///D:/foo"
https://quuz.org/url/liveview.html#D:/foo
Edge and Chrome on Windows, at least, parse this as a file URL, which I think is much friendlier. Firefox does not, but has some special logic so that when you enter D:\foo in the URL bar, it translates it to file:///D:/foo.
They also parse https://quuz.org/url/liveview.html#D:b/foo as a file URL, so it's not about the path name starting with /... maybe they treat all single-character schemes this way?
Discovered in https://github.com/nodejs/node-eps/issues/51#issuecomment-285748905 by @jkrems
For the record, the address bar is out of scope.
I guess allowing this basically means giving up on single-code-point schemes, indeed. Not sure what the right trade-off is there.
On the upside no such schemes are registered at http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml but nothing is currently prohibiting that either.
httparchive
SELECT * FROM (
SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:/[^>]+>)') AS match
FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
WHERE page = url
) WHERE match != "null"
Row page url match
1 http://www.xm-n-tax.gov.cn/ http://www.xm-n-tax.gov.cn/ <img src="d:/piaochuang/piaochuang.jpg" width="150px" height="90px;" onclick="javascript:window.open('/content/n4676.html');"/>
Page has changed.
2 http://www.newsforshoppers.com/ http://www.newsforshoppers.com/ <link href="s://plus.google.com/102103991664781080361" rel="publisher" />
rel="publisher" has no effect for browsers
3 http://www.aaai.org/ http://www.aaai.org/ <script src=s://seal.verisign.com/getseal?host_name=www.aaai.org&size=s&use_flash=no&use_transparent=no&lang=en>
4 http://www.mathematichka.ru/ http://www.mathematichka.ru/ <base href="d:/mathematichka/web/">
These are commented out.
Possibly there is content such as documentation on CDs that rely on this? Maybe a use counter could help?
Well, the URL parser should be generally applicable ideally, also beyond browsers. Part of the reason we're doing this is so that non-browsers can still browse the web.
Sure, I was just trying to find out if there were strong compat reasons for browsers to behave one way or the other for such URLs. I think there isn't, for publicly-accessible web content at least.
Actually, we could maybe support this by branching on the backslash, which is normally non-conforming and doesn't occur in the examples above.
Oops, the query only looked for a forward slash. New query below. I also removed the WHERE page = url clause, which was limiting the results to top-level resources.
SELECT * FROM (
SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:[/\\][^>]+>)') AS match
FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
) WHERE match != "null"
22 rows. https://gist.github.com/zcorpan/98a61be4877858d3de18c19d8939a3be
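To make the difference between the two queries concrete, here is a minimal Python sketch that applies both regular expressions to a few tags. The patterns are copied from the queries above; the `extract` helper, the backslash `<img>` sample, and the `example.com` sample are hypothetical and only there for illustration. It assumes Python's `re` behaves like the query engine's regex for these particular constructs.

```python
import re

# Patterns copied from the two queries above.
FORWARD_ONLY = re.compile(
    r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:/[^>]+>)')
EITHER_SLASH = re.compile(
    r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:[/\\][^>]+>)')


def extract(pattern, body):
    """Mimic REGEXP_EXTRACT(LOWER(body), ...): first capture group or None."""
    m = pattern.search(body.lower())
    return m.group(1) if m else None


samples = [
    '<base href="d:/mathematichka/web/">',      # row 4 from the first result
    '<img src="D:\\images\\logo.png">',         # hypothetical backslash case
    '<link href="http://example.com/x.css">',   # hypothetical ordinary URL
]

for tag in samples:
    print(tag)
    print("  forward slash only:", extract(FORWARD_ONLY, tag))
    print("  either slash:      ", extract(EITHER_SLASH, tag))
```

Only the second pattern picks up the backslash form, which is why the first query missed those pages.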
These look mostly like errors (and stuff that won't work anyway, since we don't want http -> file to do anything but produce a network error), but I think all of those with a backslash expect the behavior the OP asks for.
I confirmed that this is a quirk that IE6+ and Chrome (on Windows only) have. They do it for both d:/foo and d:\foo. In fact, they do it for any single-letter a-z scheme. IE6 also does it for a 0-9 or -/+ scheme; I'll consider those to be bugs. (Firefox's address bar quirk is only with a backslash, not a forward slash.)
Thoughts on only adopting this when a backslash is used? Or should we add a platform-specific quirk here similar to https://w3c.github.io/FileAPI/#convert-line-endings-to-native and make single-letter-scheme URLs impossible forever on that platform?
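A rough sketch of how the two candidate rules would classify inputs mentioned in this thread, in Python. This is not the URL parser and not what any browser does internally; the function names are hypothetical, and `to_file_url` is a deliberately naive conversion (it only swaps backslashes and skips percent-encoding).

```python
import re


def is_backslash_drive_path(s):
    """Option A: a single ASCII letter, a colon, then a backslash."""
    return re.match(r'[A-Za-z]:\\', s) is not None


def has_single_letter_scheme(s):
    """Option B: any single ASCII letter followed by a colon."""
    return re.match(r'[A-Za-z]:', s) is not None


def to_file_url(s):
    """Naive conversion for illustration only (no percent-encoding)."""
    return "file:///" + s.replace("\\", "/")


inputs = [
    "D:\\foo",
    "d:/piaochuang/piaochuang.jpg",
    "s://plus.google.com/102103991664781080361",
    "D:b/foo",
    "https://quuz.org/url/liveview.html",
]

for s in inputs:
    print(f"{s!r}: backslash-only={is_backslash_drive_path(s)}, "
          f"any-single-letter={has_single_letter_scheme(s)}")

print(to_file_url("D:\\foo"))
```

Under the backslash-only rule, only `D:\foo` is treated as a drive path, so the `s://` typos and `d:/` paths found in the HTTP Archive results would keep parsing as single-letter schemes.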
cc @sleevi @valenting @achristensen07 @jasnell
I'm -1 on platform-specific behavior (seems especially bad in contexts like HTTP servers and proxies).
I'm neutral on treating backslash specially vs. just treating all single-letter schemes as drive letters.
I'm +1 on addressing this in general. It would be great if full Windows file paths can be parsed as URLs as simply as passing them to the URL constructor.