
"D:\foo" should be parsed as "file:///D:/foo"

Open · domenic opened this issue 9 years ago · 10 comments

In https://quuz.org/url/liveview.html#D:/foo, Edge and Chrome on Windows, at least, parse this as a file URL, which I think is much friendlier. Firefox does not, but it has some special logic so that when you enter D:\foo in the URL bar, it translates it to file:///D:/foo.

They also parse https://quuz.org/url/liveview.html#D:b/foo as a file URL, so it's not about the path starting with /. Maybe they treat all single-character schemes this way?

Discovered in https://github.com/nodejs/node-eps/issues/51#issuecomment-285748905 by @jkrems

domenic · Mar 10 '17 18:03

For the record, the address bar is out of scope.

Indeed, I guess allowing this basically means giving up on single-code-point schemes. I'm not sure what the right trade-off is there.

On the upside, no such schemes are registered at http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml, but nothing currently prohibits that either.

annevk · Mar 15 '17 08:03
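The following sketch makes the single-code-point trade-off concrete. A parser that implements the generic RFC 3986 syntax already reads these inputs as URLs whose scheme is the single letter `d`. Python's `urllib.parse.urlsplit` is used here only as an illustration; it is not a WHATWG URL parser. `SCHEME_RE` encodes the RFC 3986 `scheme` production, which permits a single letter. That matches the point above that nothing currently prohibits such schemes.

```python
import re
from urllib.parse import urlsplit

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(s: str) -> bool:
    """Return True if s matches the RFC 3986 scheme production."""
    return SCHEME_RE.fullmatch(s) is not None


for raw in ["D:\\foo", "D:/foo", "D:b/foo"]:
    parts = urlsplit(raw)
    # The drive letter is taken as the (lowercased) scheme; the rest becomes the path.
    print(repr(raw), "-> scheme", repr(parts.scheme), "path", repr(parts.path))
```

Any rule that turns `D:\foo` into a `file:` URL therefore has to take priority over this ordinary scheme parsing.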

httparchive

SELECT * FROM (
SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:/[^>]+>)') AS match
FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
WHERE page = url
) WHERE match != "null"
Row | page | url | match
1 | http://www.xm-n-tax.gov.cn/ | http://www.xm-n-tax.gov.cn/ | <img src="d:/piaochuang/piaochuang.jpg" width="150px" height="90px;" onclick="javascript:window.open('/content/n4676.html');"/>

The page has changed since.

2 | http://www.newsforshoppers.com/ | http://www.newsforshoppers.com/ | <link href="s://plus.google.com/102103991664781080361" rel="publisher" />

rel="publisher" has no effect in browsers.

3 | http://www.aaai.org/ | http://www.aaai.org/ | <script src=s://seal.verisign.com/getseal?host_name=www.aaai.org&size=s&use_flash=no&use_transparent=no&lang=en>
4 | http://www.mathematichka.ru/ | http://www.mathematichka.ru/ | <base href="d:/mathematichka/web/">

Rows 3 and 4 are commented out.

Possibly there is content, such as documentation on CDs, that relies on this? Maybe a use counter could help?

zcorpan · Mar 15 '17 11:03

Well, ideally the URL parser should be generally applicable, including beyond browsers. Part of the reason we're doing this is so that non-browsers can still browse the web.

annevk · Mar 15 '17 16:03

Sure, I was just trying to find out whether there were strong compat reasons for browsers to behave one way or the other for such URLs. I think there aren't, at least for publicly accessible web content.

zcorpan · Mar 16 '17 09:03

Actually, we could maybe support this by branching on the backslash, which is normally non-conforming and doesn't occur in the examples above.

annevk · Mar 22 '17 13:03
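Here is a minimal sketch of the backslash branch suggested above. It assumes a hypothetical pre-parse step that runs before scheme parsing; `windows_path_to_file_url` is an illustrative name, not anything in the URL Standard. Only a letter, a colon and then a backslash triggers the rewrite, so `d:/foo` keeps its meaning as a URL with scheme `d`. The sketch turns backslashes into slashes and does no percent-encoding.

```python
import string
from typing import Optional

ASCII_LETTERS = frozenset(string.ascii_letters)


def windows_path_to_file_url(s: str) -> Optional[str]:
    """Hypothetical pre-parse hook: rewrite 'X:\\...' to 'file:///X:/...'.

    Returns None when the input should go through normal URL parsing.
    """
    # Branch only on a backslash right after 'X:'. Backslash is non-conforming
    # in URLs anyway, so forward-slash inputs like 'd:/foo' are left alone.
    if len(s) >= 3 and s[0] in ASCII_LETTERS and s[1] == ":" and s[2] == "\\":
        return "file:///" + s[:2] + s[2:].replace("\\", "/")
    return None


for raw in ["D:\\foo", "D:\\a\\b.txt", "d:/foo", "https://example.com/"]:
    print(repr(raw), "->", windows_path_to_file_url(raw))
```

With this branch, `D:\foo` becomes `file:///D:/foo` as the issue asks. Single-code-point schemes stay usable as long as a backslash never follows the colon.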

Oops, the query only looked for a forward slash. Here is a new query, which also drops the WHERE page = url clause that limited results to top-level resources.

SELECT * FROM (
SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:[/\\][^>]+>)') AS match
FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
) WHERE match != "null"

22 rows. https://gist.github.com/zcorpan/98a61be4877858d3de18c19d8939a3be

zcorpan · Mar 22 '17 13:03
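The two queries differ in a single character class, `[a-z]:/` versus `[a-z]:[/\\]`. The sketch below uses Python's `re` as a stand-in for BigQuery's RE2; these patterns only use syntax both engines share, with the quote escaping adapted to a Python raw string. `extract` is a hypothetical helper that mimics `REGEXP_EXTRACT(LOWER(body), ...)`. The first and third samples are taken from the results above; the backslash and `http` samples are made up.

```python
import re
from typing import Optional

# Patterns from the two queries: forward slash only, then forward slash or backslash.
FORWARD_ONLY = re.compile(r"""(<[a-z][^>]+\s(?:src|href)\s*=\s*["']?[a-z]:/[^>]+>)""")
EITHER_SLASH = re.compile(r"""(<[a-z][^>]+\s(?:src|href)\s*=\s*["']?[a-z]:[/\\][^>]+>)""")


def extract(pattern: re.Pattern, body: str) -> Optional[str]:
    """Mimic REGEXP_EXTRACT(LOWER(body), pattern): first capture group, or None."""
    m = pattern.search(body.lower())
    return m.group(1) if m else None


samples = {
    "drive, forward slash": '<img src="d:/piaochuang/piaochuang.jpg" width="150px">',
    "drive, backslash": '<img src="D:\\docs\\logo.png">',
    "truncated https": '<link href="s://plus.google.com/102103991664781080361" rel="publisher" />',
    "ordinary http": '<img src="http://example.com/a.png">',
}

for name, html in samples.items():
    old = extract(FORWARD_ONLY, html) is not None
    new = extract(EITHER_SLASH, html) is not None
    print(f"{name:22} first query: {old}  second query: {new}")
```

One side effect of the patterns as written: `[a-z][^>]+\s` needs at least two characters before the whitespace. A one-letter tag whose first attribute is the link, such as `<a href="d:\foo">`, is therefore matched by neither query.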

These look mostly like errors (and things that won't work anyway, since we don't want http -> file to do anything but return a network error). However, I think all of those with a backslash expect the behavior the OP asks for.

annevk · Mar 22 '17 13:03

I confirmed that IE6+ and Chrome have this quirk (on Windows only). They apply it to both d:/foo and d:\foo; in fact, they apply it to any a-z scheme. IE6 also applies it to a 0-9, - or + scheme; I'll consider those to be bugs. (Firefox's address-bar quirk only applies with a backslash, not a forward slash.)

Thoughts on only adopting this when a backslash is used? Or should we add a platform-specific quirk here, similar to https://w3c.github.io/FileAPI/#convert-line-endings-to-native, and make URLs with single-code-point schemes impossible forever on that platform?

cc @sleevi @valenting @achristensen07 @jasnell

annevk · May 10 '20 16:05
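Below is a rough sketch of the quirk as described above. `has_drive_letter_scheme` and `quirk_to_file_url` are hypothetical names, and treating the letter case-insensitively is an assumption. IE6's extra 0-9, `-` and `+` cases are deliberately not matched, since they are considered bugs. How the browsers build the path for inputs like `D:b/foo` is not modeled; the sketch just prefixes `file:///` and turns backslashes into slashes.

```python
import string
from typing import Optional

ASCII_LETTERS = frozenset(string.ascii_letters)


def has_drive_letter_scheme(s: str) -> bool:
    """True if s starts with a single ASCII letter followed by ':'.

    Models the IE6+/Chrome-on-Windows quirk, which applies to any a-z scheme
    regardless of the separator that follows. Digits, '-' and '+' are not matched.
    """
    return len(s) >= 2 and s[0] in ASCII_LETTERS and s[1] == ":"


def quirk_to_file_url(s: str) -> Optional[str]:
    """Rewrite drive-letter input to a file: URL, or return None to parse normally.

    Assumption: the rest of the input is kept as-is, apart from backslashes
    becoming slashes.
    """
    if not has_drive_letter_scheme(s):
        return None
    return "file:///" + s.replace("\\", "/")


for raw in ["D:\\foo", "d:/foo", "s://plus.google.com/", "https://example.com/"]:
    print(repr(raw), "->", quirk_to_file_url(raw))
```

Unlike a backslash-only rule, this also captures `d:/foo` and the `s://...` typos from the HTTP Archive results, turning them into `file:` URLs. That is the practical cost of giving up single-code-point schemes.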

I'm -1 on platform-specific behavior (it seems especially bad in contexts like HTTP servers and proxies).

I'm neutral on treating backslash specially vs. just treating all single-letter schemes as drive letters.

I'm +1 on addressing this in general. It would be great if full Windows file paths can be parsed as URLs as simply as passing them to the URL constructor.

domenic · May 10 '20 16:05
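Today, a Windows path usually has to go through an explicit path-to-URL step rather than the URL parser. One example, used purely as an illustration and not something proposed in this thread, is Python's `pathlib.PureWindowsPath.as_uri()`. It applies Windows path rules on any host OS and accepts either separator. `windows_path_as_file_url` is a hypothetical wrapper name.

```python
import warnings
from pathlib import PureWindowsPath


def windows_path_as_file_url(path: str) -> str:
    """Convert an absolute Windows path to a file: URL using Windows rules on any OS."""
    with warnings.catch_warnings():
        # Some recent Python versions deprecate as_uri() on pure paths; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        return PureWindowsPath(path).as_uri()


for p in ["D:\\foo", "D:/foo"]:
    print(p, "->", windows_path_as_file_url(p))
```

A drive-relative path such as `D:b/foo` is not absolute under Windows rules, so `as_uri()` raises `ValueError` for it rather than producing a URL.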