node-ecstatic

Symbolic links can cause incorrect response headers

Open battlesnake opened this issue 9 years ago • 14 comments

I logged this on http-server, but I'm guessing the actual response generation is done in ecstatic so I'll cross post:

https://github.com/indexzero/http-server/issues/213


To reproduce

echo '<!doctype html><html><head><meta charset="utf-8"><title>Test</title></head><body>Hello world</body></html>' > test.html
ln -s test.html test
http-server

Then go to http://localhost:8080/test

Expected behaviour

Server resolves target (realpath?) then prepares HTTP response based on target file

Content-Type: text/html

Observed behaviour

Server does not resolve target, causing incorrect HTTP response

Content-Type: application/octet-stream; charset=utf-8

battlesnake, Nov 29 '15 16:11

It's most likely generating the content-type based on the extension of 'test' and not 'test.html'. Which seems fine to me.

jfhbrook, Nov 29 '15 18:11

Surely it makes sense for the content-type to depend on the file being served, rather than the URL used to access it? (since we do indeed serve the target file, not simply the contents of the symlink which would just be "test.html")

battlesnake, Nov 29 '15 18:11

How would you propose that I detect file type, then? The current implementation uses the extension of the thing you're trying to read (in this case the symlink).

jfhbrook, Nov 29 '15 19:11

Resolve the path to the file before analysing the extension, possibly using fs.realpath

https://nodejs.org/docs/latest/api/fs.html#fs_fs_realpath_path_cache_callback

battlesnake, Nov 29 '15 19:11

Why is that better?

jfhbrook, Nov 29 '15 20:11

To solve your specific use case, try moving the file to test/index.html and turn on autoindexing.

jfhbrook, Nov 29 '15 20:11

I believe there's also a flag to set a default extension when one is missing; this would also meet your use case.

jfhbrook, Nov 29 '15 20:11

> Why is that better?

Because the format specified in the response is based on the file being served, rather than the path used to access it.

battlesnake, Nov 29 '15 21:11

I'm not entirely convinced that this is self-evident.

jfhbrook, Nov 30 '15 03:11

UNIX traditionally used the first two bytes of a file's contents to identify it (which filesystems usually still store a copy of in the directory entry), but filename extensions are somewhat more widely used (e.g. in http-server). The file's format isn't any different when accessed via a symbolic link, any more than the first bytes of it are.

battlesnake, Nov 30 '15 10:11

> Because the format specified in the response is based on the file being served, rather than the path used to access it.

That "is" is a matter of configuration. I find it rather useful for some courses to have an example.svg.txt symlink to example.svg and have my Apache serve a text or an image depending on the path used to access it. Used in this way, I wouldn't consider my Apache's response headers as "incorrect", and I expect to be able to pull the same trick with ecstatic.

> UNIX traditionally used the first two bytes of a file's contents to identify it

Sounds like a light-weight version of MIME magic. For that, see (and solve?) #66.

mk-pmb, Dec 08 '15 09:12

Just tested the fs.realpath option and it seems to work:

$ ln -s index.html test
$ node -e 'var fs = require("fs"); console.log(require("mime").lookup(fs.realpathSync("test")));'
text/html

Sadly, that's not a fix for Windows folks...but then...they probably know that already. 😏

BigBlueHat, Jun 09 '17 18:06

There cannot be a fix because it's a config issue. If a chain of one or more symlinks is involved and you want content type to be guessed by filename, you'll have to tell your webserver which end of the symlink chain to use for guessing.

I gave the SVG as text example above, and I'll add some more use cases:

  • .bmp (image/x-ms-bmp) masquerading as .ico (image/vnd.microsoft.icon)
  • .exe (application/vnd.microsoft.portable-executable) masquerading as .zip (application/zip)
  • .tsv (text/tab-separated-values) masquerading as .xls (application/vnd.ms-excel)

If you want any file system lookup, e.g. to resolve the symlink target, it should be async. (Thus fs.realpathSync is a really bad idea; for explanation why, please ask in the general node help.) My request for async mime type lookup is in issue #66. If you solve that one, I'm sure we can have some guessMimeFromFinalSymlinkTarget option soon after because it will be trivial then.
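For illustration, here is a minimal sketch of what such an option might look like with an async lookup. Everything in it is an assumption made for the example: `guessContentType` is a hypothetical helper rather than part of ecstatic, the three-entry `TYPES` table stands in for a real MIME database, and `guessMimeFromFinalSymlinkTarget` is only the option name floated above, not an existing flag.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Tiny illustrative extension table (assumption); a real server would
// consult a full MIME database instead.
const TYPES = {
  '.html': 'text/html',
  '.svg': 'image/svg+xml',
  '.txt': 'text/plain',
};

// Hypothetical helper, not part of ecstatic.
// opts.guessMimeFromFinalSymlinkTarget: when true, guess from the extension
// of the fully resolved symlink target; otherwise guess from the extension
// of the path that was requested.
async function guessContentType(file, opts = {}) {
  const subject = opts.guessMimeFromFinalSymlinkTarget
    ? await fs.promises.realpath(file) // async, so the event loop is not blocked
    : file;
  const ext = path.extname(subject).toLowerCase();
  return TYPES[ext] || 'application/octet-stream';
}

module.exports = { guessContentType };
```

With the `test` -> `test.html` symlink from the original report, the default guesses from the extensionless name `test` and falls back to application/octet-stream, while passing `{ guessMimeFromFinalSymlinkTarget: true }` guesses from `test.html`. The `example.svg.txt` -> `example.svg` trick keeps working under the default, since the requested path still ends in `.txt`.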

mk-pmb, Jun 09 '17 21:06

> UNIX traditionally used the first two bytes of a file's contents to identify it (which filesystems usually still store a copy of in the directory entry), but filename extensions are somewhat more widely used (e.g. in http-server). The file's format isn't any different when accessed via a symbolic link, any more than the first bytes of it are.

@battlesnake ecstatic already detects gzipped files based on the first two bytes of the file. You could expand that to cover all file types. But then you should ditch the mime package, since it works from the Apache project's file-extension mapping.
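As a rough illustration of content-based detection, the sketch below checks a file's first two bytes against the gzip magic number (0x1f 0x8b). It is not ecstatic's actual code: `looksGzipped` is a hypothetical name, and it assumes a Node version that provides `fs.promises`.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper, not ecstatic's implementation: resolves to true when
// the file starts with the gzip magic number 0x1f 0x8b.
async function looksGzipped(file) {
  const handle = await fs.promises.open(file, 'r');
  try {
    // Read only the first two bytes, starting at position 0.
    const { bytesRead, buffer } = await handle.read(Buffer.alloc(2), 0, 2, 0);
    return bytesRead === 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
  } finally {
    await handle.close();
  }
}

module.exports = { looksGzipped };
```

Because `open` follows symbolic links, the result is the same whether the file is reached directly or through a link. Covering all file types this way would need a table of signatures of varying length and offset rather than a single two-byte comparison.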

But @mk-pmb has a good point about changing mime-type based on file-extensions via symlink.

In the end you have to choose whether you want file content or file path to dictate the mime-type. Of course ecstatic has you covered with extensible custom mime types based on file path. So you can already create a config that gets you to where you want, more or less.

dotnetCarpenter, Oct 18 '17 07:10