doc-en icon indicating copy to clipboard operation
doc-en copied to clipboard

Used libmagic versions are not documented

Open mind-bending-forks opened this issue 3 years ago • 4 comments

Description

Prior to PHP 8, the finfo extension would always report that files with the .txt extension had a mime type of text/plain (I think). Certainly, files that had contents that looked like CSV would return a mime type of text/plain.

From PHP 8.0.1 to 8.0.25, the finfo extension returns a mime type of application/csv if the file contains something that looks like CSV.

From PHP 8.1.0+, the finfo extension returns a mime type of text/csv if the file contains something that looks like CSV.

This is demonstrated at https://3v4l.org/tmGQe

I'm not sure if this change of behaviour is deliberate or unintentional.

  • If deliberate, then it is a breaking change that ought to be documented, but I cannot find any mention of it in any of the release notes or the php documentation for finfo. (So, in that case, I have misfiled this report. It ought to be a documentation bug.)
  • If unintentional, then I note that there are potential security implications if a file that advertises itself via its extension as a plain text file is treated as something different because finfo has looked inside it and decided it is something else. Therefore, someone should review whether the previous behaviour ought to be reverted.

PHP Version

PHP 7.x - 8.x

Operating System

Ubuntu 20.04 - 22.04

mind-bending-forks avatar Nov 04 '22 17:11 mind-bending-forks

This is probably due to an upstream change; PHP 8.0 ships with libmagic 5.39 while PHP 8.1 has 5.40.

cmb69 avatar Nov 04 '22 18:11 cmb69

cmb@ELEPHPANT:~$ cd file-5.37
cmb@ELEPHPANT:~/file-5.37$ ./bin/file --mime-type ../data.txt
../data.txt: text/plain
cmb@ELEPHPANT:~/file-5.37$ cd ../file-5.39
cmb@ELEPHPANT:~/file-5.39$ ./bin/file --mime-type ../data.txt
../data.txt: application/csv
cmb@ELEPHPANT:~/file-5.39$ cd ../file-5.40
cmb@ELEPHPANT:~/file-5.40$ ./bin/file --mime-type ../data.txt
../data.txt: text/csv
cmb@ELEPHPANT:~/file-5.40$ cd ../file-5.43
cmb@ELEPHPANT:~/file-5.43$ ./bin/file --mime-type ../data.txt
../data.txt: text/csv

So this is not related to PHP. If you think the behavior is bad, consider to report that upstream.

cmb69 avatar Nov 25 '22 14:11 cmb69

Hi @cmb69,

I had investigated and drafted a response to your previous comment, but forgotten to submit! Here it is:

Regarding the change of reported mime type between application/csv and text/csv between PHP 8.0.1 and 8.0.25, and so looking at changes between libmagic 5.39 and 5.40, I note that there's nothing about it in the finfo change log, but the change does show up in src/is_csv.c.

PHP 7.4(.33) appears to have shipped with libmagic 5.37: https://github.com/php/php-src/commits/php-7.4.33/ext/fileinfo/libmagic

Regarding the switch from treating files with the .txt extension containing CSV content as plain/text to sniffing them for CSV content and reporting a CSV mime-type, which occurred between PHP 7.4 and 8.0, and so looking at the changes between libmagic 5.37 and 5.39, it appears that even prior to 5.37, libmagic had various flags for controlling behaviour for text files (MAGIC_NO_CHECK_TEXT, MAGIC_NO_CHECK_TOKENS, MAGIC_NO_CHECK_JSON). Between 5.37 and 5.38, inspecting files for CSV content was added, implemented in file/funcs.c, and an additional constant was added to disable that, if so desired: MAGIC_NO_CHECK_CSV. I'm not familiar with the code, and so I'm not sure how the library was previously determining that such files were text/plain and whether the file extension was taken into consideration, and whether the introduced CSV detection functionality was intended to be run on files with a .txt extension, but this change does appear to explain the behaviour I observed.

So, I agree with your assessment that this change in behaviour is due to changes in finfo. The only thing I'd say from PHP's side is that I can not find anything in the migration guides at https://www.php.net/manual/en/migration80.php and https://www.php.net/manual/en/migration81.php to mention that the finfo library was updated (and could contain backwards incompatible changes.) I would perhaps have expected to find this in the Other Changes sections. Does this need a documentation bug to be filed?

As for whether there could be security implications, on reflection, I don't think there are any additional security risks associated with these specific changes, provided it is understood that finfo is reporting what the file looks like internally (not necessarily what it actually is or how it was intended to be treated/processed), and that appropriate precautions are taken. The text at https://www.php.net/manual/en/intro.fileinfo.php could perhaps be updated to acknowledge this. So, again, not sure whether a documentation bug should be filed for that?

mind-bending-forks avatar Nov 25 '22 16:11 mind-bending-forks

I agree that it makes sense to explicitly mention the libmagic versions we are using (in the manual proper, as well as in the migration guide). I don't think it makes much sense to go into the details about how libmagic works, though.

I'm going to transfer this ticket to doc-en. Feel free to provide a PR regarding the changes.

cmb69 avatar Nov 28 '22 18:11 cmb69