Used libmagic versions are not documented
Description
Prior to PHP 8, the finfo extension would always report that files with the .txt extension had a mime type of text/plain (I think). Certainly, files that had contents that looked like CSV would return a mime type of text/plain.
From PHP 8.0.1 to 8.0.25, the finfo extension returns a mime type of application/csv if the file contains something that looks like CSV.
From PHP 8.1.0+, the finfo extension returns a mime type of text/csv if the file contains something that looks like CSV.
This is demonstrated at https://3v4l.org/tmGQe
I'm not sure if this change of behaviour is deliberate or unintentional.
- If deliberate, then it is a breaking change that ought to be documented, but I cannot find any mention of it in any of the release notes or the php documentation for
finfo. (So, in that case, I have misfiled this report. It ought to be a documentation bug.) - If unintentional, then I note that there are potential security implications if a file that advertises itself via its extension as a plain text file is treated as something different because
finfohas looked inside it and decided it is something else. Therefore, someone should review whether the previous behaviour ought to be reverted.
PHP Version
PHP 7.x - 8.x
Operating System
Ubuntu 20.04 - 22.04
This is probably due to an upstream change; PHP 8.0 ships with libmagic 5.39 while PHP 8.1 has 5.40.
cmb@ELEPHPANT:~$ cd file-5.37
cmb@ELEPHPANT:~/file-5.37$ ./bin/file --mime-type ../data.txt
../data.txt: text/plain
cmb@ELEPHPANT:~/file-5.37$ cd ../file-5.39
cmb@ELEPHPANT:~/file-5.39$ ./bin/file --mime-type ../data.txt
../data.txt: application/csv
cmb@ELEPHPANT:~/file-5.39$ cd ../file-5.40
cmb@ELEPHPANT:~/file-5.40$ ./bin/file --mime-type ../data.txt
../data.txt: text/csv
cmb@ELEPHPANT:~/file-5.40$ cd ../file-5.43
cmb@ELEPHPANT:~/file-5.43$ ./bin/file --mime-type ../data.txt
../data.txt: text/csv
So this is not related to PHP. If you think the behavior is bad, consider to report that upstream.
Hi @cmb69,
I had investigated and drafted a response to your previous comment, but forgotten to submit! Here it is:
Regarding the change of reported mime type between
application/csvandtext/csvbetween PHP 8.0.1 and 8.0.25, and so looking at changes between libmagic 5.39 and 5.40, I note that there's nothing about it in the finfo change log, but the change does show up insrc/is_csv.c.PHP 7.4(.33) appears to have shipped with libmagic 5.37: https://github.com/php/php-src/commits/php-7.4.33/ext/fileinfo/libmagic
Regarding the switch from treating files with the .txt extension containing CSV content as
plain/textto sniffing them for CSV content and reporting a CSV mime-type, which occurred between PHP 7.4 and 8.0, and so looking at the changes between libmagic 5.37 and 5.39, it appears that even prior to 5.37, libmagic had various flags for controlling behaviour for text files (MAGIC_NO_CHECK_TEXT,MAGIC_NO_CHECK_TOKENS,MAGIC_NO_CHECK_JSON). Between 5.37 and 5.38, inspecting files for CSV content was added, implemented in file/funcs.c, and an additional constant was added to disable that, if so desired:MAGIC_NO_CHECK_CSV. I'm not familiar with the code, and so I'm not sure how the library was previously determining that such files weretext/plainand whether the file extension was taken into consideration, and whether the introduced CSV detection functionality was intended to be run on files with a .txt extension, but this change does appear to explain the behaviour I observed.
So, I agree with your assessment that this change in behaviour is due to changes in finfo. The only thing I'd say from PHP's side is that I can not find anything in the migration guides at https://www.php.net/manual/en/migration80.php and https://www.php.net/manual/en/migration81.php to mention that the finfo library was updated (and could contain backwards incompatible changes.) I would perhaps have expected to find this in the Other Changes sections. Does this need a documentation bug to be filed?
As for whether there could be security implications, on reflection, I don't think there are any additional security risks associated with these specific changes, provided it is understood that finfo is reporting what the file looks like internally (not necessarily what it actually is or how it was intended to be treated/processed), and that appropriate precautions are taken. The text at https://www.php.net/manual/en/intro.fileinfo.php could perhaps be updated to acknowledge this. So, again, not sure whether a documentation bug should be filed for that?
I agree that it makes sense to explicitly mention the libmagic versions we are using (in the manual proper, as well as in the migration guide). I don't think it makes much sense to go into the details about how libmagic works, though.
I'm going to transfer this ticket to doc-en. Feel free to provide a PR regarding the changes.