PhpSpreadsheet
Csv Reader Allow Use of mimetype=text/html Files Without Extension
Fix #4036. The issue was originally reported as #564 (and #811) and was mostly fixed, but this variation was not covered by the original fix. Cells containing html fragments can cause mime_content_type to identify the file as text/html. The original fix was to ignore the mime_content_type result when the file extension is 'csv' or 'tsv'. However, a file without one of those extensions is still rejected by Csv Reader as having an invalid mimetype. This PR adds text/html to the list of valid mimetypes.
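The decision described above can be sketched roughly as follows. This is a simplified illustration, not the actual Csv Reader code: the function name `looksLikeCsv` and the exact contents of the accepted list are assumptions made for the example, and the detected mimetype is passed in as a string rather than obtained from mime_content_type.

```php
<?php

/**
 * Hypothetical helper (not PhpSpreadsheet's real implementation): decide
 * whether a file is acceptable as CSV from its name and detected mimetype.
 */
function looksLikeCsv(string $filename, string $detectedMimeType): bool
{
    // Files named *.csv or *.tsv skip the mimetype check entirely
    // (the original fix for #564 and #811).
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (in_array($extension, ['csv', 'tsv'], true)) {
        return true;
    }

    // Illustrative list only; the real reader's list may differ.
    // 'text/html' is the entry this PR adds.
    $acceptedMimeTypes = [
        'text/csv',
        'text/plain',
        'inode/x-empty',
        'text/html',
    ];

    return in_array($detectedMimeType, $acceptedMimeTypes, true);
}
```

Under these assumptions, `looksLikeCsv('export', 'text/html')` is accepted only because of the new list entry, `looksLikeCsv('export.csv', 'text/html')` was already accepted through the extension bypass, and `looksLikeCsv('export', 'application/zip')` is still rejected.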
I imagine that this type of problem might occur for other mimetypes. If any of those are reported in future, it might be better to add a "suppress mimetype check" option rather than extending the list forever. Html is unusual in that its rules are so lax, which is why it seems appropriate to add it here.
Note that IOFactory may still identify a file as Html even when it is intended as Csv. The sample associated with this issue does not fall into this category, but one of the unit tests for this ticket does. Csv Reader will still read such a file correctly when it is used directly, but IOFactory load may select Html Reader instead.
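For a caller who knows a file is Csv, one way to sidestep the auto-detection is to instantiate Csv Reader explicitly. The sketch below assumes PhpSpreadsheet is installed through Composer and that the autoloader has already been loaded; the helper names `loadAsCsv` and `identifyReader` are hypothetical and exist only for this example.

```php
<?php

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Reader\Csv;
use PhpOffice\PhpSpreadsheet\Spreadsheet;

// Assumption: Composer's vendor/autoload.php has been required elsewhere.

/**
 * Hypothetical helper: always read the file with Csv Reader,
 * regardless of what IOFactory would have detected.
 */
function loadAsCsv(string $path): Spreadsheet
{
    $reader = new Csv();

    return $reader->load($path);
}

/**
 * Hypothetical helper: report which reader type IOFactory would pick,
 * which may be 'Html' for a Csv file whose cells contain html fragments.
 */
function identifyReader(string $path): string
{
    return IOFactory::identify($path);
}
```

Comparing the result of `identifyReader()` with the intended format is a quick way to check whether a given file is affected before relying on `IOFactory::load()`.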
This is:
- [x] a bugfix
- [ ] a new feature
- [ ] refactoring
- [ ] additional unit tests
Checklist:
- [x] Changes are covered by unit tests
  - [x] Changes are covered by existing unit tests
  - [x] New unit tests have been added
- [x] Code style is respected
- [x] Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
- [ ] CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
- [ ] Documentation is updated as necessary
Why is this change needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate). If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.