Mail Collector: Issue with Parsing Emails Containing Multiple <body> Tags
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Is there an existing issue for this?
- [x] I have searched the existing issues
Version
10.0.18
Bug description
Title: Mail Collector: Issue with Parsing Emails Containing Multiple <body> Tags
Description:
Hello,
We're facing an issue with the Mail Collector component during the processing of incoming emails for ticket creation and responses. Over the past two months, users have started receiving empty tickets or replies.
Root Cause
We traced the problem to a change in the structure of incoming emails. Our organization uses PhishAlarm, which recently started injecting additional HTML content, including its own <body> tag. This results in emails containing multiple <body> tags. The current implementation of Mail Collector expects a single <body> tag, leading to incorrect parsing and loss of content.
Current Code (Before):
$body_matches = [];
if (preg_match('/<body[^>]*>\s*(?<body>.+?)\s*<\/body>/is', $content, $body_matches) === 1) {
$content = $body_matches['body'];
}
This code only matches one <body> tag and assumes it's the right one.
Temporary Fix (Now):
$body_matches = [];
if (preg_match('/<body[^>]*>\s*(?<body>.+?)\s*<\/body>/is', $content, $body_matches) === 2) {
$content = $body_matches['body'];
}
By changing the expected match count from 1 to 2, we can capture the second <body> tag, which in our case contains the actual message content. This workaround has restored correct behavior for now.
Suggestion
This fix is brittle and environment-specific. Instead, we think it would be better for GLPI to support multiple <body> tags. A more resilient solution would help support email structures from various third-party tools without requiring manual code adjustments.
Thank you
Relevant log output
Page URL
No response
Steps To reproduce
No response
Your GLPI setup information
No response
Anything else?
No response
HTML with multiple body tags isn't valid HTML. If GLPI changes to guess how a malformed HTML document is intended to function, I think this will just lead to more issues and complexity.
This sounds more like an issue with the browser/email client extension.
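The condition if (preg_match('/<body[^>]*>\s*(?<body>.+?)\s*<\/body>/is', $content, $body_matches) === 2) { will never be true, because preg_match() returns 1 when the pattern matches, 0 when it does not, and false on error. With this change the content is simply no longer filtered.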
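A small sketch of that behavior, using a made-up two-body sample string (the $content value is an illustrative assumption, not real PhishAlarm output). It shows that preg_match() stops at the first match, so its return value cannot be used to pick the second <body>:

```php
<?php
// Hypothetical sample: two <body> elements, as an injecting tool might produce.
$content = '<html><body>Banner</body><body>Actual message</body></html>';
$pattern = '/<body[^>]*>\s*(?<body>.+?)\s*<\/body>/is';

$body_matches = [];
$result = preg_match($pattern, $content, $body_matches);

// preg_match() returns 1 on a match, 0 on no match, false on error; never 2.
var_dump($result);               // int(1)
var_dump($result === 2);         // bool(false)

// The lazy group captures the content of the first <body> only.
var_dump($body_matches['body']); // string(6) "Banner"
```

With the "=== 2" comparison the if-branch is skipped, so $content keeps the full raw HTML, which likely explains why the message became visible again.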
I do not really know how we should handle the presence of multiple body tags. Maybe we should concatenate the contents of all the <body> tags into a single one.
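One way the concatenation idea could look, as a rough sketch only. The function name extractAllBodies() is hypothetical and not part of GLPI. It uses preg_match_all() with the same pattern, except that the group is relaxed from .+? to .*? so an empty <body></body> does not swallow the following element:

```php
<?php
/**
 * Hypothetical helper (not part of GLPI): joins the inner content of every
 * <body> element found in $content, or returns $content unchanged if none.
 */
function extractAllBodies(string $content): string
{
    $matches = [];
    $count = preg_match_all(
        '/<body[^>]*>\s*(?<body>.*?)\s*<\/body>/is',
        $content,
        $matches
    );

    // false means a regex error, 0 means no <body> at all: keep the input.
    if ($count === false || $count === 0) {
        return $content;
    }

    // With the default PREG_PATTERN_ORDER, $matches['body'] lists every capture.
    return implode("\n", $matches['body']);
}

echo extractAllBodies('<html><body>Banner</body><body>Actual message</body></html>');
```

Note that this keeps the injected banner text alongside the real message, so whether concatenation is actually the desired behavior remains an open question.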
> This sounds more like an issue with the browser/email client extension.
I totally agree; a bug report should be sent to "PhishAlarm".