Slugs/URIs with Accented Characters Cause Inconsistent URLs and 404 Errors (French Language)
Problem Description
When the platform is used in French, page titles, profile names, and other content often contain accented characters (e.g., é, à, ô, ç). The current slug/URI generation logic does not transliterate these characters. This leads to two major issues:
- Inconsistent URLs:
  • Sometimes, URLs containing accents are accessible (e.g., /page/élève).
  • Other times, the URLs become extremely long or malformed (due to double encoding or browser misinterpretation), resulting in 404 errors.
- User Experience:
  • Users expect URLs to be clean and accessible regardless of accents.
  • Browsers and crawlers may fail to resolve URLs with non-ASCII characters, impacting SEO and usability.
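To make the "very long URL" symptom concrete, here is a minimal sketch of what double encoding does to a short accented slug. It uses only PHP's built-in rawurlencode; the variable names are illustrative, and the script assumes the source file is saved as UTF-8.

```php
<?php
// Illustrative only: shows how a slug grows when it is percent-encoded twice.
$sSlug = 'élève'; // in UTF-8, é is the bytes C3 A9 and è is C3 A8

$sOnce = rawurlencode($sSlug);  // each non-ASCII byte becomes %XX
$sTwice = rawurlencode($sOnce); // encoding again turns every '%' into '%25'

echo $sOnce, "\n";  // %C3%A9l%C3%A8ve
echo $sTwice, "\n"; // %25C3%25A9l%25C3%25A8ve

// Decoding the double-encoded form once does not give back the original slug,
// so a lookup against the stored slug would miss.
var_dump(rawurldecode($sTwice) === $sSlug); // bool(false)
```

A slug that is already ASCII is returned unchanged by rawurlencode, so it cannot be affected by this kind of repeated encoding.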
Technical Analysis
The root cause is in the uriFilter function in utils.inc.php:
function uriFilter ($s, $aParams = [])
{
    $sEmpty = isset($aParams['empty']) ? $aParams['empty'] : '-';
    $sDivider = isset($aParams['divider']) ? $aParams['divider'] : '-';

    if(BxTemplConfig::getInstance()->bAllowUnicodeInPreg)
        $s = get_mb_replace ('/[^\pL^\pN^_]+/u', $sDivider, $s); // unicode characters
    else
        $s = get_mb_replace ('/([^\d^\w]+)/u', $sDivider, $s); // latin characters only

    $s = get_mb_replace ('/([' . $sDivider . '^]+)/', $sDivider, $s);
    $s = get_mb_replace ('/([' . $sDivider . ']+)$/', '', $s); // remove trailing dash

    if(!$s)
        $s = $sEmpty;

    return !isset($aParams['lowercase']) || $aParams['lowercase'] === true ? mb_strtolower($s) : $s;
}
Issue: There is no transliteration step to convert accented characters to their ASCII equivalents (e.g., é → e). As a result, slugs may contain raw UTF-8 characters, which are not always handled consistently by browsers or web servers.
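The behavior can be approximated outside of UNA with a standalone sketch of the Unicode branch. This is not the platform's code: sketchUriFilter is a hypothetical name, it assumes get_mb_replace behaves like preg_replace, and it requires the mbstring extension and a UTF-8 source file.

```php
<?php
// Standalone approximation of the Unicode branch of uriFilter.
// Assumption: get_mb_replace is equivalent to preg_replace.
function sketchUriFilter(string $s, string $sDivider = '-'): string
{
    // Replace runs of anything that is not a letter, digit, '_' or '^' with the divider
    $s = preg_replace('/[^\pL^\pN^_]+/u', $sDivider, $s);
    // Collapse runs of dividers (and '^') into a single divider
    $s = preg_replace('/([' . $sDivider . '^]+)/', $sDivider, $s);
    // Remove a trailing divider
    $s = preg_replace('/([' . $sDivider . ']+)$/', '', $s);

    return mb_strtolower($s, 'UTF-8');
}

$sSlug = sketchUriFilter('Élève très appliqué');

echo $sSlug, "\n";               // élève-très-appliqué
echo rawurlencode($sSlug), "\n"; // %C3%A9l%C3%A8ve-tr%C3%A8s-appliqu%C3%A9
```

Accented letters match \pL, so they pass through the filter untouched and only appear percent-encoded once the slug is placed in a URL.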
Steps to Reproduce
- Set UNA CMS to French.
- Create a profile or page with a name/title containing accents (e.g., Élève très appliqué).
- Observe the generated URL:
  • Sometimes it works (with accents in the URL).
  • Sometimes it results in a 404 or a very long, encoded URL.
Proposed Patch
Add a transliteration step using iconv at the start of the uriFilter function:
function uriFilter ($s, $aParams = [])
{
    // Transliterate accented characters to ASCII; keep the original string if iconv fails
    $sAscii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $s);
    if($sAscii !== false)
        $s = $sAscii;

    $sEmpty = isset($aParams['empty']) ? $aParams['empty'] : '-';
    $sDivider = isset($aParams['divider']) ? $aParams['divider'] : '-';

    if(BxTemplConfig::getInstance()->bAllowUnicodeInPreg)
        $s = get_mb_replace ('/[^\pL^\pN^_]+/u', $sDivider, $s); // unicode characters
    else
        $s = get_mb_replace ('/([^\d^\w]+)/u', $sDivider, $s); // latin characters only

    $s = get_mb_replace ('/([' . $sDivider . '^]+)/', $sDivider, $s);
    $s = get_mb_replace ('/([' . $sDivider . ']+)$/', '', $s); // remove trailing dash

    if(!$s)
        $s = $sEmpty;

    return !isset($aParams['lowercase']) || $aParams['lowercase'] === true ? mb_strtolower($s) : $s;
}
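This converts accented characters to their closest ASCII equivalents before further processing. Note that the exact output of //TRANSLIT depends on the system's iconv implementation and locale, and characters with no ASCII equivalent (for example, non-Latin scripts) may be dropped or replaced with "?".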
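If platform-dependent iconv output is a concern, one possible alternative is an explicit character map. The sketch below is only an illustration of that idea, not part of the proposed patch: sketchFrenchToAscii and sketchSlug are hypothetical names, the map covers only common French letters, and the slug steps are a simplified stand-in for uriFilter. It assumes a UTF-8 source file.

```php
<?php
// Hypothetical helper: maps common French accented letters to ASCII.
// Characters not listed in the map are returned unchanged.
function sketchFrenchToAscii(string $s): string
{
    static $aMap = [
        'à' => 'a', 'â' => 'a', 'ä' => 'a', 'ç' => 'c',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u', 'ÿ' => 'y',
        'œ' => 'oe', 'æ' => 'ae',
        'À' => 'A', 'Â' => 'A', 'Ä' => 'A', 'Ç' => 'C',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O', 'Ö' => 'O',
        'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ÿ' => 'Y',
        'Œ' => 'OE', 'Æ' => 'AE',
    ];

    // strtr with an array replaces whole keys, so multi-byte UTF-8 keys are safe
    return strtr($s, $aMap);
}

// Simplified slug builder (not UNA's uriFilter): ASCII letters, digits and '_' only.
function sketchSlug(string $s, string $sDivider = '-'): string
{
    $s = sketchFrenchToAscii($s);
    $s = preg_replace('/[^A-Za-z0-9_]+/', $sDivider, $s);

    return strtolower(trim($s, $sDivider));
}

echo sketchSlug('Élève très appliqué'), "\n"; // eleve-tres-applique
echo sketchFrenchToAscii('Cœur'), "\n";       // Coeur
```

The trade-off is coverage: the map gives the same result on every server, but any character missing from it is treated as a separator by sketchSlug, so other languages would need their own entries.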
Expected Result
• URLs are always ASCII-only, clean, and accessible.
• No more 404 errors or browser misinterpretation due to accents.
• Improved SEO and user experience.