core
core copied to clipboard
Case sensitive issue with setcontent
Hi, I am not sure but i think they are an issue with case sensitive on where in setcontent
Details
Question | Answer |
---|---|
Relevant Bolt Version | 4.1.5 |
Install type | Composer install |
PHP version | 7.4.2 |
Web server | Symfony 5.1.8 |
MySQL version | mysql 5.7.26 - Localhost via UNIX socket (with JSON) |
Operating System | Darwin - 20.1.0 |
Reproduction
Steps to reproduce
Create a contenttype named "pages" with field name "title" as type text
pages:
name: Pages
singular_name: Page
fields:
title:
type: text
required: true
class: large
group: content
teaser:
type: textarea
postfix: |
A text-only version of the content, that can be used to "tease"
the content when it's used in listing pages or search results.
image:
type: image
label: Highlighted image
body:
type: redactor
height: 300px
slug:
type: slug
uses: title
group: Meta
template:
type: templateselect
filter: '/^[^_].*\.twig$/'
taxonomy: [ groups ]
records_per_page: 5
order: id
Create entries with title as "Test" and "Test2" and "Another Test" In your template do a setcontent like this
{% setcontent pages = 'pages' where {title: '%Test%'} %} {{ dump(pages) }}
Expected result
I should have 3 results for:
- Test
- Test2
- Another Test
Actual result
Actually i have 0 results
Another test
If you entries with title as "test" and "test2" and "Another test" (In lowercase), it's work
As additional info I think it's useful to know what the collation of your mysql tables is.
You can easily find your table collations using:
SHOW TABLE STATUS FROM your_db_name;
I'm assuming you didn't specify a specific collation yourself, and that the columns of those tables have the same collation.
Hi @simongroenewolt I just check and all my tables are in utf8mb4_unicode_ci
If it's can help.
So, this might be interesting information. In a fresh bolt/project setup (using sqlite) I did a
{% setcontent pages = 'pages' where {title: '%Test%'} printquery %}
And this was the output:
SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC
ct0: pages
title_1: %test%
status_1: published
field_title: title
Notice the title_1 field being all lowercase?
Now your tables are using a _ci
collation which should mean case insensitive therefore I'd expect both 'Test' and 'test' to match even for a query that was all lowercase.
Note: this is a result using sqlite, so it might be totally unrelated.
Additional insight:
- the data is in json columns, this might be related/relevant: https://stackoverflow.com/questions/49782240/can-i-do-case-insensitive-search-with-json-extract-in-mysql
- there is code lowercasing parameters, I don't know why, and I also don't know if this has an effect on the search query https://github.com/bolt/core/blob/9448fc9be42ee4dcb62f6c8699f90a7e0904ff68/src/Storage/SelectQuery.php#L179
So, this might be interesting information. In a fresh bolt/project setup (using sqlite) I did a
{% setcontent pages = 'pages' where {title: '%Test%'} printquery %}
And this was the output:
SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC ct0: pages title_1: %test% status_1: published field_title: title
Notice the title_1 field being all lowercase?
Now your tables are using a
_ci
collation which should mean case insensitive therefore I'd expect both 'Test' and 'test' to match even for a query that was all lowercase.Note: this is a result using sqlite, so it might be totally unrelated.
I have try same with MySQL and it's output
SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC
ct0: pages
title_1: %test%
status_1: published
field_title: title
So it's like to be same between sqlite and MySQL
Additional insight:
- the data is in json columns, this might be related/relevant: https://stackoverflow.com/questions/49782240/can-i-do-case-insensitive-search-with-json-extract-in-mysql
- there is code lowercasing parameters, I don't know why, and I also don't know if this has an effect on the search query https://github.com/bolt/core/blob/9448fc9be42ee4dcb62f6c8699f90a7e0904ff68/src/Storage/SelectQuery.php#L179
I think we are in good way I have do some tests
SELECT * FROM bolt_content AS c
LEFT JOIN bolt_field f ON (f.content_id = c.id)
LEFT JOIN bolt_field_translation ft ON (ft.translatable_id = f.id)
WHERE
c.content_type = 'pages' AND
(
(
JSON_EXTRACT(ft.value, '$[0]') LIKE '%test%' AND f.parent_id IS NULL AND f.name = 'title'
) AND
c.status = 'published'
)
ORDER BY c.published_at DESC
Actually if i run this, i have no results
But if i add LOWER() function like this
SELECT * FROM bolt_content AS c
LEFT JOIN bolt_field f ON (f.content_id = c.id)
LEFT JOIN bolt_field_translation ft ON (ft.translatable_id = f.id)
WHERE
c.content_type = 'pages' AND
(
(
LOWER(JSON_EXTRACT(ft.value, '$[0]')) LIKE '%test%' AND f.parent_id IS NULL AND f.name = 'title'
) AND
c.status = 'published'
)
ORDER BY c.published_at DESC
It's work well
https://github.com/bolt/core/blob/b6ba7a1b48a00bf954b5b08b27a58b5be587a02e/src/Doctrine/JsonHelper.php#L34
Maybe we should enclose JSON_EXTRACT function into LOWER function since the parameters are already forced in lowercase
I faced the same issue with search in Bolt 5. The problem is indeed with JSON_EXTRACT. MySQL documentation clearly states:
MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation. Strings in other character sets are converted to utf8mb4 as necessary. (For strings in the ascii or utf8 character sets, no conversion is needed because ascii and utf8 are subsets of utf8mb4.)
https://dev.mysql.com/doc/refman/8.0/en/json.html
So, the extracted value is in utf8mb4 which is case sensitive.
Another solution for this issue can be adding COLLATE clause after LIKE statement: COLLATE utf8mb4_general_ci
, eg.
WHERE JSON_EXTRACT(`value`, '$[0]') LIKE '%word%' COLLATE utf8mb4_unicode_ci
https://dev.mysql.com/doc/refman/8.0/en/charset-collate.html
After this JSON_EXTRACT return value will be interpreted using the given collation. But I think this applies only for mysql, so lower function maybe more appropriate.
Hi @bobdenotter, this issue will be fixed in 5.2 ? (or it's maybe already fixed) Regards
Hi guys, Any news about this?
Regards