core icon indicating copy to clipboard operation
core copied to clipboard

Case sensitive issue with setcontent

Open Fredxd opened this issue 4 years ago • 9 comments

Hi, I am not sure but i think they are an issue with case sensitive on where in setcontent

Details

Question Answer
Relevant Bolt Version 4.1.5
Install type Composer install
PHP version 7.4.2
Web server Symfony 5.1.8
MySQL version mysql 5.7.26 - Localhost via UNIX socket (with JSON)
Operating System Darwin - 20.1.0

Reproduction

Steps to reproduce

Create a contenttype named "pages" with field name "title" as type text

pages:
    name: Pages
    singular_name: Page
    fields:
        title:
            type: text
            required: true
            class: large
            group: content
        teaser:
            type: textarea
            postfix: |
                A text-only version of the content, that can be used to "tease"
                the content when it's used in listing pages or search results.
        image:
            type: image
            label: Highlighted image
        body:
            type: redactor
            height: 300px
        slug:
            type: slug
            uses: title
            group: Meta
        template:
            type: templateselect
            filter: '/^[^_].*\.twig$/'
    taxonomy: [ groups ]
    records_per_page: 5
    order: id

Create entries with title as "Test" and "Test2" and "Another Test" In your template do a setcontent like this

{% setcontent pages = 'pages' where {title: '%Test%'} %} {{ dump(pages) }}

Expected result

I should have 3 results for:

  • Test
  • Test2
  • Another Test

Actual result

Actually i have 0 results

Another test

If you entries with title as "test" and "test2" and "Another test" (In lowercase), it's work

Fredxd avatar Nov 17 '20 16:11 Fredxd

As additional info I think it's useful to know what the collation of your mysql tables is.

You can easily find your table collations using:

SHOW TABLE STATUS FROM your_db_name;

I'm assuming you didn't specify a specific collation yourself, and that the columns of those tables have the same collation.

simongroenewolt avatar Nov 17 '20 17:11 simongroenewolt

Hi @simongroenewolt I just check and all my tables are in utf8mb4_unicode_ci

If it's can help.

Fredxd avatar Nov 18 '20 07:11 Fredxd

So, this might be interesting information. In a fresh bolt/project setup (using sqlite) I did a

{% setcontent pages = 'pages' where {title: '%Test%'} printquery %}

And this was the output:

SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC

    ct0: pages
    title_1: %test%
    status_1: published
    field_title: title

Notice the title_1 field being all lowercase?

Now your tables are using a _ci collation which should mean case insensitive therefore I'd expect both 'Test' and 'test' to match even for a query that was all lowercase.

Note: this is a result using sqlite, so it might be totally unrelated.

simongroenewolt avatar Nov 18 '20 22:11 simongroenewolt

Additional insight:

  • the data is in json columns, this might be related/relevant: https://stackoverflow.com/questions/49782240/can-i-do-case-insensitive-search-with-json-extract-in-mysql
  • there is code lowercasing parameters, I don't know why, and I also don't know if this has an effect on the search query https://github.com/bolt/core/blob/9448fc9be42ee4dcb62f6c8699f90a7e0904ff68/src/Storage/SelectQuery.php#L179

simongroenewolt avatar Nov 18 '20 23:11 simongroenewolt

So, this might be interesting information. In a fresh bolt/project setup (using sqlite) I did a

{% setcontent pages = 'pages' where {title: '%Test%'} printquery %}

And this was the output:

SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC

    ct0: pages
    title_1: %test%
    status_1: published
    field_title: title

Notice the title_1 field being all lowercase?

Now your tables are using a _ci collation which should mean case insensitive therefore I'd expect both 'Test' and 'test' to match even for a query that was all lowercase.

Note: this is a result using sqlite, so it might be totally unrelated.

I have try same with MySQL and it's output

SELECT content FROM Bolt\Entity\Content content LEFT JOIN content.fields fields_title LEFT JOIN fields_title.translations translations_title WHERE content.contentType = :ct0 AND ((JSON_EXTRACT(translations_title.value, '$[0]') LIKE :title_1 AND fields_title.parent IS NULL AND fields_title.name = :field_title) AND content.status = :status_1) ORDER BY content.publishedAt DESC
ct0: pages
title_1: %test%
status_1: published
field_title: title

So it's like to be same between sqlite and MySQL

Fredxd avatar Nov 19 '20 07:11 Fredxd

Additional insight:

  • the data is in json columns, this might be related/relevant: https://stackoverflow.com/questions/49782240/can-i-do-case-insensitive-search-with-json-extract-in-mysql
  • there is code lowercasing parameters, I don't know why, and I also don't know if this has an effect on the search query https://github.com/bolt/core/blob/9448fc9be42ee4dcb62f6c8699f90a7e0904ff68/src/Storage/SelectQuery.php#L179

I think we are in good way I have do some tests

SELECT * FROM bolt_content AS c 
LEFT JOIN bolt_field f ON (f.content_id = c.id)
LEFT JOIN bolt_field_translation ft ON (ft.translatable_id = f.id)
WHERE 
	c.content_type = 'pages' AND 
    (
        (
            JSON_EXTRACT(ft.value, '$[0]') LIKE '%test%' AND f.parent_id IS NULL AND f.name = 'title'
        ) AND 
        c.status = 'published'
    ) 
ORDER BY c.published_at DESC

Actually if i run this, i have no results

But if i add LOWER() function like this

SELECT * FROM bolt_content AS c 
LEFT JOIN bolt_field f ON (f.content_id = c.id)
LEFT JOIN bolt_field_translation ft ON (ft.translatable_id = f.id)
WHERE 
	c.content_type = 'pages' AND 
    (
        (
            LOWER(JSON_EXTRACT(ft.value, '$[0]')) LIKE '%test%' AND f.parent_id IS NULL AND f.name = 'title'
        ) AND 
        c.status = 'published'
    ) 
ORDER BY c.published_at DESC

It's work well

https://github.com/bolt/core/blob/b6ba7a1b48a00bf954b5b08b27a58b5be587a02e/src/Doctrine/JsonHelper.php#L34

Maybe we should enclose JSON_EXTRACT function into LOWER function since the parameters are already forced in lowercase

Fredxd avatar Nov 19 '20 07:11 Fredxd

I faced the same issue with search in Bolt 5. The problem is indeed with JSON_EXTRACT. MySQL documentation clearly states:

MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation. Strings in other character sets are converted to utf8mb4 as necessary. (For strings in the ascii or utf8 character sets, no conversion is needed because ascii and utf8 are subsets of utf8mb4.)

https://dev.mysql.com/doc/refman/8.0/en/json.html

So, the extracted value is in utf8mb4 which is case sensitive.

Another solution for this issue can be adding COLLATE clause after LIKE statement: COLLATE utf8mb4_general_ci, eg.

WHERE JSON_EXTRACT(`value`, '$[0]') LIKE '%word%' COLLATE utf8mb4_unicode_ci

https://dev.mysql.com/doc/refman/8.0/en/charset-collate.html

After this JSON_EXTRACT return value will be interpreted using the given collation. But I think this applies only for mysql, so lower function maybe more appropriate.

ntomka avatar Jun 14 '21 19:06 ntomka

Hi @bobdenotter, this issue will be fixed in 5.2 ? (or it's maybe already fixed) Regards

Fredxd avatar Oct 10 '22 14:10 Fredxd

Hi guys, Any news about this?

Regards

Fredxd avatar Jun 22 '23 06:06 Fredxd