incubator-devlake

Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter

Open marcemv90 opened this issue 3 years ago • 15 comments

Describe the bug

When running a GitHub pipeline to register a repository that contains emojis in some of its pull request comments (and probably in other fields), the pipeline fails with the following error:

Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'Pipelines > Create Pipeline Run'
  2. Click on 'Create Pipeline Run'
  3. Scroll down until 'Github' is shown in the Data Providers list
  4. Toggle on GitHub Data provider
  5. Enter repository owner and name for a repository that contains emojis on some of its comments, titles, etc.
  6. Click on 'Run Pipeline'
  7. Wait for the Pipeline to run, in our case it fails at 20%.

Expected behavior

Github scan succeeds with no errors.

Screenshots

image

image

Additional context

Similar errors are shown in the lake container logs:

time="2022-06-03 10:51:01" level=error msg=" [task service] task failed%!(EXTRA *errors.SubTaskError=Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter)"
time="2022-06-03 10:51:01" level=error msg=" [pipeline service] [pipeline #21] run tasks failed: %!w(*errors.errorString=&{Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter})"

We think it might be related to the emojis, as the log also contains them. It is probably related to the charset used in the database or in some of its tables or fields.

Please let us know if further information is required.

Many thanks!

Regards.

marcemv90 avatar Jun 03 '22 11:06 marcemv90

@klesh @warren830 Can you take a look at this bug?

Startrekzky avatar Jun 06 '22 15:06 Startrekzky

@marcemv90 This is weird. I tried many times and it works well. image

Although the emoji is shown as ? in the db, as in the picture below, it is displayed correctly, as in the picture above, after we select the rows and export them to a .csv. image

warren830 avatar Jul 01 '22 03:07 warren830

@marcemv90 can you provide the repo where you met the problem?

warren830 avatar Jul 01 '22 03:07 warren830

😅

warren830 avatar Jul 01 '22 04:07 warren830

😅 also tried this one image

warren830 avatar Jul 01 '22 04:07 warren830

Hi @warren830 👋 ,

I can't provide the repository where the bug was found, as it is a private repository from my company. Emojis being the cause of the error was our first thought, but we are not sure about that. It simply made sense to us, and we have no further means to dig deeper.

What I can provide are the tags of the Docker images that we are using. Maybe that's helpful for your troubleshooting, or maybe this is something that is already fixed in a later version.

  • Lake: mericodev/lake:20220523
  • ConfigUI: We are using a Docker image built by us because there hasn't been a non-test release since #2238. But it is based on mericodev/config-ui:20220523.
  • Database: We are running a MySQL in AWS RDS. Engine version 8.0.27

Hope this can shed some light and help identify the root cause.

Please let me know if I can provide something else that might be helpful as well, like enabling debug traces and uploading the logs of a failed scan.

Regards!

marcemv90 avatar Jul 01 '22 06:07 marcemv90

I think the problem might be caused by AWS RDS. What's the collation of your db?

warren830 avatar Jul 04 '22 06:07 warren830

Hello @warren830 , see following:

use lake;
SELECT @@character_set_database, @@collation_database;

image
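Note that `@@collation_database` reports only the database default; individual tables and columns can carry their own character set and collation. The following is a minimal sketch, not something taken from DevLake, for listing columns that are not utf8mb4. It assumes read access to `information_schema` and a driver that uses `%s` placeholders; the helper name `non_utf8mb4_columns` and the sample rows (including the table and column names) are hypothetical.

```python
# Query listing the character set and collation of every text column in a schema.
# information_schema.COLUMNS is standard MySQL; the schema name (e.g. "lake")
# is passed as a parameter. The %s placeholder style is an assumption about the driver.
COLUMN_COLLATION_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL
"""


def non_utf8mb4_columns(rows):
    """Return the (table, column, charset, collation) rows whose charset is not utf8mb4."""
    return [row for row in rows if row[2] != "utf8mb4"]


# Hypothetical rows, shaped like the result of COLUMN_COLLATION_QUERY.
sample_rows = [
    ("pull_request_comments", "body", "utf8", "utf8_general_ci"),
    ("pull_requests", "title", "utf8mb4", "utf8mb4_0900_ai_ci"),
]
suspects = non_utf8mb4_columns(sample_rows)
```

Any column reported this way with a `utf8_general_ci` collation would be a candidate for the target side of the conversion named in the error.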

Also our connection string is set to the following, in case it could be helpful:

mysql://user:[email protected]:3306/lake?charset=utf8mb4&parseTime=True
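For reference, `charset=utf8mb4` in the connection string means string parameters are sent as utf8mb4, and MySQL 8.0 uses `utf8mb4_0900_ai_ci` as the default collation for that character set, which matches the source side of the error. A small sketch for reading the charset out of a URL-style DSN follows; the function name `dsn_charset` and the example DSN (credentials and host are placeholders) are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def dsn_charset(dsn):
    """Return the charset query parameter of a URL-style DSN, or None if absent."""
    params = parse_qs(urlsplit(dsn).query)
    values = params.get("charset")
    return values[0] if values else None


# Hypothetical DSN with placeholder credentials and host.
example_dsn = "mysql://user:secret@db.example.com:3306/lake?charset=utf8mb4&parseTime=True"
charset = dsn_charset(example_dsn)
```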

Thanks for your time!

Regards.

marcemv90 avatar Jul 04 '22 08:07 marcemv90

Thanks!

warren830 avatar Jul 04 '22 08:07 warren830

This is so strange, we are using the same character set and collation: image

Can you check which task returned error?

warren830 avatar Jul 21 '22 02:07 warren830

Hi @warren830, the task is using the Github plugin; I'm not sure if that is the information that you need. Let me know if I can provide further details.

marcemv90 avatar Jul 25 '22 16:07 marcemv90

Hi, @marcemv90, what was the name of the failing subtask? Can you isolate the specific "emoji" that is causing the problem? Thanks in advance.
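One way to narrow this down, sketched here under the assumption that the comment text can be exported as Python strings: MySQL's `utf8` character set (utf8mb3) stores at most 3 bytes per character, so it cannot represent code points above U+FFFF, which is where most emojis live. The helper name `four_byte_chars` and the sample comment are hypothetical.

```python
def four_byte_chars(text):
    """Return the distinct characters in text that need 4 bytes in UTF-8, in first-seen order."""
    seen = []
    for ch in text:
        # Code points above U+FFFF are encoded as 4 bytes in UTF-8.
        if ord(ch) > 0xFFFF and ch not in seen:
            seen.append(ch)
    return seen


comment = "LGTM \U0001F605 thanks \u00e9"  # hypothetical comment body
found = four_byte_chars(comment)
```

Running this over the titles and comment bodies of the affected repository would list the characters that a utf8mb3 column cannot hold.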

klesh avatar Jul 26 '22 01:07 klesh

Hey @marcemv90 , have you found out which specific emojis cause this problem?

Startrekzky avatar Aug 09 '22 02:08 Startrekzky

Hello @klesh , @Startrekzky , sorry for the delayed response. I've been trying to reproduce the error, but I have been unable to. I will let you know if it arises again, so I can provide further details.

Sorry.

marcemv90 avatar Aug 12 '22 08:08 marcemv90

@marcemv90 Hi Marcelo, were you able to reproduce the issue?

hezyin avatar Aug 26 '22 03:08 hezyin

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar Sep 26 '22 00:09 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar Oct 03 '22 00:10 github-actions[bot]