Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter
Describe the bug
When running a GitHub pipeline to register a repository that contains emojis in some of its pull request comments (and probably in other fields), the pipeline fails with the following error:
Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter
To Reproduce
Steps to reproduce the behavior:
- Go to 'Pipelines > Create Pipeline Run'
- Click on 'Create Pipeline Run'
- Scroll down until 'GitHub' is shown in the Data Providers list
- Toggle on the GitHub data provider
- Enter repository owner and name for a repository that contains emojis on some of its comments, titles, etc.
- Click on 'Run Pipeline'
- Wait for the pipeline to run; in our case, it fails at 20%.
Expected behavior
The GitHub scan succeeds with no errors.
Additional context
Similar errors are shown in the lake container logs:
time="2022-06-03 10:51:01" level=error msg=" [task service] task failed%!(EXTRA *errors.SubTaskError=Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter)"
time="2022-06-03 10:51:01" level=error msg=" [pipeline service] [pipeline #21] run tasks failed: %!w(*errors.errorString=&{Error 3988: Conversion from collation utf8mb4_0900_ai_ci into utf8_general_ci impossible for parameter})"
We think it might be related to the emojis, as the log also contains them. It is probably related to the charset used by the database or by some of its tables or fields.
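If the emoji hypothesis is right, the mechanism would be this: MySQL's utf8 character set (an alias for utf8mb3, the charset behind utf8_general_ci) stores at most 3 bytes per character, while emojis such as 😅 need 4 bytes in UTF-8 and only fit in utf8mb4. A minimal Go sketch of that rule, assuming only the two UTF-8 variants matter here; the function name canStoreIn is hypothetical and not part of the lake codebase:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// canStoreIn reports whether s fits, without loss, in a MySQL column using
// the given character set. Assumption: only utf8mb4 and utf8/utf8mb3 are
// modeled; "utf8" is treated as MySQL's alias for utf8mb3.
func canStoreIn(charset, s string) (bool, error) {
	if !utf8.ValidString(s) {
		return false, fmt.Errorf("input is not valid UTF-8")
	}
	switch strings.ToLower(charset) {
	case "utf8mb4":
		// utf8mb4 covers every Unicode code point (up to 4 bytes each).
		return true, nil
	case "utf8", "utf8mb3":
		// utf8mb3 stores at most 3 bytes per character (Basic Multilingual
		// Plane only), so any 4-byte rune such as an emoji cannot be stored.
		for _, r := range s {
			if utf8.RuneLen(r) > 3 {
				return false, nil
			}
		}
		return true, nil
	default:
		return false, fmt.Errorf("unsupported charset %q", charset)
	}
}

func main() {
	for _, s := range []string{"LGTM", "café", "LGTM 😅"} {
		ok, err := canStoreIn("utf8mb3", s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q fits utf8mb3: %v\n", s, ok)
	}
}
```

Running a check like this over each comment or title would be one way to narrow down which field trips the error, assuming the target column really is utf8mb3.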
Please let us know if further information is required.
Many thanks!
Regards.
@klesh @warren830 Can you take a look at this bug?
@marcemv90 This is weird. I tried many times and it works well.
Although the emoji was shown as ? in the DB, after we selected the rows and exported them to a .csv file, it was shown correctly.

@marcemv90 can you provide the repo where you hit the problem?
😅
😅 also tried this one
Hi @warren830 👋 ,
I can't provide the repository where the bug was found, as it is a private repository belonging to my company. Emojis being the cause of the error was our first thought, but we are not sure about that. It simply made sense to us, but we have no further means to dig deeper.
What I can provide are the tags of the Docker images we are using. Maybe that helps your troubleshooting, or maybe it is something that is already fixed in a newer version.
- Lake: mericodev/lake:20220523
- ConfigUI: We are using a Docker image we built ourselves, because there hasn't been a non-test release since #2238. It is based on mericodev/config-ui:20220523.
- Database: We are running MySQL on AWS RDS, engine version 8.0.27.
I hope this sheds some light and helps identify the root cause.
Please let me know if I can provide anything else that might be helpful, such as enabling debug traces and uploading the logs of a failed scan.
Regards!
I think the problem might be caused by AWS RDS. What's the collation of your db?
Hello @warren830, see the following:
use lake;
SELECT @@character_set_database, @@collation_database;

Also our connection string is set to the following, in case it could be helpful:
mysql://user:[email protected]:3306/lake?charset=utf8mb4&parseTime=True
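The charset=utf8mb4 parameter only sets the character set of the client session; it does not change the character set of tables or columns that already exist. A small Go sketch that reads the parameter back from a URL of the same shape, assuming it parses as a standard URL; the host is a hypothetical placeholder because the real one is redacted, and sessionCharset is an illustrative name:

```go
package main

import (
	"fmt"
	"net/url"
)

// sessionCharset returns the "charset" query parameter of a MySQL
// connection URL. It says nothing about the charset of existing
// tables or columns, which is decided when they are created.
func sessionCharset(dbURL string) (string, error) {
	u, err := url.Parse(dbURL)
	if err != nil {
		return "", err
	}
	return u.Query().Get("charset"), nil
}

func main() {
	// Hypothetical host; same shape as the connection string in this thread.
	cs, err := sessionCharset("mysql://user:password@db.example.com:3306/lake?charset=utf8mb4&parseTime=True")
	if err != nil {
		panic(err)
	}
	fmt.Println("session charset:", cs)
}
```

On MySQL 8.0 the default collation for utf8mb4 is utf8mb4_0900_ai_ci, which matches the first collation in the error message, so one plausible reading is that the utf8_general_ci side belongs to a table or column created with the older utf8 (utf8mb3) character set.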
Thanks for your time!
Regards.
Thanks!
This is so strange, we are using the same character set and collation:

Can you check which task returned the error?
Hi @warren830, the task is using the GitHub plugin; I'm not sure if that's the information you need. Let me know if I can provide further details.
Hi @marcemv90, what was the name of the failing subtask? Can you isolate the specific emoji that is causing the problem? Thanks in advance.
Hey @marcemv90 , have you found out which specific emojis cause this problem?
Hello @klesh, @Startrekzky, sorry for the delayed response. I've been trying to reproduce the error, but I've been completely unable to. I will let you know if it happens again so I can provide further details.
Sorry.
@marcemv90 Hi Marcelo, were you able to reproduce the issue?
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.
