incubator-devlake
incubator-devlake copied to clipboard
[Bug][SQL] URL column too short
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
My pipeline fails to get the latest pipeline run, as it cannot insert the URL into the database. Here is my URL, with some info replaced with X's but the length is still representative: https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-06T10%3A35%3A26.7061112Z
As you can see the URL is 264 chars long, while the database schema is 255 chars, making devlake throw an exception:
level=error msg=" [pipeline service] [pipeline #90] [task #1212] subtask collectAzuredevopsBuilds ended unexpectedly
Wraps:
(2) remote error response:
| ERROR: DataError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
| (MySQLdb.DataError) (1406, \"Data too long for column 'url' at row 1\")
| [SQL: INSERT INTO _raw_azuredevops_builds (params, data, url, input, created_at) VALUES (%!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING))]
| [parameters: ('{\"ConnectionId\":1,\"ScopeId\":\"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\"}', b'{\"_links\": {\"self\": {\"href\": \"https://dev.azure.com/ccccxxxxxxxxxxx/d793494b-e38e-4d53-b9ed-205e83975e88/_apis/build/Builds/14146\"}, \"web\": {\"href\" ... (5391 characters truncated) ... mchart-cfg\", \"clean\": null, \"checkoutSubmodules\": false}, \"retainedByRelease\": false, \"triggeredByBuild\": null, \"appendCommitMessageToRunName\": true}', 'https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=d675bd99-5d22-4a11-a3dc-34fd9b9a4217&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-08T13%!A(MISSING)15%!A(MISSING)08.7782905Z', b'{\"OrgId\": \"ccccxxxxxxxxxxx\", \"ProjectId\": \"xxxxx\", \"RepoId\": \"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\", \"Provider\": \"tfsgit\"}', datetime.datetime(2024, 4, 24, 10, 8, 21, 412891))]
| (Background on this error at: https://sqlalche.me/e/14/9h9h)
| Traceback (most recent call last):
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py\", line 1900, in _execute_context
| self.dialect.do_execute(
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py\", line 736, in do_execute
| cursor.execute(statement, parameters)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/cursors.py\", line 179, in execute
| res = self._query(mogrified_query)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/cursors.py\", line 330, in _query
| db.query(q)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/connections.py\", line 261, in query
| _mysql.connection.query(self, query)
| MySQLdb.DataError: (1406, \"Data too long for column 'url' at row 1\")
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File \"/app/python/plugins/azuredevops/azuredevops/main.py\", line 157, in <module>
| AzureDevOpsPlugin.start()
| File \"/app/python/pydevlake/pydevlake/plugin.py\", line 253, in start
| fire.Fire(PluginCommands(plugin))
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 141, in Fire
| component_trace = _Fire(component, args, parsed_flag_args, context, name)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 466, in _Fire
| component, remaining_args = _CallAndUpdateTrace(
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 681, in _CallAndUpdateTrace
| component = fn(*varargs, **kwargs)
| File \"/app/python/pydevlake/pydevlake/ipc.py\", line 58, in wrapper
| for each in ret:
| File \"/app/python/pydevlake/pydevlake/ipc.py\", line 73, in collect
| yield from self._plugin.collect(self._mk_context(ctx), stream)
| File \"/app/python/pydevlake/pydevlake/plugin.py\", line 111, in _run_stream
| yield from getattr(stream, subtask).run(ctx)
| File \"/app/python/pydevlake/pydevlake/subtasks.py\", line 81, in run
| raise e
| File \"/app/python/pydevlake/pydevlake/subtasks.py\", line 65, in run
| session.merge(subtask_run)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2987, in merge
| self._autoflush()
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2257, in _autoflush
| util.raise_(e, with_traceback=sys.exc_info()[2])
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py\", line 208, in raise_
| raise exception
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2246, in _autoflush
| self.flush()
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 3386, in flush
| self._flush(objects)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 3526, in _flush
| transaction.rollback(_capture_exception=True)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
| compat.raise_(
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py\", line 208, in raise_
| raise exception
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py\", line 1900, in _execute_context
| self.dialect.do_execute(
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py\", line 736, in do_execute
| cursor.execute(statement, parameters)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib python3.9/site-packages/MySQLdb/cursors.py\", line 179, in execute
| res = self._query(mogrified_query)
| File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib python3.9/site-packages/MySQLdb/cursors.py\", line 330, in _query
| db.query(q)
| sqlalchemy.exc.DataError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
| (MySQLdb.DataError) (1406, \"Data too long for column 'url' at row 1\")
| [SQL: INSERT INTO _raw_azuredevops_builds (params, data, url, input, created_at) VALUES (%!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING))]
| [parameters: ('{\"ConnectionId\":1,\"ScopeId\":\"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\"}', b'{\"_links\": {\"self\": {\"href\": \"https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/Builds/14146\"}, \"web\": {\"href\" ... (5391 characters truncated) ... mchart-cfg\", \"clean\": null, \"checkoutSubmodules\": false}, \"retainedByRelease\": false, \"triggeredByBuild\": null, \"appendCommitMessageToRunName\": true}', 'https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=d675bd99-5d22-4a11-a3dc-34fd9b9a4217&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-08T13%!A(MISSING)15%!A(MISSING)08.7782905Z', b'{\"OrgId\": \"ccccxxxxxxxxxxx\", \"ProjectId\": \"xxxxx\", \"RepoId\": \"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\", \"Provider\": \"tfsgit\"}', datetime.datetime(2024, 4, 24, 10, 8, 21, 412891))]
Wraps:
(3) exit status 1
(4) exit status 1
Error types:
(1) *hintdetail.withDetail
(2) *hintdetail.withDetail
(3) *hintdetail.withDetail
(4) *exec.ExitError"
What do you expect to happen
I would expect either a much longer char column or a varchar
How to reproduce
Setup a ADO org and a project matching the lenght of mine or longer. The first run seems to go fine, but the next ones adds the continuation token and the URL becomes too long
Anything else
I am willing to make a PR with the schema change, would you prefer something like a char[300] or to convert to varchar?
Version
v1.0-beta1@60faf14
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
My suggestion will be to add this to the python/plugins/azuredevops/migrations.py:
@migration(20240424133000, name="Update url column in _tool_azuredevops_builds")
def update_url_column_definition(b: MigrationScriptBuilder):
table = '_tool_azuredevops_builds'
b.execute(f'ALTER TABLE {table} MODIFY COLUMN url VARCHAR(500)', Dialect.MYSQL)
b.execute(f'ALTER TABLE {table} MODIFY COLUMN url VARCHAR(500)', Dialect.POSTGRESQL)
Will try to test locally, and do a PR if it is a success
@bok11 Hello, I found that you plan to submit PR, please tell me when can you submit it. And you can change it to longtext type.
I have the same issue.
Hi, sorry i did not finish this up, i have made the necessary migrations but have some issues getting the gobased migrations to run. It seems they are just skipped, will take another stab at it, as that seems to be the only thing missing
Got the gobased migrations to run, seems i missed to register them in the register.go file.
Will just test with my really long named test org, rebase and prepare the PR. for reference, my test org url is: https://dev.azure.com/bokwill0431xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
This is the max length of an org name, did not find a limit for the repo name length but i guess this length is enough for most people
Fixed by https://github.com/apache/incubator-devlake/pull/7493