incubator-devlake icon indicating copy to clipboard operation
incubator-devlake copied to clipboard

[Bug][SQL] URL column too short

Open bok11 opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

My pipeline fails to get the latest pipeline run, as it cannot insert the URL into the database. Here is my URL, with some info replaced with X's but the length is still representative: https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-06T10%3A35%3A26.7061112Z

As you can see the URL is 264 chars long, while the database schema is 255 chars, making devlake throw an exception:

level=error msg=" [pipeline service] [pipeline #90] [task #1212] subtask collectAzuredevopsBuilds ended unexpectedly
  Wraps: 
    (2) remote error response:
      | ERROR: DataError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
      | (MySQLdb.DataError) (1406, \"Data too long for column 'url' at row 1\")
      | [SQL: INSERT INTO _raw_azuredevops_builds (params, data, url, input, created_at) VALUES (%!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING))]
      | [parameters: ('{\"ConnectionId\":1,\"ScopeId\":\"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\"}', b'{\"_links\": {\"self\": {\"href\": \"https://dev.azure.com/ccccxxxxxxxxxxx/d793494b-e38e-4d53-b9ed-205e83975e88/_apis/build/Builds/14146\"}, \"web\": {\"href\" ... (5391 characters truncated) ... mchart-cfg\", \"clean\": null, \"checkoutSubmodules\": false}, \"retainedByRelease\": false, \"triggeredByBuild\": null, \"appendCommitMessageToRunName\": true}', 'https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=d675bd99-5d22-4a11-a3dc-34fd9b9a4217&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-08T13%!A(MISSING)15%!A(MISSING)08.7782905Z', b'{\"OrgId\": \"ccccxxxxxxxxxxx\", \"ProjectId\": \"xxxxx\", \"RepoId\": \"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\", \"Provider\": \"tfsgit\"}', datetime.datetime(2024, 4, 24, 10, 8, 21, 412891))]
      | (Background on this error at: https://sqlalche.me/e/14/9h9h)
      | Traceback (most recent call last):
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py\", line 1900, in _execute_context
      |     self.dialect.do_execute(
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py\", line 736, in do_execute
      |     cursor.execute(statement, parameters)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/cursors.py\", line 179, in execute
      |     res = self._query(mogrified_query)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/cursors.py\", line 330, in _query
      |     db.query(q)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/MySQLdb/connections.py\", line 261, in query
      |     _mysql.connection.query(self, query)
      | MySQLdb.DataError: (1406, \"Data too long for column 'url' at row 1\")
      |
      | The above exception was the direct cause of the following exception:
      |
      | Traceback (most recent call last):
      |   File \"/app/python/plugins/azuredevops/azuredevops/main.py\", line 157, in <module>
      |     AzureDevOpsPlugin.start()
      |   File \"/app/python/pydevlake/pydevlake/plugin.py\", line 253, in start
      |     fire.Fire(PluginCommands(plugin))
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 141, in Fire
      |     component_trace = _Fire(component, args, parsed_flag_args, context, name)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 466, in _Fire
      |     component, remaining_args = _CallAndUpdateTrace(
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/fire/core.py\", line 681, in _CallAndUpdateTrace
      |     component = fn(*varargs, **kwargs)
      |   File \"/app/python/pydevlake/pydevlake/ipc.py\", line 58, in wrapper
      |     for each in ret:
      |   File \"/app/python/pydevlake/pydevlake/ipc.py\", line 73, in collect
      |     yield from self._plugin.collect(self._mk_context(ctx), stream)
      |   File \"/app/python/pydevlake/pydevlake/plugin.py\", line 111, in _run_stream
      |     yield from getattr(stream, subtask).run(ctx)
      |   File \"/app/python/pydevlake/pydevlake/subtasks.py\", line 81, in run
      |     raise e
      |   File \"/app/python/pydevlake/pydevlake/subtasks.py\", line 65, in run
      |     session.merge(subtask_run)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2987, in merge
      |     self._autoflush()
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2257, in _autoflush
      |     util.raise_(e, with_traceback=sys.exc_info()[2])
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py\", line 208, in raise_
      |     raise exception
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 2246, in _autoflush
      |     self.flush()
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 3386, in flush
      |     self._flush(objects)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/orm/session.py\", line 3526, in _flush
      |     transaction.rollback(_capture_exception=True)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py\", line 70, in __exit__
      |     compat.raise_(
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py\", line 208, in raise_
      |     raise exception
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py\", line 1900, in _execute_context
      |     self.dialect.do_execute(
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py\", line 736, in do_execute
      |     cursor.execute(statement, parameters)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib python3.9/site-packages/MySQLdb/cursors.py\", line 179, in execute
      |     res = self._query(mogrified_query)
      |   File \"/app/.cache/pypoetry/virtualenvs/azuredevops-ANnMAkq9-py3.9/lib python3.9/site-packages/MySQLdb/cursors.py\", line 330, in _query
      |     db.query(q)
      | sqlalchemy.exc.DataError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
      | (MySQLdb.DataError) (1406, \"Data too long for column 'url' at row 1\")
      | [SQL: INSERT INTO _raw_azuredevops_builds (params, data, url, input, created_at) VALUES (%!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING), %!s(MISSING))]
      | [parameters: ('{\"ConnectionId\":1,\"ScopeId\":\"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\"}', b'{\"_links\": {\"self\": {\"href\": \"https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/Builds/14146\"}, \"web\": {\"href\" ... (5391 characters truncated) ... mchart-cfg\", \"clean\": null, \"checkoutSubmodules\": false}, \"retainedByRelease\": false, \"triggeredByBuild\": null, \"appendCommitMessageToRunName\": true}', 'https://dev.azure.com/ccccxxxxxxxxxxx/xxxxxx/_apis/build/builds?repositoryId=d675bd99-5d22-4a11-a3dc-34fd9b9a4217&repositoryType=tfsgit&deletedFilter=excludeDeleted&queryOrder=finishTimeDescending&api-version=7.0&continuationToken=2024-02-08T13%!A(MISSING)15%!A(MISSING)08.7782905Z', b'{\"OrgId\": \"ccccxxxxxxxxxxx\", \"ProjectId\": \"xxxxx\", \"RepoId\": \"d675bd99-5d22-4a11-a3dc-34fd9b9a4217\", \"Provider\": \"tfsgit\"}', datetime.datetime(2024, 4, 24, 10, 8, 21, 412891))]
    Wraps:
      (3) exit status 1
      (4) exit status 1
    Error types: 
      (1) *hintdetail.withDetail 
      (2) *hintdetail.withDetail 
      (3) *hintdetail.withDetail 
      (4) *exec.ExitError"

What do you expect to happen

I would expect either a much longer char column or a varchar

How to reproduce

Setup a ADO org and a project matching the lenght of mine or longer. The first run seems to go fine, but the next ones adds the continuation token and the URL becomes too long

Anything else

I am willing to make a PR with the schema change, would you prefer something like a char[300] or to convert to varchar?

Version

v1.0-beta1@60faf14

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

bok11 avatar Apr 24 '24 10:04 bok11

My suggestion will be to add this to the python/plugins/azuredevops/migrations.py:

@migration(20240424133000, name="Update url column in _tool_azuredevops_builds")
def update_url_column_definition(b: MigrationScriptBuilder):
    table = '_tool_azuredevops_builds'
    b.execute(f'ALTER TABLE {table} MODIFY COLUMN url VARCHAR(500)', Dialect.MYSQL)
    b.execute(f'ALTER TABLE {table} MODIFY COLUMN url VARCHAR(500)', Dialect.POSTGRESQL)

Will try to test locally, and do a PR if it is a success

bok11 avatar Apr 24 '24 11:04 bok11

@bok11 Hello, I found that you plan to submit PR, please tell me when can you submit it. And you can change it to longtext type.

abeizn avatar May 09 '24 03:05 abeizn

I have the same issue.

klemen-df avatar May 21 '24 08:05 klemen-df

Hi, sorry i did not finish this up, i have made the necessary migrations but have some issues getting the gobased migrations to run. It seems they are just skipped, will take another stab at it, as that seems to be the only thing missing

bok11 avatar May 21 '24 09:05 bok11

Got the gobased migrations to run, seems i missed to register them in the register.go file.

Will just test with my really long named test org, rebase and prepare the PR. for reference, my test org url is: https://dev.azure.com/bokwill0431xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This is the max length of an org name, did not find a limit for the repo name length but i guess this length is enough for most people

bok11 avatar May 21 '24 10:05 bok11

Fixed by https://github.com/apache/incubator-devlake/pull/7493

bok11 avatar May 29 '24 10:05 bok11