Batch feeding package entries for openSUSE

Open yankov-pt opened this issue 7 months ago • 7 comments

Hello, this is more of an information request; I couldn't find an email address on the webpage.

We're preparing to create entries for all projects in openSUSE Tumbleweed. Most things are already tracked in releasemonitoring, so we're looking at about 5000-5700 new projects, plus a few thousand distro mappings for existing entries. Is there an easier way to batch-add data? We'd like to be respectful and not spam your API endpoints endlessly. Maybe through a CSV dump you could upload server side? Maybe you have some other ideas? Anyway, thanks for creating and running this awesome service!

yankov-pt avatar May 09 '25 06:05 yankov-pt

Hi, for users there isn't any easier way than using the API endpoints (you can add some wait time between calls so it's not hammering Anitya that much). But if you can provide an SQL script, I should be able to run it directly against the database.

Zlopez avatar May 12 '25 13:05 Zlopez

Awesome! We will prepare an SQL script in that case. Should it use the same fields as the API? For example:

{
    "backend": "custom",
    "homepage": "https://example.com/test",
    "name": "test_project",
    "version_prefix": "release-"
}

By the way, this is probably for another issue, but when testing today with single items, POST /api/v2/projects/ returns a 500: "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

yankov-pt avatar May 12 '25 15:05 yankov-pt

In case of the SQL script it would be best to check the database schema. You will see what you need to insert and where.

What request did you send when you got the error 500?

Zlopez avatar May 13 '25 09:05 Zlopez

Hello. Sorry for the delay. I am uploading an SQL script here, containing ~3300 projects and ~3300 package entries for the openSUSE distro. I have verified it by importing the data into the Postgres instance of a locally running Anitya. In the meantime, we have added a delay and are uploading some more distribution name mappings for things already tracked on Anitya, with a sleep every few API calls so as not to overwhelm the server.

openSUSE.sql.tar.gz

yankov-pt avatar May 16 '25 11:05 yankov-pt

Thanks for the SQL script. I will try to run it as soon as possible.

Zlopez avatar May 16 '25 11:05 Zlopez

Hello, have you had the time to try and run this?

yankov-pt avatar Jun 24 '25 09:06 yankov-pt

I spent two weeks at conferences and now we are in the middle of a datacenter move. I still have it on my TODO list, but it is not a high priority. I hope I will get to it when the datacenter move is done.

Zlopez avatar Jun 24 '25 10:06 Zlopez

I'm seeing the 500 issue as well and have opened #1911 with more details.

W.r.t. "you can add some wait time between calls": what kind of delay do you suggest? I've been bulk-adding entries at a write rate of up to 1 every 2 seconds; is that sufficient to avoid issues on your end?
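
A minimal sketch of that kind of throttling, for illustration only. The `post` callable is hypothetical: it stands for whatever code performs the actual HTTP request (for example a POST to /api/v2/projects/) and returns the status code. The 2-second default simply mirrors the rate mentioned above; this thread does not confirm that it is the rate the maintainers recommend.

```python
import time
from typing import Callable, Iterable


def upload_throttled(
    payloads: Iterable[dict],
    post: Callable[[dict], int],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[dict, int]]:
    """Send payloads one at a time, pausing between consecutive calls.

    `post` is a hypothetical callable that sends one payload and returns
    the HTTP status code. The function returns the (payload, status)
    pairs that came back with an error status so they can be retried
    or inspected later.
    """
    failed = []
    for index, payload in enumerate(payloads):
        if index > 0:
            # Wait before every call except the first one.
            sleep(delay_seconds)
        status = post(payload)
        if status >= 400:
            failed.append((payload, status))
    return failed
```

Collecting the failures instead of aborting on the first one means a single 500 does not stop the whole batch, and the failed entries can be reported separately.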

dfandrich avatar Jul 09 '25 18:07 dfandrich

I found the error you are mentioning:

[Thu Jul 10 04:04:44.250392 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456] [2025-07-10 04:04:44,248 anitya.app ERROR 139996270950080] Exception on /api/v2/packages/ [POST]
[Thu Jul 10 04:04:44.250441 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456] Traceback (most recent call last):
[Thu Jul 10 04:04:44.250446 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/app.py", line 1511, in wsgi_app
[Thu Jul 10 04:04:44.250451 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     response = self.full_dispatch_request()
[Thu Jul 10 04:04:44.250454 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/app.py", line 919, in full_dispatch_request
[Thu Jul 10 04:04:44.250457 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     rv = self.handle_user_exception(e)
[Thu Jul 10 04:04:44.250460 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/app.py", line 917, in full_dispatch_request
[Thu Jul 10 04:04:44.250472 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     rv = self.dispatch_request()
[Thu Jul 10 04:04:44.250475 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/app.py", line 902, in dispatch_request
[Thu Jul 10 04:04:44.250478 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
[Thu Jul 10 04:04:44.250481 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
[Thu Jul 10 04:04:44.250484 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/views.py", line 110, in view
[Thu Jul 10 04:04:44.250486 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
[Thu Jul 10 04:04:44.250489 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
[Thu Jul 10 04:04:44.250492 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib/python3.13/site-packages/flask/views.py", line 191, in dispatch_request
[Thu Jul 10 04:04:44.250495 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return current_app.ensure_sync(meth)(**kwargs)  # type: ignore[no-any-return]
[Thu Jul 10 04:04:44.250497 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
[Thu Jul 10 04:04:44.250500 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/local/lib/python3.13/site-packages/anitya/authentication.py", line 124, in _authenticated_api_access
[Thu Jul 10 04:04:44.250503 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return f(*args, **kwds)
[Thu Jul 10 04:04:44.250506 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/local/lib/python3.13/site-packages/anitya/api_v2.py", line 251, in post
[Thu Jul 10 04:04:44.250508 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     ).one()
[Thu Jul 10 04:04:44.250511 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]       ~~~^^
[Thu Jul 10 04:04:44.250514 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib64/python3.13/site-packages/sqlalchemy/orm/query.py", line 2808, in one
[Thu Jul 10 04:04:44.250517 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return self._iter().one()  # type: ignore
[Thu Jul 10 04:04:44.250519 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]            ~~~~~~~~~~~~~~~~^^
[Thu Jul 10 04:04:44.250522 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib64/python3.13/site-packages/sqlalchemy/engine/result.py", line 1815, in one
[Thu Jul 10 04:04:44.250525 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     return self._only_one_row(
[Thu Jul 10 04:04:44.250527 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]            ~~~~~~~~~~~~~~~~~~^
[Thu Jul 10 04:04:44.250530 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]         raise_for_second_row=True, raise_for_none=True, scalar=False
[Thu Jul 10 04:04:44.250533 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Thu Jul 10 04:04:44.250535 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     )
[Thu Jul 10 04:04:44.250538 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     ^
[Thu Jul 10 04:04:44.250541 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]   File "/usr/lib64/python3.13/site-packages/sqlalchemy/engine/result.py", line 813, in _only_one_row
[Thu Jul 10 04:04:44.250544 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     raise exc.MultipleResultsFound(
[Thu Jul 10 04:04:44.250549 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     ...<4 lines>...
[Thu Jul 10 04:04:44.250553 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456]     )
[Thu Jul 10 04:04:44.250556 2025] [wsgi:error] [pid 8565:tid 8571] [client 10.128.2.2:41456] sqlalchemy.exc.MultipleResultsFound: Multiple rows were found when exactly one was required

I will need to investigate what is causing it and fix that, but it seems to be an issue with some of the data.

Zlopez avatar Jul 10 '25 04:07 Zlopez

I have noticed that in at least a couple of instances I've received a 500 when there are two identical projects that differ only in the case of the project name.
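
A rough sketch of how such case-only collisions could be listed, using an in-memory SQLite database as a stand-in. The table name `projects` and the columns `name` and `ecosystem_name` are assumptions made for this example and should be checked against the real database schema.

```python
import sqlite3


def find_case_collisions(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (lowercased name, ecosystem, count) for names that collide when case is ignored."""
    query = """
        SELECT LOWER(name) AS lowered, ecosystem_name, COUNT(*) AS n
        FROM projects
        GROUP BY LOWER(name), ecosystem_name
        HAVING COUNT(*) > 1
        ORDER BY lowered
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: "Foo" and "foo" collide within the same ecosystem,
# while "bar" and "Bar" live in different ecosystems and do not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, ecosystem_name TEXT)"
)
conn.executemany(
    "INSERT INTO projects (name, ecosystem_name) VALUES (?, ?)",
    [("Foo", "pypi"), ("foo", "pypi"), ("bar", "pypi"), ("Bar", "npm")],
)
collisions = find_case_collisions(conn)
```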

dfandrich avatar Jul 10 '25 17:07 dfandrich

The failing query checks whether there are two identical projects based on ecosystem and project name. I checked the DB and didn't find any duplicates, so I'm not sure where this error is coming from.

I probably know how to fix it, but it's not a nice fix: instead of checking that we really get only one entry, just take the first one found. In any case, I should clear the database of any duplicates (for example, the same project hosted on PyPI and GitHub ends up in the database as two different projects, and we should prevent that).
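
To illustrate the trade-off, here is a plain-Python sketch of the two lookup behaviours. The helpers `one` and `first` and both exception classes are stand-ins written for this example; they only mimic the semantics of SQLAlchemy's `Query.one()` (which raised `MultipleResultsFound` in the traceback above) and `Query.first()`. This is not the actual patch.

```python
class NoResultFound(Exception):
    """Stand-in for the error raised when a strict lookup finds nothing."""


class MultipleResultsFound(Exception):
    """Stand-in for sqlalchemy.exc.MultipleResultsFound."""


def one(rows: list) -> dict:
    """Strict lookup: return the only row, or raise if there are zero or several."""
    if not rows:
        raise NoResultFound("No row was found when one was required")
    if len(rows) > 1:
        raise MultipleResultsFound(
            "Multiple rows were found when exactly one was required"
        )
    return rows[0]


def first(rows: list):
    """Lenient lookup: return the first row, or None if there are no rows."""
    return rows[0] if rows else None


# Hypothetical duplicate rows matching the same lookup.
duplicates = [{"id": 1, "name": "Foo"}, {"id": 2, "name": "foo"}]

picked = first(duplicates)  # silently picks the row with id 1

try:
    one(duplicates)
    strict_failed = False
except MultipleResultsFound:
    strict_failed = True  # the strict lookup surfaces the duplicate
```

The lenient lookup avoids the 500, but it also hides the duplicates, which is why cleaning them out of the database still matters.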

Zlopez avatar Jul 11 '25 10:07 Zlopez

I've opened #1912 specifically on the duplicate entries issue.

Here are the project IDs that resulted in 500 errors for me when adding packages. Most of these I fixed manually, so I don't know if they'll still show the problem:

27452 258602 20024 84908

dfandrich avatar Jul 18 '25 20:07 dfandrich

This PR should fix the 500. @yankov-pt You should be able to batch-add the packages without hitting the internal server error once this fix is released.

Zlopez avatar Jul 29 '25 14:07 Zlopez

Version 2.0.2 is now deployed. @yankov-pt you should now be able to add the packages yourself, as I can't find the time to do it :/

Zlopez avatar Jul 30 '25 09:07 Zlopez

@Zlopez Thank you very much! I will add them carefully over the API!

yankov-pt avatar Aug 04 '25 08:08 yankov-pt

@yankov-pt I assume this is one of the packages you added. The project is missing version_url, so there is no way for Anitya to check for new versions.
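
A small pre-upload check along these lines could catch such entries before they are sent. This is only a sketch: the function name is hypothetical, and the default set of backends that need an explicit `version_url` (just "custom" here) is an assumption, not something confirmed in this thread.

```python
def missing_version_url(
    payloads: list[dict],
    backends_needing_url: frozenset = frozenset({"custom"}),
) -> list[str]:
    """Return the names of payloads whose backend needs an explicit version_url but has none.

    `backends_needing_url` is an assumed list; adjust it to the backends
    that cannot derive the version URL on their own.
    """
    return [
        payload["name"]
        for payload in payloads
        if payload.get("backend") in backends_needing_url
        and not payload.get("version_url")
    ]


# Hypothetical payloads in the same shape as the API example earlier in the thread.
sample = [
    {"backend": "custom", "homepage": "https://example.com/test", "name": "test_project"},
    {
        "backend": "custom",
        "homepage": "https://example.com/ok",
        "name": "ok_project",
        "version_url": "https://example.com/ok/releases",
    },
    {"backend": "GitHub", "homepage": "https://example.com/gh", "name": "gh_project"},
]
flagged = missing_version_url(sample)
```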

Zlopez avatar Aug 13 '25 10:08 Zlopez

@Zlopez This returns a 404, I'm assuming it was dropped since it was missing a version_url?

yankov-pt avatar Oct 13 '25 09:10 yankov-pt

@yankov-pt It was probably removed by automation, which removes any project that fails 1000 consecutive checks.

Zlopez avatar Oct 13 '25 14:10 Zlopez

Thanks for the info. I believe all our other uploads should be okay, as they pick up the version URL from the backends. Thank you for all your help; I will be closing this now.

yankov-pt avatar Oct 14 '25 09:10 yankov-pt