warehouse icon indicating copy to clipboard operation
warehouse copied to clipboard

Project.last_serial failed to update.

Open ewdurbin opened this issue 2 years ago • 15 comments

Reported by a bandersnatch user in #12214, the last_serial value for a Project was out of sync with the journals table after what appears to have been an automated mass removal of releases.

warehouse=> select id, name, version, action, submitted_date from journals where name='jarvis-ui' order by id desc limit 10;
    id    |   name    | version |     action     |       submitted_date       
----------+-----------+---------+----------------+----------------------------
 16626043 | jarvis-ui | 0.3.3   | remove release | 2023-01-30 23:30:11.147139
 16626042 | jarvis-ui | 0.3.8   | remove release | 2023-01-30 23:30:10.755916
 16626041 | jarvis-ui | 0.4.0   | remove release | 2023-01-30 23:30:10.732833
 16626040 | jarvis-ui | 0.3.5   | remove release | 2023-01-30 23:30:10.69155
 16626039 | jarvis-ui | 0.3.1   | remove release | 2023-01-30 23:30:10.640855
 16626038 | jarvis-ui | 0.3.2   | remove release | 2023-01-30 23:30:10.541308
 16626037 | jarvis-ui | 0.2.5   | remove release | 2023-01-30 23:30:10.50915
 16626036 | jarvis-ui | 0.3.0   | remove release | 2023-01-30 23:30:10.527536
 16626035 | jarvis-ui | 0.4.8   | remove release | 2023-01-30 23:30:10.537667
 16626034 | jarvis-ui | 0.4.3   | remove release | 2023-01-30 23:30:10.518038
(10 rows)

warehouse=> select last_serial from projects where name='jarvis-ui';
 last_serial 
-------------
    16626041
(1 row)

Note that the last_serial column on the project record is not actually the latest serial. Seems the trigger that calls maintain_project_last_serial() for the journals table failed to update the project. I'll file a separate issue for that.

It has been corrected by calling a no-op update on the row for now.

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=16626043;
UPDATE 1
warehouse=*> select * from journals where name='jarvis-ui' order by id desc limit 10;
    id    |   name    | version |     action     |       submitted_date       | submitted_by  |             submitted_from              
----------+-----------+---------+----------------+----------------------------+---------------+-----------------------------------------
 16626043 | jarvis-ui | 0.3.3   | remove release | 2023-01-30 23:30:11.147139 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626042 | jarvis-ui | 0.3.8   | remove release | 2023-01-30 23:30:10.755916 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626041 | jarvis-ui | 0.4.0   | remove release | 2023-01-30 23:30:10.732833 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626040 | jarvis-ui | 0.3.5   | remove release | 2023-01-30 23:30:10.69155  | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626039 | jarvis-ui | 0.3.1   | remove release | 2023-01-30 23:30:10.640855 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626038 | jarvis-ui | 0.3.2   | remove release | 2023-01-30 23:30:10.541308 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626037 | jarvis-ui | 0.2.5   | remove release | 2023-01-30 23:30:10.50915  | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626036 | jarvis-ui | 0.3.0   | remove release | 2023-01-30 23:30:10.527536 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626035 | jarvis-ui | 0.4.8   | remove release | 2023-01-30 23:30:10.537667 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
 16626034 | jarvis-ui | 0.4.3   | remove release | 2023-01-30 23:30:10.518038 | thevickypedia | 2600:1702:1dc0:7820:6937:a99d:417e:de82
(10 rows)

warehouse=*> select last_serial from projects where name='jarvis-ui';
 last_serial 
-------------
    16626043
(1 row)

warehouse=*> commit;
COMMIT

Originally posted by @ewdurbin in https://github.com/pypi/warehouse/issues/12214#issuecomment-1416771567

ewdurbin avatar Feb 04 '23 14:02 ewdurbin

I'm trying to think how this could have happened, and I think I have a theory for it.

Multiple transactions occurring in isolation, READ COMMITTED

  1. Transaction A inserts record X into journals
  2. Transaction B inserts record Y into journals
  3. Transaction B updates Project.last_serial, locking it.
  4. Transaction A gets blocked on Transaction B.
  5. Transaction B commits, updating Project.last_serial to Y.
  6. Transaction A updates Project.last_serial, locking it.
  7. Transaction A commits, updating Project.last_serial to X.

If that's right, I suspect we can fix it by always issuing a SELECT FOR UPDATE on the projects table immediately in any view that is going to create journals. That should effectively serialize any transactions that operate on the same project.

dstufft avatar Feb 25 '23 17:02 dstufft

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='expose-localhost' order by id desc limit 10;
    id    |       name       | version |     action     |       submitted_date       
----------+------------------+---------+----------------+----------------------------
 17357317 | expose-localhost | 0.4.2   | remove release | 2023-03-20 00:20:18.457385
 17357316 | expose-localhost | 0.3.0   | remove release | 2023-03-20 00:20:18.45873
 17357315 | expose-localhost | 0.3.1   | remove release | 2023-03-20 00:20:18.461733
 17357314 | expose-localhost | 0.2.9   | remove release | 2023-03-20 00:20:18.431603
 17357313 | expose-localhost | 0.1.8   | remove release | 2023-03-20 00:20:18.423596
 17357312 | expose-localhost | 0.2.4   | remove release | 2023-03-20 00:20:18.367459
 17357311 | expose-localhost | 0.3.4   | remove release | 2023-03-20 00:20:18.360073
 17357310 | expose-localhost | 0.3.6   | remove release | 2023-03-20 00:20:18.373008
 17357309 | expose-localhost | 0.3.5   | remove release | 2023-03-20 00:20:18.271077
 17357308 | expose-localhost | 0.2.0   | remove release | 2023-03-20 00:20:17.959461
(10 rows)

warehouse=> select last_serial from projects where name='expose-localhost';
 last_serial 
-------------
    17357314
(1 row)

Manually resolved.

ewdurbin avatar Apr 06 '23 12:04 ewdurbin

Two packages, cyaxares and later openmldb-tool has been failing for days.

I wonder if it's another instance of this issue, and if there's someone going to fix it manually.

TechCiel avatar May 14 '23 07:05 TechCiel

Two packages, cyaxares and later openmldb-tool has been failing for days.

I wonder if it's another instance of this issue, and if there's someone going to fix it manually.

Could anybody look into this? Maybe @ewdurbin ?

TechCiel avatar May 25 '23 12:05 TechCiel

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='cyaxares' order by id desc limit 20;
    id    |   name   | version |               action                |       submitted_date       
----------+----------+---------+-------------------------------------+----------------------------
 18070374 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.845118
 18070372 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.567462
 18070371 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.386991
 18070370 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.304009
 18070369 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:36.118203
 18070368 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:35.790249
 18070365 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.775461
 18070364 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.586464
 18070363 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.414776
 18070362 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.24229
 18070361 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:27.073759
 18070360 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:26.903427
 18070358 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:26.06091
 18070356 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:24.592312
 18070353 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:15.375401
 18070352 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:15.071967
 18070351 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:13.089239
 18070348 | cyaxares | 0.2     | remove release                      | 2023-05-11 22:13:09.134356
 18070270 | cyaxares | 0.2     | remove file cyaxares-0.2.tar.gz     | 2023-05-11 22:08:59.822525
 18070249 | cyaxares | 0.2     | add source file cyaxares-0.2.tar.gz | 2023-05-11 22:07:59.015508
(20 rows)

warehouse=> select last_serial from projects where name = 'cyaxares';
 last_serial 
-------------
    18070352
(1 row)

Resolved:

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=18070374;
UPDATE 1
warehouse=*> select id, name, version, action, submitted_date from journals where name='cyaxares' order by id desc limit 10;
    id    |   name   | version |     action     |       submitted_date       
----------+----------+---------+----------------+----------------------------
 18070374 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.845118
 18070372 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.567462
 18070371 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.386991
 18070370 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.304009
 18070369 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:36.118203
 18070368 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:35.790249
 18070365 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.775461
 18070364 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.586464
 18070363 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.414776
 18070362 | cyaxares | 0.2     | remove release | 2023-05-11 22:13:27.24229
(10 rows)

warehouse=*> select last_serial from projects where name = 'cyaxares';
 last_serial 
-------------
    18070374
(1 row)

warehouse=*> commit;
COMMIT

ewdurbin avatar Jun 13 '23 13:06 ewdurbin

Another instance:

warehouse=> select id, name, version, action, submitted_date from journals where name='openmldb-tool' order by id desc limit 10;
    id    |     name      | version |                      action                       |       submitted_date       
----------+---------------+---------+---------------------------------------------------+----------------------------
 18075139 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.31279
 18075138 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.088052
 18075137 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:12.849749
 18075135 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.238179
 18075134 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.200528
 18075133 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:08.340646
 18075132 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:07.466855
 18075131 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:05.803824
 18075126 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:09:59.441708
 18072976 | openmldb-tool | 0.8.0   | add py3 file openmldb_tool-0.8.0-py3-none-any.whl | 2023-05-12 03:44:07.769301
(10 rows)

warehouse=> select last_serial from projects where name = 'openmldb-tool';
 last_serial 
-------------
    18075137
(1 row)

Resolved:

warehouse=> begin;
BEGIN
warehouse=*> update journals set name=name where id=18075139;
UPDATE 1
warehouse=*> select last_serial from projects where name = 'openmldb-tool';
 last_serial 
-------------
    18075139
(1 row)

warehouse=*> select id, name, version, action, submitted_date from journals where name='openmldb-tool' order by id desc limit 10;
    id    |     name      | version |                      action                       |       submitted_date       
----------+---------------+---------+---------------------------------------------------+----------------------------
 18075139 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.31279
 18075138 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:13.088052
 18075137 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:12.849749
 18075135 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.238179
 18075134 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:09.200528
 18075133 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:08.340646
 18075132 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:07.466855
 18075131 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:10:05.803824
 18075126 | openmldb-tool | 0.8.0   | remove release                                    | 2023-05-12 08:09:59.441708
 18072976 | openmldb-tool | 0.8.0   | add py3 file openmldb_tool-0.8.0-py3-none-any.whl | 2023-05-12 03:44:07.769301
(10 rows)

warehouse=*> commit;
COMMIT

ewdurbin avatar Jun 13 '23 13:06 ewdurbin

I think that https://github.com/pypi/warehouse/pull/13936 might fix this.

dstufft avatar Jun 14 '23 02:06 dstufft

Maybe another instance on fiftyone-desktop. Waiting for manual intervention.

2023-07-05 23:53:10,292 ERROR: Stale serial for fiftyone-desktop (18769749) not updating. Giving up. (package.py:82)
2023-07-05 23:53:10,292 ERROR: Error syncing package: fiftyone-desktop@18769749 (mirror.py:377)
bandersnatch.master.StalePage: Expected PyPI serial 18769749 for request https://pypi.org/pypi/fiftyone-desktop/json but got 18769748.

TechCiel avatar Jul 06 '23 00:07 TechCiel

@TechCiel Manually resolved that for you. In this instance we had two journals for the same project get issued within a ~2 seconds of each other:

    id    |       name       | version |                                 action                                  |       submitted_date
----------+------------------+---------+-------------------------------------------------------------------------+----------------------------
 18769749 | fiftyone-desktop | 0.28.1  | add py3 file fiftyone_desktop-0.28.1-py3-none-macosx_11_0_arm64.whl     | 2023-07-05 15:44:55.417118
 18769748 | fiftyone-desktop | 0.28.1  | remove file fiftyone_desktop-0.28.1-py3-none-macosx_11_10_arm64.whl     | 2023-07-05 15:44:57.439115 

di avatar Jul 06 '23 01:07 di

Manually resolved that for you.

Thank you @di , really appreciate that quick respond, but the problem is not resolved. Maybe manual update didn't purge the CDN cache?

> curl 'https://pypi.org/pypi/fiftyone-desktop/json' -I | grep x-pypi-last-serial
x-pypi-last-serial: 18769748

should be 18769749.

TechCiel avatar Jul 06 '23 13:07 TechCiel

@TechCiel Good catch. I've issued a purge and this should now be resolved.

di avatar Jul 06 '23 14:07 di

Another package mpf is failing for 40+ hours, and I see no sign of self-recovery

2023-11-05 04:29:24,004 ERROR: Stale serial for package mpf - Attempt 3 (package.py:76)
2023-11-05 04:29:24,005 ERROR: Stale serial for mpf (20485085) not updating. Giving up. (package.py:82)
> curl 'https://pypi.org/pypi/mpf/json' -I | grep x-pypi-last-serial
x-pypi-last-serial: 20485084

TechCiel avatar Nov 06 '23 13:11 TechCiel

Another package mpf is failing for 40+ hours, and I see no sign of self-recovery

Still.

TechCiel avatar Nov 15 '23 13:11 TechCiel

The mpf issue has been unintentionally resolved with an extra update of package.

Sadly, we now have another issue:

2024-02-05 20:19:26,441 ERROR: Stale serial for proteus-cli (21741396) not updating. Giving up. (package.py:82)

TechCiel avatar Feb 06 '24 16:02 TechCiel

2024-04-16 10:36:39,262 ERROR: Stale serial for tccli (22781647) not updating. Giving up. (package.py:82)

New failure with the proteus-cli one still not resolved.

TechCiel avatar Apr 17 '24 02:04 TechCiel

Is the way to go about this to always just manually come here and request a purge? There is nothing on our client end that can assist with these issues?

bandersnatch.master.StalePage: Expected PyPI serial 14647759 for request https://pypi.org/pypi/pytranslation/json but got 13365137. We can no longer issue a PURGE. Report issue to PyPA Warehouse GitHub if it persists ...

epifeny avatar May 19 '24 08:05 epifeny

With https://github.com/pypi/warehouse/pull/13936 merged now (along with a few follow up PRs to fix some deadlocks), I think that the primary cause of this has been fixed now.

The tl;dr is that our mirroring relied on the serial to be a monotonically increasing integer, but due to the way PostgreSQL works, concurrent transactions could end up with serials being "out of order", and https://github.com/pypi/warehouse/pull/13936 changes that so that transactions that generate new serial numbers are serialized behind what is effectively a mutex.

I'm going to close this now, but if anyone sees any new reports of this happening after today, we can re-open this issue.

dstufft avatar May 31 '24 16:05 dstufft

2024-09-24 10:36:00,003 ERROR: Error syncing package: github@25164033
 (mirror.py:377)
bandersnatch.master.StalePage: Expected PyPI serial 25164033 for requ
est https://pypi.org/pypi/github/json but got 25138863.

Likely but not sure, another instance of this problem, still ongoing. Could someone look into the database?

cc @dstufft

TechCiel avatar Sep 26 '24 23:09 TechCiel

That's a bug for sure caused by a recent merge that causes journal to proceed but not purge for a project.

ewdurbin avatar Sep 26 '24 23:09 ewdurbin

Could you help with purging the package while fixing the cause?

TechCiel avatar Sep 27 '24 02:09 TechCiel

Purges are being issued shortly.

ewdurbin avatar Sep 27 '24 12:09 ewdurbin

Should be cleared now. Thanks for the vigilance on this issue @TechCiel, let us know if any more crop up.

ewdurbin avatar Sep 27 '24 12:09 ewdurbin

Should be cleared now.

It now shows x-pypi-last-serial: 14353030 ? 🤔 Should it be monotically increasing?

Thanks for the vigilance

That's because I maintain a PyPI mirror, and bandersnatch will fail the whole process in such cases.

TechCiel avatar Sep 27 '24 14:09 TechCiel

Should be cleared now.

It now shows x-pypi-last-serial: 14353030 ? 🤔 Should it be monotically increasing?

14353030 is the last valid journal entry for the github project. The bug that was introduced associated an unrelated change with that project.

ewdurbin avatar Sep 27 '24 14:09 ewdurbin

14353030 is the last valid journal entry for the github project. The bug that was introduced associated an unrelated change with that project.

I think increasing serial is kind of API promise of pypi.org, and bandersnatch is still expecting for 25164033. The XMLRPC changelog API could still be returning this serial.

TechCiel avatar Sep 27 '24 14:09 TechCiel

bandersnatch.master.StalePage: Expected PyPI serial 25483353 for requ
est https://pypi.org/pypi/github/json but got 14353030.

The github package got bumped to a new serial and is failing again.

TechCiel avatar Oct 18 '24 12:10 TechCiel