jsr icon indicating copy to clipboard operation
jsr copied to clipboard

feat: add minimal implementation of changesStream for registry mirroring

Open elrrrrrrr opened this issue 9 months ago • 2 comments

Hi, jsr team

This PR introduces a minimal implementation of the changesStream interface (#110) to enable registry mirroring capabilities. Given that 11 months have passed since our initial discussion and the feature remains unscheduled, I've taken the liberty to propose this implementation based on our proven approach in npmmirror.

Key implementation notes:

  1. Implements simple polling mechanism via application/json endpoint
  2. Follows npmmirror pattern (compared to npm's heavier stream interface)
  3. Maintains backward compatibility with existing infrastructure

For maintainers:

  • This implementation prioritizes simplicity and maintainability
  • Being new to Rust, I welcome guidance on code quality and best practices 😋

Context: As developers in mainland China experience consistent network challenges accessing the JSR registry directly, this feature would significantly improve developer experience for one of the world's largest developer communities.

Thanks for building this vital infrastructure for the JavaScript ecosystem. Looking forward to your feedback! 🙏🏻

elrrrrrrr avatar Apr 02 '25 08:04 elrrrrrrr

Does the changes endpoint need to return changes in order?

If so, this doesn't work. There is no guaruantee in Postgres that a sequential table is actually sequential on insert because multiple transactions can be in flight at once, and sequential IDs are assigned at the start of a transaction, not once the transaction commits.

I suggest you change this as follows:

  • Instead of a seperate database query to insert changes into the change stream, do this using TRIGGER in postgres on the package_versions table. Add a trigger that on insert creates the changes entry.
  • Add a row_version column to the changes table that is typed as xid8 NOT NULL DEFAULT pg_current_xact_id().
  • On query, limit the returned results to WHERE row_id <= pg_snapshot_xmin(pg_current_snapshot()).

What this will do is ensure that we only show changes where all previous transactions have already committed or failed.

There is no strict time order requirement for npm changes. When the npmmirror system receives a _change record, it will query the upstream registry for the full manifest based on the change.id.

Disorder or duplication within a certain period of time are acceptable.

There are some other changeTypes in npm, such as tag changes, maintainer changes, etc. These changes are not all triggered by db changes.

In npmmirror, we uniformly distribute events, and these behaviors are completely asynchronous. https://github.com/cnpm/cnpmcore/blob/master/app/core/event/ChangesStream.ts#L68

elrrrrrrr avatar Apr 02 '25 15:04 elrrrrrrr

Is there any progress?

fengmk2 avatar May 16 '25 09:05 fengmk2