feat: add minimal implementation of changesStream for registry mirroring
Hi, jsr team
This PR introduces a minimal implementation of the changesStream interface (#110) to enable registry mirroring. Given that 11 months have passed since our initial discussion and the feature remains unscheduled, I've taken the liberty of proposing this implementation, based on our proven approach in npmmirror.
Key implementation notes:
- Implements simple polling mechanism via `application/json` endpoint
- Follows npmmirror pattern (compared to npm's heavier stream interface)
- Maintains backward compatibility with existing infrastructure
For maintainers:
- This implementation prioritizes simplicity and maintainability
- Being new to Rust, I welcome guidance on code quality and best practices 😋
Context: As developers in mainland China experience consistent network challenges accessing the JSR registry directly, this feature would significantly improve developer experience for one of the world's largest developer communities.
Thanks for building this vital infrastructure for the JavaScript ecosystem. Looking forward to your feedback! 🙏🏻
Does the changes endpoint need to return changes in order?
If so, this doesn't work. There is no guarantee in Postgres that a sequential table is actually sequential on insert, because multiple transactions can be in flight at once, and sequential IDs are assigned while a transaction is still running, not once the transaction commits.
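To make the race concrete, here is a small in-memory sketch rather than real database code. It assumes a consumer that polls with `id > cursor`; the names `ChangeRow`, `poll`, and `replay_race` are hypothetical and do not come from this PR.

```rust
/// A row in a hypothetical `changes` table.
#[derive(Debug, Clone, PartialEq)]
struct ChangeRow {
    id: u64,         // sequence value, taken while the transaction is running
    committed: bool, // whether the owning transaction has committed yet
}

/// What a poller sees: committed rows with `id > cursor`, ordered by id.
fn poll(table: &[ChangeRow], cursor: u64) -> Vec<u64> {
    let mut ids: Vec<u64> = table
        .iter()
        .filter(|row| row.committed && row.id > cursor)
        .map(|row| row.id)
        .collect();
    ids.sort();
    ids
}

/// Replays the race: tx A takes id 1, tx B takes id 2, and B commits first.
/// Returns every id the poller was handed across two polls.
fn replay_race() -> Vec<u64> {
    let mut table = vec![
        ChangeRow { id: 1, committed: false }, // tx A, still in flight
        ChangeRow { id: 2, committed: true },  // tx B, already committed
    ];
    let mut cursor = 0;
    let mut delivered = Vec::new();

    // First poll: only id 2 is visible, so the cursor jumps past id 1.
    let seen = poll(&table, cursor);
    cursor = seen.last().copied().unwrap_or(cursor);
    delivered.extend(seen);

    // tx A commits late.
    table[0].committed = true;

    // Second poll: id 1 is now committed but sits below the cursor.
    delivered.extend(poll(&table, cursor));
    delivered
}
```

In this model `replay_race()` hands the poller only id 2. The change with id 1 is never delivered, which is the gap a strictly ordered consumer would hit.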
I suggest you change this as follows:
- Instead of a separate database query to insert changes into the change stream, do this using a `TRIGGER` in Postgres on the `package_versions` table. Add a trigger that on insert creates the `changes` entry.
- Add a `row_version` column to the `changes` table that is typed as `xid8 NOT NULL DEFAULT pg_current_xact_id()`.
- On query, limit the returned results to `WHERE row_id <= pg_snapshot_xmin(pg_current_snapshot())`.

What this will do is ensure that we only show changes where all previous transactions have already committed or failed.
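The effect of the snapshot filter can be sketched with a simplified in-memory model. This is not Postgres itself: `snapshot_xmin` only imitates what `pg_snapshot_xmin(pg_current_snapshot())` reports. The sketch filters on `row_version`, assuming that is the column the `WHERE` clause is meant to compare. The names `VersionedChange`, `snapshot_xmin`, and `visible_changes` are hypothetical.

```rust
/// A committed row in a hypothetical `changes` table, stamped with the id of
/// the transaction that inserted it (the suggested `row_version` column).
struct VersionedChange {
    id: u64,
    row_version: u64,
}

/// Simplified stand-in for pg_snapshot_xmin(pg_current_snapshot()): the lowest
/// transaction id still in flight, or `next_xid` when nothing is in flight.
fn snapshot_xmin(in_flight: &[u64], next_xid: u64) -> u64 {
    in_flight.iter().copied().min().unwrap_or(next_xid)
}

/// Applies the suggested filter: only rows with `row_version <= xmin` are
/// returned, so nothing is shown while an older transaction is still open.
fn visible_changes(
    committed: &[VersionedChange],
    in_flight: &[u64],
    next_xid: u64,
) -> Vec<u64> {
    let xmin = snapshot_xmin(in_flight, next_xid);
    committed
        .iter()
        .filter(|row| row.row_version <= xmin)
        .map(|row| row.id)
        .collect()
}
```

For example, while transaction 100 is still open, a row committed by transaction 101 is held back. Once 100 commits, both rows become visible together, so a cursor-based poller cannot skip the earlier one.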
There is no strict time order requirement for npm changes. When the npmmirror system receives a _change record, it queries the upstream registry for the full manifest based on the `change.id`.
Out-of-order or duplicate records within a certain period of time are acceptable.
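A rough sketch of why ordering and duplicates do not matter to this kind of consumer: each batch is collapsed into the set of package ids to re-fetch, and the cursor advances to the highest sequence seen. The names `ChangeRecord`, `seq`, and `plan_sync` are assumptions for illustration only and are not taken from cnpmcore.

```rust
use std::collections::BTreeSet;

/// A change record as a mirror might receive it; field names are hypothetical.
struct ChangeRecord {
    seq: u64,   // position in the upstream change feed
    id: String, // package identifier used to fetch the full manifest
}

/// Collapses a batch into the distinct packages to re-sync plus the next
/// cursor. Because each package is re-fetched in full from upstream, the
/// result is the same whatever the order or repetition of the records.
fn plan_sync(batch: &[ChangeRecord]) -> (BTreeSet<String>, Option<u64>) {
    let ids = batch.iter().map(|c| c.id.clone()).collect();
    let cursor = batch.iter().map(|c| c.seq).max();
    (ids, cursor)
}
```

A batch that repeats a package, or lists records out of sequence, yields one manifest fetch per package and the same cursor.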
There are some other changeTypes in npm, such as tag changes, maintainer changes, etc. These changes are not all triggered by db changes.
In npmmirror, we uniformly distribute events, and these behaviors are completely asynchronous. https://github.com/cnpm/cnpmcore/blob/master/app/core/event/ChangesStream.ts#L68
Is there any progress?