RGW: Migrate topics to data path v2
Checklist
- Tracker (select at least one)
  - [ ] References tracker ticket
  - [ ] Very recent bug; references commit where it was introduced
  - [x] New feature (ticket optional)
  - [ ] Doc update (no ticket needed)
  - [ ] Code cleanup (no ticket needed)
- Component impact
  - [ ] Affects Dashboard, opened tracker ticket
  - [ ] Affects Orchestrator, opened tracker ticket
  - [x] No impact that needs to be tracked
- Documentation (select at least one)
  - [ ] Updates relevant documentation
  - [x] No doc update is appropriate
- Tests (select at least one)
  - [ ] Includes unit test(s)
  - [ ] Includes integration test(s)
  - [ ] Includes bug reproducer
  - [x] No tests
please see the ongoing design discussion here: https://github.com/ceph/ceph/pull/54868#issuecomment-1910358687
early feedback: rados is the only backend that has topic metadata, so i think we should make this migration logic rados-specific
that means we don't need to add new interfaces to rgw_sal.h, rgw_sal_filter.h, etc., and the migration code itself would live under driver/rados/ instead of in class RGWPubSub
p.s. there are additional commits in https://github.com/ceph/ceph/pull/55152, we should base the migration work on that
following our discussion yesterday, i started looking into two aspects of this - cancellation and locking
cancellation
if the migration logic is still running when RadosStore::finalize() is called to shut down, we need a way to cancel it. asio has a per-op cancellation feature that would be perfect for this, but it turns out that the spawn/yield_context coroutines don't support this
i think our best option, then, is to spawn the migration coroutine on a separate io_context with its own thread. that way, RadosStore::finalize() can call io_context.stop() and thread.join(). i can add an example commit to hook that up
locking
the idea was to take a cls_lock on the v1 topics metadata object to cover the migration, but i didn't fully think through the dimensioning by tenant
we could potentially use a lock on pubsub.{tenant} to cover that topic's migration, along with any buckets under pubsub.{tenant}.bucket.*. but there could be buckets in other tenants that don't have a corresponding pubsub.{tenant} topics object at all, and we need to migrate those too
maybe it's better just to do the migration with no locking, and let the radosgws race. this is fine as long as we a) use exclusive creates to write the v2 topic metadata, and b) use RGWObjVersionTracker correctly when writing RGW_ATTR_BUCKET_NOTIFICATION to buckets
@yuvalif this same issue comes up with our definition of the 'migrating state' from https://github.com/ceph/ceph/pull/54868#issuecomment-1910358687:
"migrating" state will be derived implicitly from enabling of the topics v2 feature together with the existence of the v1 topics system object
but it's probably okay to block rgw_rest_pubsub requests on a per-tenant basis
> i think our best option, then, is to spawn the migration coroutine on a separate io_context with its own thread. that way, `RadosStore::finalize()` can call `io_context.stop()` and `thread.join()`. i can add an example commit to hook that up
i pushed a draft commit to https://github.com/ceph/ceph/pull/55504 that does this. i added the hooks to RGWRados instead of RadosStore so the logic would be right next to rgw::notify::init()/shutdown() and reuse the run_notification_thread flag so we can avoid starting migration in radosgw-admin commands
it adds a new driver/rados/topic_migration.cc file - let's put all of the migration logic there instead of adding more to RadosStore. there are also example functions migrate_topics() and migrate_notifications() with notes about the error handling
@AliMasarweh if that looks ok to you, do you want to pull that commit into your pr branch?
> @AliMasarweh if that looks ok to you, do you want to pull that commit into your pr branch?
Yes, it looks ok. I have pulled that commit and completed the missing segments of code. I think it is ready for review.
PR passed manual testing
what are the next steps for test coverage? it seems that we need some script that creates lots of v1 metadata across several tenants (including empty tenant), then enables v2 and verifies that we end up with valid v2 metadata and that all v1 metadata objects get deleted
Sure. Where should I write the test? We usually write tests in test_bn.py, but I don't think it runs against a multi-site cluster.
> we usually write tests in test_bn.py, but I don't think it runs against a multi-site cluster
we don't need a multisite cluster, we need a single site but with a realm defined. with this commit: https://github.com/ceph/ceph/commit/f01871015df4fa2b8340430b78551e07a83ad6dd the test_bn.py script can run against a single zone in a multisite setup (or a single-site setup with a realm). this means that you can write a python test similar to the instructions here: https://gist.github.com/yuvalif/21449e301732b719cd1ed97c3eeeabb2 to test the migration (a rough sketch follows the notes below). doing that locally should be relatively straightforward; however, running it in teuthology would require a different setup than we currently use in the "rgw:notifications" suite. so i would recommend marking this initially as a manual test, so it won't run in teuthology, and adding it to teuthology in a separate effort.
couple of notes regarding the testing:
- to cover a possible race condition between RGWs (in the same zone), we should test with the env var `RGW_PER_ZONE=2`
- please make sure that there are 2 tenants in the test
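to make the shape of such a test concrete, here is a rough, local-only sketch (not actual test_bn.py code): the endpoint, credentials and the way the v2 feature is enabled are all assumptions here, adjust them to whatever the gist/vstart setup actually uses.

```python
# rough sketch of a local migration test -- not real test_bn.py code.
# assumptions: radosgw listens on localhost:8000, one user per tenant was
# created beforehand with the placeholder keys below, and notification_v2 is
# toggled via the zonegroup feature as described in the gist.
import subprocess
import boto3

ENDPOINT = 'http://localhost:8000'
USERS = {                       # tenant -> (access key, secret key), placeholders
    '': ('admin_ak', 'admin_sk'),
    'tenant1': ('t1_ak', 't1_sk'),
    'tenant2': ('t2_ak', 't2_sk'),
}

def client(service, creds):
    return boto3.client(service, endpoint_url=ENDPOINT, region_name='default',
                        aws_access_key_id=creds[0], aws_secret_access_key=creds[1])

def create_v1_metadata():
    # with notification_v2 still disabled, create topic + bucket + notification
    # for every tenant, including the empty/default tenant
    for tenant, creds in USERS.items():
        arn = client('sns', creds).create_topic(
            Name='migtopic',
            Attributes={'push-endpoint': 'http://localhost:9999'})['TopicArn']
        client('s3', creds).create_bucket(Bucket='migbucket')
        client('s3', creds).put_bucket_notification_configuration(
            Bucket='migbucket',
            NotificationConfiguration={'TopicConfigurations': [
                {'Id': 'mignotif', 'TopicArn': arn,
                 'Events': ['s3:ObjectCreated:*']}]})

def enable_v2():
    # assumption: the feature is enabled the same way the gist does it
    subprocess.check_call(['radosgw-admin', 'zonegroup', 'modify',
                           '--enable-feature=notification_v2'])
    subprocess.check_call(['radosgw-admin', 'period', 'update', '--commit'])

def verify_v2_metadata():
    # after migration completes, every tenant should still see its topic and
    # its bucket notification via the regular apis
    for tenant, creds in USERS.items():
        topics = client('sns', creds).list_topics()['Topics']
        assert any(t['TopicArn'].endswith('migtopic') for t in topics)
        conf = client('s3', creds).get_bucket_notification_configuration(
            Bucket='migbucket')
        assert len(conf['TopicConfigurations']) == 1
```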
i would like to see more detailed test coverage of the metadata migration itself. it doesn't need to start any notification endpoints or generate any notifications. it just needs to create lots of v1 metadata and verify that we end up with the right v2 metadata
i think the easiest way to test that is to expose a radosgw-admin command that calls rgwrados::topic_migration::migrate() directly. if you stop all radosgws before enabling notification_v2, we can ensure that this radosgw-admin command handles all of the migration. the benefit of using a command for testing is that you can be sure that all migration has completed once the command returns. if we rely on radosgws to run this migration code in the background, we don't know precisely when it completes - we'd have to keep polling and sleeping until the metadata is gone, making the test slower and less reliable
requested test coverage:
- create enough metadata objects to exceed the pagination limit. that's currently hard-coded as `constexpr uint32_t max = 100`, but the radosgw-admin command could take a custom `--max-entries` for that and pass a smaller value
- create topic and notification metadata under several different tenant names, including the empty tenant
- after completion, verify with `rados ls` that all v1 metadata objects are removed
- verify that v2 metadata can still be read via apis like `ListTopics` and `GetBucketNotificationConfiguration` (restart radosgws once the radosgw-admin command completes)
- test migration behavior when v2 metadata already exists. consider a radosgw-admin option like `--skip-deletes` that skips the deletion of v1 metadata. that way we can run the command once with that flag, then again without to verify that it doesn't cause migration to fail
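to illustrate, a sketch of how a test could drive that (the `topic migrate` subcommand and its `--max-entries`/`--skip-deletes` flags are only the proposal above, they don't exist yet):

```python
# sketch only: 'radosgw-admin topic migrate', '--max-entries' and
# '--skip-deletes' are the *proposed* command/flags from the list above and
# don't exist yet; rename them to whatever the pr actually adds.
import subprocess

def admin(*args):
    return subprocess.check_output(['radosgw-admin', *args]).decode()

# precondition: lots of v1 topic/notification metadata exists under several
# tenants (including the empty tenant), all radosgws are stopped, and the
# notification_v2 feature has just been enabled.

# small page size so the pagination loop is actually exercised
admin('topic', 'migrate', '--max-entries=10', '--skip-deletes')

# second run without --skip-deletes: v2 metadata already exists now, so this
# also covers migrating on top of existing v2 metadata
admin('topic', 'migrate', '--max-entries=10')
```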
> i think the easiest way to test that is to expose a radosgw-admin command that calls `rgwrados::topic_migration::migrate()` directly.
the command will be for test purposes only?
> if you stop all radosgws before enabling notification_v2, we can ensure that this radosgw-admin command handles all of the migration. the benefit of using a command for testing is that you can be sure that all migration has completed once the command returns.
the problem is that, AFAIK, we don't have infra in teuthology to start and stop RGWs. this means that this will be a local test only. we can probably add that, but it would mean more development in teuthology (which we want to minimize).
> if we rely on radosgws to run this migration code in the background, we don't know precisely when it completes - we'd have to keep polling and sleeping until the metadata is gone, making the test slower and less reliable
currently we are just doing "sleep", which creates problems in both directions (flaky tests and/or long tests). we can use radosgw-admin to poll for the topic list; it should fail as long as we are migrating. i don't see that as a big issue
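for example, a minimal polling sketch (assuming, per the above, that `radosgw-admin topic list` keeps returning an error while migration is in progress):

```python
# poll 'radosgw-admin topic list' instead of a fixed sleep; the assumption
# (from the discussion above) is that it fails while migration is in progress
import subprocess
import time

def wait_for_migration(tenant='', timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        cmd = ['radosgw-admin', 'topic', 'list']
        if tenant:
            cmd.append('--tenant=' + tenant)
        if subprocess.run(cmd, capture_output=True).returncode == 0:
            return
        time.sleep(1)
    raise TimeoutError('migration did not complete within %ds' % timeout)
```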
> requested test coverage: [...]
sounds good
> the problem is that, AFAIK, we don't have infra in teuthology to start and stop RGWs. this means that this will be a local test only. [...]
there are a couple ways to do that in teuthology, but you're right that it complicates matters. maybe we can avoid that altogether by preventing the rgws from doing the migration themselves
currently, RGWRados::init_complete() only calls into rgwrados::topic_migration::migrate() when run_notification_thread is true, so that radosgw-admin commands don't start the background migration process. rgw_admin.cc always passes false, and radosgw (via rgw::AppMain::init_storage()) always passes true
if we add a config option rgw_run_notification_thread similar to rgw_run_sync_thread to control that, we could turn that off for the teuthology job that runs this migration test. then we could be sure that our new radosgw-admin command is the only one running the migration code
obviously we'd need to separate those migration test cases from the rgw/notifications jobs which do depend on that notification thread
> after completion, verify with `rados ls` that all v1 metadata objects are removed
using rados ls might be tricky if you don't know exactly what pool name to expect. as an alternative, the proposed radosgw-admin command could return a json array of all objects visited. after running the migration once, we could run the command again to verify that it returns an empty array
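for example (still assuming the hypothetical `topic migrate` command from above, plus an output format where it prints the visited objects as a json array):

```python
# sketch: both the 'topic migrate' subcommand and its json-array output are
# assumptions -- neither exists yet.
import json
import subprocess

def migrate():
    out = subprocess.check_output(['radosgw-admin', 'topic', 'migrate'])
    return json.loads(out)

visited = migrate()       # first run performs the actual migration
assert len(visited) > 0   # it should have visited the v1 objects we created
assert migrate() == []    # second run should find nothing left to migrate
```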
@cbodley, @AliMasarweh:
we will add the following test cases as part of test_bn.py (a rough sketch of the first case follows at the end of this comment):
larger scale migration
- create 2 additional users that belong to 2 named tenants
- disable v2
- create 100 topics+buckets+notifications for each of the above users and for the default user (with default tenant)
- enable v2
- poll the `topic list` command, which should fail until migration is complete
- once the command is successful, verify that we can get the 300 topics and notifications
migration with v2 configuration
- create 2 additional users that belong to 2 named tenants
- create topic+bucket+notification for each of the above users and for the default user (with default tenant)
- disable v2
- create topic+bucket+notification for each of the above users and for the default user (with default tenant)
- enable v2
- wait for migration to complete
- verify that we can get the 6 topics and notifications
currently, the above new tests, as well as the ones already created, will run locally only.
in order to run them in teuthology, we would first need https://github.com/ceph/ceph/pull/56005
once it is merged, we should revert the change from here: https://github.com/ceph/ceph/pull/55214#pullrequestreview-1914611297
and also populate the bntests.conf.SAMPLE file used in teuthology with the correct values.
see: https://github.com/ceph/ceph/blob/main/qa/tasks/notification_tests.py
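for reference, a rough outline of the "larger scale migration" case (the helpers used here - tenant_user(), sns_client(), s3_client(), disable_v2(), enable_v2(), wait_for_migration() - are placeholders, not existing test_bn.py helpers):

```python
# rough outline only; every helper referenced here is a placeholder for
# whatever test_bn.py ends up providing.
NUM_PER_USER = 100
USERS = [tenant_user(''), tenant_user('tenant1'), tenant_user('tenant2')]

disable_v2()
for user in USERS:
    for i in range(NUM_PER_USER):
        arn = sns_client(user).create_topic(Name='topic-%d' % i)['TopicArn']
        s3_client(user).create_bucket(Bucket='bucket-%d' % i)
        s3_client(user).put_bucket_notification_configuration(
            Bucket='bucket-%d' % i,
            NotificationConfiguration={'TopicConfigurations': [
                {'Id': 'notif-%d' % i, 'TopicArn': arn,
                 'Events': ['s3:ObjectCreated:*']}]})

enable_v2()
wait_for_migration()   # e.g. poll 'radosgw-admin topic list' per tenant

# 3 users x 100 topics -> 300 topics and notifications after migration
for user in USERS:
    assert len(sns_client(user).list_topics()['Topics']) == NUM_PER_USER
    for i in range(NUM_PER_USER):
        conf = s3_client(user).get_bucket_notification_configuration(
            Bucket='bucket-%d' % i)
        assert len(conf['TopicConfigurations']) == 1
```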
thanks @yuvalif
> - poll the `topic list` command, which should fail until migration is complete
the "topic list" failure mode is per-tenant. it may succeed even though other tenants haven't finished topic migration. it may also succeed if the tenant's 'topics' object migrated, but we haven't finished migrating all of the tenant's bucket notifications
> the "topic list" failure mode is per-tenant. it may succeed even though other tenants haven't finished topic migration. [...]
We can call topic list for each of the tenants. I think that the call should fail as long as we are migrating.
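A rough sketch of that per-tenant polling (tenant names are placeholders; note that, per the caveat above, this only covers the per-tenant topics object, not the bucket notifications):

```python
# Sketch: poll 'topic list' for every tenant (tenant names are placeholders).
# Per the caveat above, success here only means the tenant's 'topics' object
# migrated -- it does not prove that all bucket notifications migrated too.
import subprocess
import time

TENANTS = ['', 'tenant1', 'tenant2']   # '' is the default/empty tenant

def topics_migrated(tenant):
    cmd = ['radosgw-admin', 'topic', 'list']
    if tenant:
        cmd.append('--tenant=' + tenant)
    return subprocess.run(cmd, capture_output=True).returncode == 0

while not all(topics_migrated(t) for t in TENANTS):
    time.sleep(1)
```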