User data migration

Open Gozala opened this issue 2 years ago • 7 comments

We should provide some way for users to migrate data from the old system into the new one. This implies moving store and upload records. All the stored CARs should already be in the system, so users will not need to download and reupload anything; store/add would just succeed right away.

We currently store the records for each upload as a CAR, regardless of what the user sent us; see:

https://github.com/web3-storage/web3.storage/blob/main/packages/api/src/upload.js#L51-L53
https://github.com/web3-storage/web3.storage/blob/main/packages/api/src/car.js#L160-L173

We do not store the CAR CID in the DB, but we encode it in the backupURLs and should be able to derive it from them.
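As a rough sketch of that derivation, the helper below pulls the CAR key out of a backup URL. It assumes the key is the file name of the URL's last path segment, with a `.car` extension; the actual backupURLs layout is not shown in this issue, and the segment may be an encoded hash that still needs converting into a CID. `carKeyFromBackupUrl` is a hypothetical name.

```typescript
// Hypothetical helper: assumes the backup URL path ends in "<carKey>.car".
// Verify the real backupURLs layout before relying on this.
function carKeyFromBackupUrl(backupUrl: string): string | null {
  // Drop any query string or fragment before looking at the path.
  const end = backupUrl.search(/[?#]/);
  const path = end === -1 ? backupUrl : backupUrl.slice(0, end);

  // Take the last path segment and require a ".car" extension.
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const match = /^(.+)\.car$/.exec(lastSegment);

  // Return null rather than guessing when the URL does not match.
  return match?.[1] ?? null;
}
```

Returning `null` for anything that does not match keeps records without a usable backup URL visible, so they can be handled separately instead of being silently dropped.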

Unfortunately, our API does not return backupURLs either, so it would be impossible to do everything on the client side. We could, however, amend the old API with an additional shards field and return the CAR CIDs. If we do that, we could then have a client script that fetches all of a user's uploads using their JWT and derives store/add and upload/add invocations from them.

Note that in the old system uploads had names, but we don't have those in the new system, not yet anyway. However, we could stick names into invocation facts (like we do with space names) in order to retain this information in case we'll need it in the future.
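A minimal sketch of what such a client script could derive, assuming the amended old API returns a `shards` array of CAR CIDs per upload. The `LegacyUpload` and `InvocationPlan` shapes and `planMigration` are hypothetical and only describe the invocations as plain data; they are not the w3up client API, and fields such as the CAR size are left out because the old API is not known to return them.

```typescript
// Legacy upload as the amended old API might return it (field names assumed).
interface LegacyUpload {
  cid: string;      // root CID of the upload
  name?: string;    // upload name from the old system, if any
  shards: string[]; // CAR CIDs, e.g. derived from backupURLs
}

// Plain description of an invocation to issue later.
interface InvocationPlan {
  can: "store/add" | "upload/add";
  with: string; // space DID
  nb: Record<string, unknown>;
  facts: Record<string, unknown>[];
}

function planMigration(space: string, uploads: LegacyUpload[]): InvocationPlan[] {
  const plans: InvocationPlan[] = [];
  const seenShards = new Set<string>();

  for (const upload of uploads) {
    // One store/add per distinct CAR, even if several uploads share it.
    for (const shard of upload.shards) {
      if (seenShards.has(shard)) continue;
      seenShards.add(shard);
      plans.push({ can: "store/add", with: space, nb: { link: shard }, facts: [] });
    }

    // One upload/add per upload; the old name is kept as an invocation fact.
    plans.push({
      can: "upload/add",
      with: space,
      nb: { root: upload.cid, shards: upload.shards },
      facts: upload.name === undefined ? [] : [{ name: upload.name }],
    });
  }
  return plans;
}
```

Keeping the plan as plain data means the same list could be inspected, retried, or dumped for the user before any invocation is actually sent.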

Gozala avatar Nov 30 '23 05:11 Gozala

We should also align all export with https://github.com/web3-storage/w3up/issues/1018

Gozala avatar Nov 30 '23 05:11 Gozala

Since we have to do some work on the backend, I wonder if it would be better to instead implement just an /export API endpoint that returns a CAR representing the user's space, as per https://github.com/web3-storage/w3up/issues/1018.

On the w3up side we can implement store/import and upload/import capabilities, to which the user could then pass the exported CAR in order to import all the entries from the legacy system.

Gozala avatar Nov 30 '23 05:11 Gozala

Yet another option would be to implement a /migrate endpoint that takes a UCAN delegation with the user space's store/* and upload/* capabilities. It could then create the export and do the import itself, without having to send the user their entries only for them to upload them again.
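One check such an endpoint would presumably need is that the delegation really covers both abilities on the target space. The sketch below works on already-decoded capabilities with the UCAN `can` and `with` fields; `covers` and `missingAbilities` are hypothetical helpers, and signature, expiry, and proof-chain verification are deliberately left out and would need a proper UCAN library.

```typescript
// A decoded UCAN capability: an ability (`can`) on a resource (`with`).
interface Capability {
  can: string;
  with: string;
}

// True when `granted` (e.g. "store/*") covers `required` (e.g. "store/add").
function covers(granted: string, required: string): boolean {
  if (granted === "*" || granted === required) return true;
  // "store/*" covers anything that starts with "store/".
  return granted.endsWith("/*") && required.startsWith(granted.slice(0, -1));
}

// Returns the abilities the delegation is still missing for this space.
function missingAbilities(space: string, delegated: Capability[]): string[] {
  const required = ["store/*", "upload/*"];
  return required.filter(
    (ability) =>
      !delegated.some((cap) => cap.with === space && covers(cap.can, ability))
  );
}
```

An empty result would mean the migration can proceed; anything else can be reported back to the user as the abilities they still need to delegate.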

Gozala avatar Nov 30 '23 05:11 Gozala

> All the stored CARs should already be in the system, so users will not need to download and reupload anything; store/add would just succeed right away.

This is mostly true if we are still using carpark-prod-0 at that point in time. However, I think we also had old content in older buckets in S3. Therefore, I think we may need to consider a migration tool that queues things to eventually move them into different buckets or similar. Moving things within the same AWS region will be easy and inexpensive. But maybe we also want to put it in CF only.

I think this needs more details on requirements and some close coordination with the bucket migration. For instance, this may be a good reason to punt on changing the bucket until the migration is done.

vasco-santos avatar Nov 30 '23 11:11 vasco-santos

> We do not store the CAR CID in the DB, but we encode it in the backupURLs and should be able to derive it from them.

I think we do not have backup URLs in all the places. This was even problematic for the migration, but once we have migrated all CARs to R2, we will probably have a mapping of rootCid to CARs that we can rely on.

vasco-santos avatar Nov 30 '23 12:11 vasco-santos

> Note that in the old system uploads had names, but we don't have those in the new system, not yet anyway. However, we could stick names into invocation facts (like we do with space names) in order to retain this information in case we'll need it in the future.

It would be great to give that info to users as a dump. And note that this kind of metadata should now be stored at the client level.

vasco-santos avatar Nov 30 '23 12:11 vasco-santos

Finally, I think we should definitely create a backend migration tool. (It may be worth checking the R2 migration tool from S3.) We could then monitor progress and keep things stable instead of going through spikes.

We could just queue things and keep users updated. This queue could even send things to Filecoin for renewals (given there will be no direct trigger), though that depends on some cross-team decisions first.

vasco-santos avatar Nov 30 '23 12:11 vasco-santos