orleans icon indicating copy to clipboard operation
orleans copied to clipboard

Initial/Migration CosmosDB Providers

Open galvesribeiro opened this issue 2 years ago • 6 comments

This PR brings OrleansContrib CosmosDB providers to the main repo.

It also implement the features mentioned in #7726 and some other pending PRs on the original repo among with several QoL improvements.

Closes: #7726

Microsoft Reviewers: Open in CodeFlow

galvesribeiro avatar Jul 09 '22 03:07 galvesribeiro

cc @mumby0168

ReubenBond avatar Jul 11 '22 16:07 ReubenBond

cc @mumby0168

Hi thanks for tagging me in this, I've been away for the last few weeks, I get back this weekend. I'd love to take a look at this, when I am back :)

mumby0168 avatar Jul 12 '22 05:07 mumby0168

Thanks, @mumby0168! Apologies for the slow response, @galvesribeiro - I've been distracted. I hope to review it soon. cc @bradygaster

ReubenBond avatar Aug 03 '22 14:08 ReubenBond

No problem. For some reason the service container for the emulator doesn't seems to be running on some of the test jobs. I've ran it against a real Cosmos Account and it just work.

galvesribeiro avatar Aug 03 '22 14:08 galvesribeiro

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

ghost avatar Aug 25 '22 23:08 ghost

cc @galvesribeiro I added some comments

ReubenBond avatar Aug 29 '22 17:08 ReubenBond

Feedback addressed @ReubenBond. Sorry for taking so long.

galvesribeiro avatar Oct 11 '22 12:10 galvesribeiro

Persistence and Reminder tests are not passing on my machine, targeting a real cosmosdb endpoint. Is there any specific config to do @galvesribeiro ?

benjaminpetit avatar Oct 18 '22 12:10 benjaminpetit

There shouldn't be any differences or special config. They all pass on my machine either on the emulator or targeting a real account @benjaminpetit. I'll pull your changes in and try it again when I get home Tonight.

galvesribeiro avatar Oct 18 '22 12:10 galvesribeiro

@benjaminpetit anything remaining here? If we decide to merge this for 7.0, perhaps we should add a beta tag to it until we have sufficient confidence

ReubenBond avatar Oct 26 '22 19:10 ReubenBond

It seems like there is something with the setup on GH for the tests, will look at it. It run fine on the local emulator (tried on a windows VM over the weekend) or real account in Azure. Sorry I forgot to report back.

galvesribeiro avatar Oct 26 '22 20:10 galvesribeiro

I believe the issue is currently that the certificate bundle needs to be split into multiple files (one per cert).

rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL rehash: warning: skipping emulatorcert.pem,it does not contain exactly one certificate or CRL

ReubenBond avatar Oct 26 '22 21:10 ReubenBond

The failure now is more basic... it can't even connect to the emulator to retrieve the cert.

ReubenBond avatar Oct 26 '22 23:10 ReubenBond

Note that I disabled all other GH Actions jobs in this PR so that we can try to fix the CosmosDB one. We should revert that before merging.

ReubenBond avatar Oct 27 '22 02:10 ReubenBond

The emulator sometimes takes a very long time to start. I don't think we need to split the cert, my guess it that sometimes the emulator isn't really started, and it downloads some garbage.

We had run in the past were the SSL part was fine.

benjaminpetit avatar Oct 27 '22 08:10 benjaminpetit

Doesn't that error message indicate that the cert needs to be split? It says it's skipping the import

rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL rehash: warning: skipping emulatorcert.pem,it does not contain exactly one certificate or CRL

ReubenBond avatar Oct 27 '22 14:10 ReubenBond

On previous run we didn't had this warning, and we didn't had SSL errors during the test. Unless I am mistaken...

benjaminpetit avatar Oct 27 '22 14:10 benjaminpetit

Maybe we can make the job loop until the emulator is available, since even 90s doesn't seem to cut it

ReubenBond avatar Oct 27 '22 14:10 ReubenBond

Rebased, refactored, and tried to fix tests. I'm testing against a real-world instance of Cosmos DB and the tests are very flaky - too flaky to commit right now. I'm wondering what we can do to improve that.

ReubenBond avatar Dec 21 '22 20:12 ReubenBond

I updated the options to reflect the style we've been using for other providers: one config method for each constructor/factory overload on the underlying storage client (CosmosClient in this case).

I am also wrapping the unserializable CosmosException in a new WrappedException type.

ReubenBond avatar Dec 21 '22 20:12 ReubenBond

I also removed the System.Text.Json serializer as a default: if users want it, they can use it, but I opted to use the Cosmos SDK's default instead.

This storage provider doesn't currently support pluggable serialization. I wonder if it should... I think that the IGrainStorageSerializer interface would need to be updated to have some way to know that can output a JSON string instead of always binary. cc @benjaminpetit for fyi/thoughts.

ReubenBond avatar Dec 21 '22 23:12 ReubenBond

TODO:

  • [ ] Add Cosmos DB account to internal test infrastructure and make sure that tests work there
  • [x] ~Decide what to do with IGrainStorageSerializer for JSON. We could take what it gives and Base64 encode it, but that largely defeats the purpose of using a document DB and we should figure this out for other JSON-compatible stores or stores where text is preferred.~ When a database offers its own serialization system, I believe we should delegate to that and prevent having multiple.

ReubenBond avatar Dec 22 '22 15:12 ReubenBond

@galvesribeiro aside from the GitHub Actions emulator issues, is there anything remaining on this PR which comes to mind?

ReubenBond avatar Mar 13 '23 22:03 ReubenBond

Nope. We sorted out everything and it was recently rebased on main.

galvesribeiro avatar Mar 13 '23 22:03 galvesribeiro

What's left on this PR?

scalablecory avatar May 04 '23 07:05 scalablecory

@scalablecory we are trying to get the tests passing using the Cosmos DB emulator. It has been very flaky for us, however. Eg:
image We want to avoid merging things which we cannot test in CI. We've added CI tests for everything which we can, ideally using containers or emulators (Azure Event Hubs being an exception: it runs against a real instance, in private CI)

ReubenBond avatar Jun 13 '23 14:06 ReubenBond

@scalablecory we are trying to get the tests passing using the Cosmos DB emulator. It has been very flaky for us, however. Eg: image We want to avoid merging things which we cannot test in CI. We've added CI tests for everything which we can, ideally using containers or emulators (Azure Event Hubs being an exception: it runs against a real instance, in private CI)

For the record, we've already tried Linux emulator container and now running on a Windows agent using the PowerShell startup. It seems really flaky...

galvesribeiro avatar Jun 13 '23 14:06 galvesribeiro

We will disable GitHub Actions CI tests and instead we will run test via the private CI against a live instance of Cosmos DB. I've asked the Cosmos DB SDK team what their preferred package names would be. I will also give the PR a final once-over before merging.

ReubenBond avatar Jun 13 '23 21:06 ReubenBond

I gave the PR a few once-overs and renamed everything in accordance with Cosmos' team's preferences. I believe this is ready to merge now.

ReubenBond avatar Jun 15 '23 02:06 ReubenBond