ArcticDB icon indicating copy to clipboard operation
ArcticDB copied to clipboard

Restrict custom CA certificate configurations for the Azure SDK to Linux only

Open jjerphan opened this issue 7 months ago • 6 comments

Reference Issues/PRs

Extracted from #2252.

See discussions in https://github.com/conda-forge/azure-core-cpp-feedstock/pull/23.

What does this implement or fix?

Have the Azure SDK on Windows only use WinHTTP instead of libcurl.

Any other comments?

Checklist

Checklist for code changes...
  • [x] Have you updated the relevant docstrings, documentation and copyright notice?
  • [ ] Is this contribution tested against all ArcticDB's features?
  • [ ] Do all exceptions introduced raise appropriate error messages?
  • [ ] Are API changes highlighted in the PR description?
  • [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

jjerphan avatar Apr 02 '25 14:04 jjerphan

What's the current behaviour on Windows? Is it using winhttp or libcurl at the moment?

Currently libcurl is used, but provided CA certificates path or directories are ignored.

Do we need similar changes for the AWS SDK?

Yes, this was a question I asked privately and I think you would benefit having an identical handling of CA certificates for each SDK (but this is out of the scope of this PR).

If this is making a change to the default transport being used, we also need some performance testing to show any changes before and after this change.

I also agree.

Have you done some end to end testing against a real Azure storage?

I have not.

Note that this change is motivated by the conda-forge builds for Windows and by those discussions https://github.com/conda-forge/azure-core-cpp-feedstock/pull/23. If distributing Windows builds on conda-forge is a priority, we can in a first time make this change for the conda-forge builds and keep the behavior unchanged for the PyPI wheels.

jjerphan avatar Apr 11 '25 09:04 jjerphan

What's the current behaviour on Windows? Is it using winhttp or libcurl at the moment?

Currently libcurl is used, but provided CA certificates path or directories are ignored.

Do we need similar changes for the AWS SDK?

Yes, this was a question I asked privately and I think you would benefit having an identical handling of CA certificates for each SDK (but this is out of the scope of this PR).

If this is making a change to the default transport being used, we also need some performance testing to show any changes before and after this change.

I also agree.

Have you done some end to end testing against a real Azure storage?

I have not.

Note that this change is motivated by the conda-forge builds for Windows and by those discussions conda-forge/azure-core-cpp-feedstock#23. If distributing Windows builds on conda-forge is a priority, we can in a first time make this change for the conda-forge builds and keep the behavior unchanged for the PyPI wheels.

Thanks, I think we need the end to end testing and some performance testing before we can merge this.

I agree that the AWS change is out of scope of this PR, but it should be done as a follow up I think.

I don't think the Windows Conda build is urgent enough to justify divergent behaviour across PyPi and Conda.

poodlewars avatar Apr 11 '25 10:04 poodlewars

I think most of the time will be spent in networking rather than in ArcticDB delegating to WinHTTP or libcurl, but I might be wrong. I would like to verify that also.

How would you design an end to end and performance testing?

jjerphan avatar Apr 14 '25 08:04 jjerphan

I think most of the time will be spent in networking rather than in ArcticDB delegating to WinHTTP or libcurl, but I might be wrong. I would like to verify that also.

How would you design an end to end and performance testing?

You could have a look at our suite of ASV tests for some inspiration, a lot of those run against real S3 storages. I think that if you cover reading, writing and deleting a dataframe with 10M rows and 100 columns (including some strings), and also listing (for example list_versions on 10k symbols) that would be good evidence.

poodlewars avatar Apr 14 '25 16:04 poodlewars

I have seen https://github.com/airspeed-velocity/asv/issues/1465, which needs fixing, yes.

jjerphan avatar Apr 15 '25 11:04 jjerphan

https://github.com/man-group/ArcticDB/pull/2327 fixed the failure with the benchmark workflow.

I think we need to test that this does not break anything unknowingly.

I still think that most of the time spent storing and loading the data will depend on the (impredictable) network latency, be it with libcurl or WinHTTP (for orders of magnitudes regarding latency see this reminder).

jjerphan avatar Apr 16 '25 07:04 jjerphan