nats.net

Build and test on multiple platforms

Open • jasper-d opened this issue 4 years ago • 7 comments

  • Build on Linux, Windows, and macOS. Binaries built on Windows are used in all following stages (with the caveat that, as far as I know, coverlet tampers with the instrumented DLLs, so we don't test exactly the bits that are pushed to NuGet).
  • nats-server is installed with go get, from master for now.
  • Tests are run on Linux, Windows, and macOS (integration test failures are ignored on macOS).
  • Coverage is collected with coverlet and reported to codecov.io. This requires installing the Codecov GitHub app. I'm aware that most nats-io projects use Coveralls, but it has barely any documentation and requires a lot of client-side processing (converting coverage reports, merging multiple reports); Codecov does not.
  • Packaging is updated to a) support SourceLink and b) provide PDB symbols through a separate .snupkg package. When referencing the NuGet package (with a symbol server configured), users can now step into the NATS.Client source, set breakpoints there, etc. I pushed a test package to https://int.nugettest.org/packages/test-packageid-2A850C7F-0F48-4A42-8F27-206CBC2DB0FF; the symbol server is https://symbolsint.nugettest.org/download/symbols.
  • Relaxed or removed some assertions in integration tests to get to a more reasonable pipeline pass rate, now at ~75%.
  • IntegrationTests.TestBasic.TestRequestAsyncCancellation still fails quite often with a TaskCanceledException (TCE) after ~2 minutes on Linux/macOS and, less frequently, after 4 minutes on Windows. I have no idea what's going on there.

jasper-d, Jun 08 '20 21:06

@ColinSullivan1 I'm honestly no longer sure whether this whole endeavour is feasible. Until yesterday, test runs were failing mostly on macOS, which I was about to ignore. But after merging master, integration tests started failing very frequently on Windows too (TestAsyncInfoProtocolPrune is the culprit).

I'll give it some thought over the weekend, but as long as the integration tests are as flaky as they are on Azure DevOps, I don't think extending the test matrix provides any benefit.

jasper-d, Aug 14 '20 18:08

Sorry to see you closed this; there were a lot of good changes here (and I did recently fix TestAsyncInfoProtocolPrune).

ColinSullivan1, Oct 06 '20 18:10

Yeah, I had given up hope of reaching a reasonable pipeline pass rate. With your fix for TestAsyncInfoProtocolPrune, this may be feasible. I'll see if I can find a way to selectively ignore failing integration tests on macOS and let the pipeline run for a few days to gather some data. If it looks okay-ish, I'll add code coverage, though that may affect tests that rely on timing.

jasper-d, Oct 07 '20 22:10

Not sure if it'll help, but IIRC, for local testing on Mac/Linux I had to create a softlink nats-server.exe -> nats-server (a hack).

e.g.:
$ ls -arlt `which nats-server.exe`
lrwxr-xr-x 1 redacted admin 47 May 14 21:27 /usr/local/bin/nats-server.exe -> /Users/redacted/go/bin/nats-server

ColinSullivan1, Oct 07 '20 22:10

Sorry, I should have been clearer. Integration tests generally work on macOS but fail far more often than on Linux/Windows. macOS runs caused the majority of pipeline failures before TestAsyncInfoProtocolPrune became an issue.

I haven't found a way to either export test results or deep-link to Azure analytics, but the most notorious tests on macOS have the following pass rates (out of 263 total runs):

Test                        macOS   Linux   Windows (lower of CoreCLR and .NET Framework)
TestReconnectAllowedFlags   0.76    0.98    1.00
TestDrainBlocking           0.86    1.00    0.99

I reckon this might be an infrastructure issue (AFAIK macOS agents aren't hosted by MSFT), so resources may be even more limited on those boxes. I don't have a Mac to verify it, but I assume the tests work just as well as on Linux/Windows when run locally.

Anyway, I think it is fair to just ignore these failures in favor of a reasonable pass rate.
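As a rough illustration of why these two tests weigh so heavily on the pipeline: a macOS run can pass no more often than its flakiest test (76% here), and if the two failures were independent (an assumption; timing failures on a loaded agent are probably correlated), these two tests alone would let only about 65% of macOS runs pass. A small Python sketch of that arithmetic, using only the pass rates from the table (the helper names are hypothetical):

```python
# Per-test pass rates on macOS, from the table above (263 runs).
MAC_PASS_RATES = {
    "TestReconnectAllowedFlags": 0.76,
    "TestDrainBlocking": 0.86,
}


def job_pass_rate_if_independent(rates):
    """Probability that every listed test passes in one run,
    ASSUMING the failures are independent (a simplification)."""
    p = 1.0
    for r in rates.values():
        p *= r
    return p


def job_pass_rate_upper_bound(rates):
    """Without any independence assumption, a run can pass at most
    as often as its flakiest test does."""
    return min(rates.values())


if __name__ == "__main__":
    print(f"independent: {job_pass_rate_if_independent(MAC_PASS_RATES):.2f}")
    print(f"upper bound: {job_pass_rate_upper_bound(MAC_PASS_RATES):.2f}")
```

Other tests can fail too, so both numbers are optimistic for the macOS integration job as a whole.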

jasper-d, Oct 08 '20 00:10

@ColinSullivan1 I pushed some more changes and am reopening to get an up-to-date delta. The pipeline pass rate is still way too low, but most failures are caused by timeouts in Request/Reply tests (integration test failures on macOS are ignored now).

I added coverage analysis, which works (https://github.com/jasper-d/nats.net/pull/1#issuecomment-708722858, https://codecov.io/gh/jasper-d/nats.net/commits) except on net46. The net46 problem reproduces locally, but I haven't found a fix; it may require moving either the tests or NATS.Client to 4.6.1 (I tried 4.6.2 but couldn't make it work).

To get this across the finish line, I would:

  • Remove the remaining timing-based assertions in https://github.com/nats-io/nats.net/blob/65329276e9379f5ba66189f1c88d7b9894c60e16/src/Tests/IntegrationTests/TestBasic.cs#L877
  • Ignore missing coverage on net46
  • Add NuGet packaging again
  • Clean up everything

Trying to fix the tests or to support coverage analysis for legacy .NET seems infeasible to me at this point.

jasper-d, Oct 15 '20 18:10

Codecov Report

No coverage uploaded for pull request base (master@d5b0fd9). The diff coverage is n/a.


@@            Coverage Diff            @@
##             master     #388   +/-   ##
=========================================
  Coverage          ?   80.36%           
=========================================
  Files             ?       86           
  Lines             ?     7770           
  Branches          ?        0           
=========================================
  Hits              ?     6244           
  Misses            ?     1526           
  Partials          ?        0           
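The headline 80.36% can be checked by hand. Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count against the ratio; here that is 6244 / 7770. A minimal Python check of the arithmetic (the function name codecov_ratio is purely illustrative, not part of any Codecov API):

```python
def codecov_ratio(hits, misses, partials):
    """Coverage as Codecov defines it: hits / (hits + misses + partials).
    Partially covered lines count against coverage."""
    total = hits + misses + partials
    if total == 0:
        raise ValueError("no tracked lines")
    return hits / total


# Figures from the coverage diff above (PR #388).
pct = 100 * codecov_ratio(hits=6244, misses=1526, partials=0)
print(f"{pct:.2f}%")
```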
Flag                       Coverage Δ
Darwin-netcoreapp3.1       100.00% <0.00%> (?)
IntegrationTests           79.44% <0.00%> (?)
Linux-netcoreapp3.1        100.00% <0.00%> (?)
UnitTests                  4.90% <0.00%> (?)
Windows_NT-net46           0.00% <0.00%> (?)
Windows_NT-netcoreapp3.1   80.65% <0.00%> (?)

Flags with carried forward coverage won't be shown.



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update d5b0fd9...30d594c.

codecov-io, Oct 15 '20 21:10