perfview icon indicating copy to clipboard operation
perfview copied to clipboard

Eliminate static state in extension features

Open sharwell opened this issue 7 years ago • 22 comments

  • Eliminate static state in TraceLoadedDotNetRuntimeExtensions
  • Eliminate static state in TraceProcessesExtensions

sharwell avatar Jun 04 '17 07:06 sharwell

Codecov Report

Merging #261 into master will increase coverage by 1.94%. The diff coverage is 89.94%.

@@            Coverage Diff             @@
##           master     #261      +/-   ##
==========================================
+ Coverage   17.44%   19.38%   +1.94%     
==========================================
  Files         213      214       +1     
  Lines      123790   124140     +350     
  Branches    11971    11982      +11     
==========================================
+ Hits        21590    24063    +2473     
+ Misses     101189    99117    -2072     
+ Partials     1011      960      -51
Flag Coverage Δ
#2017 19.38% <89.94%> (+1.94%) :arrow_up:
#Debug 19.38% <89.94%> (+1.94%) :arrow_up:
#Release 100% <ø> (ø) :arrow_up:
Impacted Files Coverage Δ
src/TraceEvent/TraceEvent.Tests/EtlTestBase.cs 90.24% <100%> (+0.77%) :arrow_up:
src/TraceEvent/Computers/TraceManagedProcess.cs 61.51% <30.76%> (+61.51%) :arrow_up:
src/TraceEvent/Computers/TraceProcess.cs 60.12% <75%> (+60.12%) :arrow_up:
...Event.Tests/TraceEventDispatcherExtensionsTests.cs 94.7% <94.7%> (ø)
src/PerfView/StackViewer/PerfDataGrid.xaml.cs 34.02% <0%> (ø) :arrow_up:
src/TraceEvent/DynamicTraceEventParser.cs 67.49% <0%> (+0.15%) :arrow_up:
src/TraceEvent/TraceLog.cs 61.25% <0%> (+0.2%) :arrow_up:
src/TraceEvent/TraceEvent.cs 63.45% <0%> (+0.21%) :arrow_up:
src/TraceEvent/Parsers/TplTraceEventParser.cs 49.65% <0%> (+0.22%) :arrow_up:
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ed9cc8d...20f793a. Read the comment docs.

codecov-io avatar Jun 04 '17 10:06 codecov-io

  • @jeffschwMSFT

vancem avatar Jun 05 '17 01:06 vancem

@sharwell I added a few comments - they hit similar scenarios, reuse and multiple calls on the same source. What sort of validation suite are you using for these? Many of the harder issues with TraceEvent come up when you drill into source reuse, concurrency, and live events. All these scenarios are not just possible they are hit in our live etl processing.

jeffschwMSFT avatar Jun 05 '17 16:06 jeffschwMSFT

@jeffschwMSFT I don't currently have tests for this. If you can describe a test case in a fair amount of specificity, I can get it coded as a unit test.

These changes are focused on eliminating potential negative interactions under concurrent use of independent objects. I was not looking at source reuse. In the latter case, this pull request only has one substantial impact, which is the elimination of clearing the accumulated data under a second call to Need* for the same source. If this is a scenario you need, I can easily add it back without losing the benefits this pull request offers for concurrent usage scenarios.

sharwell avatar Jun 05 '17 17:06 sharwell

@sharwell I am having a hard time understanding the motivation and underlying value of these changes. Can you help me understand? The net-net of the change is removal of one static to track source - which frankly was a loose guarantee / hack. The other changes are just moving things around.

jeffschwMSFT avatar Jun 05 '17 19:06 jeffschwMSFT

I am having a hard time understanding the motivation and underlying value of these changes. Can you help me understand?

My general motivation comes on two fronts:

  1. Ensure that two independent well-behaved¹ users of the TraceEvent library (i.e. users operating on distinct objects and distinct input files) do not interfere with each other in a concurrent application.
  2. Where reasonable, protect against mutations to static state which is not supposed to change, such that it becomes difficult for users to accidentally fail to be well-behaved.

When applied, these practices improve the reliability of the library in distribution and also help our own development through details like preventing one unit test from affecting the behavior of a following unit test.

¹ Well-behaved means adhering to the requirements documented for various methods in the library with respect to exposed static state that is technically mutable but is expected to not actually be mutated by library users.

sharwell avatar Jun 05 '17 20:06 sharwell

@sharwell What case is failing today with the current implementation? In my mind the well-behaved case is load an ETL, process it, then shutdown the process. We do this regularly and much more. A great example of the well behaved case is Perfview and I am not aware of a failure/issue there.

jeffschwMSFT avatar Jun 05 '17 20:06 jeffschwMSFT

@jeffschwMSFT The failing case today is two independent threads concurrently processing two independent trace files. The last one to call NeedProcesses will result in the first, on an unrelated processing thread, having assertion failures (affects debug builds). It will also result in duplicate callback registration if the first calls NeedProcesses again (affects all builds).

sharwell avatar Jun 05 '17 21:06 sharwell

We do that all the time - this is a common case. What is the result when this happens? I am seeing it succeed. Related, what portion of this change materially makes it better?

jeffschwMSFT avatar Jun 05 '17 21:06 jeffschwMSFT

We do that all the time - this is a common case. What is the result when this happens? I am seeing it succeed. Related, what portion of this change materially makes it better?

The most obvious problematic cases only occur in debug builds of the TraceEvent library (i.e. you actually execute your code against a debug build of TraceEvent, as opposed to the more common action of executing a debug build of a tool which references the TraceEvent library). For release builds, the only observable impact in TraceProcesses would be extra-accumulation of the amount of time spent in a process here after using NeedLoadedDotNetRuntimes or NeedProcesses a second time to reset accumulated state.

For duplicate callback registration in NeedLoadedDotNetRuntimes, the resulting aggregation errors are a bit more difficult to be certain of without more work, but I couldn't find anything to indicate that you would experience anything more severe than (presumably subtle) inaccuracy of results if this race condition is hit.

Related, what portion of this change materially makes it better?

We no longer use mutable static state to determine if callbacks are registered. We instead use per-object state to determine this condition with certainty (within the limiting bound of non-concurrent use of instances of this non-thread-safe type).

sharwell avatar Jun 05 '17 22:06 sharwell

Debug TraceEvent has a number of issues - it is noble and much needed to clean those up. Though the asserts I have seen are far deeper within the setup callbacks and not at this level.
I agree the following scenario is problematic (a good bug find): using(source) { source.NeedProcesses(); source.Process(); source.NeedProcesses(); source.Process(); }. Though the fix for that is different than the proposed. I would recommend tackling that issue within the context of also keeping the right semantic for the following, which is far more common: using(source) { source.NeedProcesses(); source.NeedProcesses(); source.Process(); }
We need to find a way to fix the issue you encountered and keep the existing compatibility.

jeffschwMSFT avatar Jun 05 '17 23:06 jeffschwMSFT

@jeffschwMSFT It is my belief (and intent) that the current implementation (after my last modification in commit 7e1e2ca) does not change the behavior for processing a single source under any situation aside from the race condition involving two different sources. From the perspective of compatibility, it is intended to be a fully-compatible change. Is there a case where it doesn't appear to do so?

sharwell avatar Jun 06 '17 01:06 sharwell

I wrote some initial tests and should have a new commit ready in the morning with them. 👍

sharwell avatar Jun 06 '17 02:06 sharwell

To start please merge the relevant commits, after that I will then take a deeper look. The semantics for NeedProcesses and NeedLoadedDotNetRuntimes should be as follows:

  • On first call - setup callbacks and create state
  • On subsequent calls - do nothing
  • After processing and calling - clear state, but do not re-establish callbacks

As for scenarios to test I would do the following:

  • PerfView case

    using(sourceA) {
      sourceA.NeedsProcesses();
      sourceA.NeedsLoadedDotNetRuntimes();
      sourceA.Process();
    }
    
  • Common case

    for(many different sources) {
      using(source) {
        source.NeedsProcesses();
        source.NeedsLoadedDotNetRuntimes();
        source.Process();
      }
    }
    
  • Common case

    using(sourceA) {
      sourceA.NeedsProcesses();
      sourceA.NeedsLoadedDotNetRuntimes();
      sourceA.Process();
    }
    
    /* the same for sourceB */
    
    /* then sourceA again */
    
  • etc. the cases in the tests seem to cover interesting scenarios - the ones I am most concerned with are thread concurrent and multiple different ETLs with occassional duplication

Some of the test cases are really non-scenarios. What is the customer use case for doing sourceA.Process() multiple times in a row? This feels like a debug scenario.

jeffschwMSFT avatar Jun 06 '17 15:06 jeffschwMSFT

Some of the test cases are really non-scenarios. What is the customer use case for doing sourceA.Process() multiple times in a row?

The test doesn't actually call sourceA.Process() two times in a row. The test is intended to deterministically reproduce a concurrency bug by forcing what would normally be two independent operations on two different sources to execute in a specific order. It is a specific linearization of two independent operations, with assertions to verify that the independent operations did not influence each other.

sharwell avatar Jun 06 '17 15:06 sharwell

To start please merge the relevant commits, after that I will then take a deeper look.

@vancem Can you take a look at merging #274 and then #255 to help out with this?

@jeffschwMSFT Prior to that, I would suggest you focus exclusively on reviewing the tests in TraceEventDispatcherExtensionsTests.cs. They are unique to this pull request and are intended to be valid test cases regardless what, if any, other code changes are made.

The semantics for NeedProcesses and NeedLoadedDotNetRuntimes should be as follows:

  • On first call - setup callbacks and create state
  • On subsequent calls - do nothing
  • After processing and calling - clear state, but do not re-establish callbacks

Currently we are close:

  • On first call: setup callbacks and create state ✔️
  • On subsequent calls: do not setup callbacks ✔️ but reestablishes state, which should be idempotent ❓
  • After processing and calling: do not setup callbacks ✔️ and reestablishes state ✔️

Note that the following test failed when I attempted it, which suggests something is not getting cleared properly even prior to my work:

source.NeedProcesses();
source.Process();

source.NeedProcesses();
source.Process();

processTimes = source.Processes().ToDictionary(p => p.ProcessID, p => p.CPUMSec);

However, the equivalent test for NeedDotNetLoadedRuntimes was working correctly, so the state reset failure appears to be limited to TraceProcesses. I filed #276 for this issue.

As for scenarios to test I would do the following:

Most of these were covered already, and I added additional coverage in 86b70e5df5b9706028c57f20ecbdff45b3967745.

sharwell avatar Jun 06 '17 15:06 sharwell

@sharwell merge as in merge the commits, not merge with master. Please collapse the changes so I can see the true deltas from the current source. What realworld scenarios are represented in the test cases?

jeffschwMSFT avatar Jun 06 '17 16:06 jeffschwMSFT

@jeffschwMSFT This has now been updated.

sharwell avatar Jun 14 '17 02:06 sharwell

@sharwell as discussed please merge relevant commits and then I will take a deeper look.

jeffschwMSFT avatar Jun 14 '17 17:06 jeffschwMSFT

@jeffschwMSFT I rebased and merged to now have three distinct units of work:

  1. Add tests which demonstrate the observable problems and known use cases
  2. Self-contained corrections to TraceProcessesExtensions
  3. Self-contained corrections to TraceLoadedDotNetRuntimeExtensions

sharwell avatar Jun 14 '17 17:06 sharwell

This change looks good. Thanks for adding tests.

jeffschwMSFT avatar Jun 14 '17 17:06 jeffschwMSFT

Updated after repository rewrite

sharwell avatar Feb 02 '18 11:02 sharwell