Eliminate static state in TraceLoadedDotNetRuntimeExtensions
Eliminate static state in TraceProcessesExtensions

Jun 04 '17 07:06 sharwell

Codecov Report

Merging #261 into master will increase coverage by 1.94%. The diff coverage is 89.94%.

@@            Coverage Diff             @@
##           master     #261      +/-   ##
==========================================
+ Coverage   17.44%   19.38%   +1.94%     
==========================================
  Files         213      214       +1     
  Lines      123790   124140     +350     
  Branches    11971    11982      +11     
==========================================
+ Hits        21590    24063    +2473     
+ Misses     101189    99117    -2072     
+ Partials     1011      960      -51

Flag	Coverage Δ
#2017	`19.38% <89.94%> (+1.94%)`	:arrow_up:
#Debug	`19.38% <89.94%> (+1.94%)`	:arrow_up:
#Release	`100% <ø> (ø)`	:arrow_up:

Impacted Files	Coverage Δ
src/TraceEvent/TraceEvent.Tests/EtlTestBase.cs	`90.24% <100%> (+0.77%)`	:arrow_up:
src/TraceEvent/Computers/TraceManagedProcess.cs	`61.51% <30.76%> (+61.51%)`	:arrow_up:
src/TraceEvent/Computers/TraceProcess.cs	`60.12% <75%> (+60.12%)`	:arrow_up:
...Event.Tests/TraceEventDispatcherExtensionsTests.cs	`94.7% <94.7%> (ø)`
src/PerfView/StackViewer/PerfDataGrid.xaml.cs	`34.02% <0%> (ø)`	:arrow_up:
src/TraceEvent/DynamicTraceEventParser.cs	`67.49% <0%> (+0.15%)`	:arrow_up:
src/TraceEvent/TraceLog.cs	`61.25% <0%> (+0.2%)`	:arrow_up:
src/TraceEvent/TraceEvent.cs	`63.45% <0%> (+0.21%)`	:arrow_up:
src/TraceEvent/Parsers/TplTraceEventParser.cs	`49.65% <0%> (+0.22%)`	:arrow_up:
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ed9cc8d...20f793a. Read the comment docs.

Jun 04 '17 10:06 codecov-io

@jeffschwMSFT

Jun 05 '17 01:06 vancem

@sharwell I added a few comments - they hit similar scenarios, reuse and multiple calls on the same source. What sort of validation suite are you using for these? Many of the harder issues with TraceEvent come up when you drill into source reuse, concurrency, and live events. All these scenarios are not just possible they are hit in our live etl processing.

Jun 05 '17 16:06 jeffschwMSFT

@jeffschwMSFT I don't currently have tests for this. If you can describe a test case in a fair amount of specificity, I can get it coded as a unit test.

These changes are focused on eliminating potential negative interactions under concurrent use of independent objects. I was not looking at source reuse. In the latter case, this pull request only has one substantial impact, which is the elimination of clearing the accumulated data under a second call to Need* for the same source. If this is a scenario you need, I can easily add it back without losing the benefits this pull request offers for concurrent usage scenarios.

Jun 05 '17 17:06 sharwell

@sharwell I am having a hard time understanding the motivation and underlying value of these changes. Can you help me understand? The net-net of the change is removal of one static to track source - which frankly was a loose guarantee / hack. The other changes are just moving things around.

Jun 05 '17 19:06 jeffschwMSFT

I am having a hard time understanding the motivation and underlying value of these changes. Can you help me understand?

My general motivation comes on two fronts:

Ensure that two independent well-behaved¹ users of the TraceEvent library (i.e. users operating on distinct objects and distinct input files) do not interfere with each other in a concurrent application.
Where reasonable, protect against mutations to static state which is not supposed to change, such that it becomes difficult for users to accidentally fail to be well-behaved.

When applied, these practices improve the reliability of the library in distribution and also help our own development through details like preventing one unit test from affecting the behavior of a following unit test.

¹ Well-behaved means adhering to the requirements documented for various methods in the library with respect to exposed static state that is technically mutable but is expected to not actually be mutated by library users.

Jun 05 '17 20:06 sharwell

@sharwell What case is failing today with the current implementation? In my mind the well-behaved case is load an ETL, process it, then shutdown the process. We do this regularly and much more. A great example of the well behaved case is Perfview and I am not aware of a failure/issue there.

Jun 05 '17 20:06 jeffschwMSFT

@jeffschwMSFT The failing case today is two independent threads concurrently processing two independent trace files. The last one to call NeedProcesses will result in the first, on an unrelated processing thread, having assertion failures (affects debug builds). It will also result in duplicate callback registration if the first calls NeedProcesses again (affects all builds).

Jun 05 '17 21:06 sharwell

We do that all the time - this is a common case. What is the result when this happens? I am seeing it succeed. Related, what portion of this change materially makes it better?

Jun 05 '17 21:06 jeffschwMSFT

We do that all the time - this is a common case. What is the result when this happens? I am seeing it succeed. Related, what portion of this change materially makes it better?

The most obvious problematic cases only occur in debug builds of the TraceEvent library (i.e. you actually execute your code against a debug build of TraceEvent, as opposed to the more common action of executing a debug build of a tool which references the TraceEvent library). For release builds, the only observable impact in TraceProcesses would be extra-accumulation of the amount of time spent in a process here after using NeedLoadedDotNetRuntimes or NeedProcesses a second time to reset accumulated state.

For duplicate callback registration in NeedLoadedDotNetRuntimes, the resulting aggregation errors are a bit more difficult to be certain of without more work, but I couldn't find anything to indicate that you would experience anything more severe than (presumably subtle) inaccuracy of results if this race condition is hit.

Related, what portion of this change materially makes it better?

We no longer use mutable static state to determine if callbacks are registered. We instead use per-object state to determine this condition with certainty (within the limiting bound of non-concurrent use of instances of this non-thread-safe type).

Jun 05 '17 22:06 sharwell

Debug TraceEvent has a number of issues - it is noble and much needed to clean those up. Though the asserts I have seen are far deeper within the setup callbacks and not at this level.
I agree the following scenario is problematic (a good bug find): using(source) { source.NeedProcesses(); source.Process(); source.NeedProcesses(); source.Process(); }. Though the fix for that is different than the proposed. I would recommend tackling that issue within the context of also keeping the right semantic for the following, which is far more common: using(source) { source.NeedProcesses(); source.NeedProcesses(); source.Process(); }
We need to find a way to fix the issue you encountered and keep the existing compatibility.

Jun 05 '17 23:06 jeffschwMSFT

@jeffschwMSFT It is my belief (and intent) that the current implementation (after my last modification in commit 7e1e2ca) does not change the behavior for processing a single source under any situation aside from the race condition involving two different sources. From the perspective of compatibility, it is intended to be a fully-compatible change. Is there a case where it doesn't appear to do so?

Jun 06 '17 01:06 sharwell

I wrote some initial tests and should have a new commit ready in the morning with them. 👍

Jun 06 '17 02:06 sharwell

To start please merge the relevant commits, after that I will then take a deeper look. The semantics for NeedProcesses and NeedLoadedDotNetRuntimes should be as follows:

On first call - setup callbacks and create state
On subsequent calls - do nothing
After processing and calling - clear state, but do not re-establish callbacks

As for scenarios to test I would do the following:

PerfView case

using(sourceA) {
  sourceA.NeedsProcesses();
  sourceA.NeedsLoadedDotNetRuntimes();
  sourceA.Process();
}

Common case

for(many different sources) {
  using(source) {
    source.NeedsProcesses();
    source.NeedsLoadedDotNetRuntimes();
    source.Process();
  }
}

Common case

using(sourceA) {
  sourceA.NeedsProcesses();
  sourceA.NeedsLoadedDotNetRuntimes();
  sourceA.Process();
}

/* the same for sourceB */

/* then sourceA again */

etc. the cases in the tests seem to cover interesting scenarios - the ones I am most concerned with are thread concurrent and multiple different ETLs with occassional duplication

Some of the test cases are really non-scenarios. What is the customer use case for doing sourceA.Process() multiple times in a row? This feels like a debug scenario.

Jun 06 '17 15:06 jeffschwMSFT

Some of the test cases are really non-scenarios. What is the customer use case for doing sourceA.Process() multiple times in a row?

The test doesn't actually call sourceA.Process() two times in a row. The test is intended to deterministically reproduce a concurrency bug by forcing what would normally be two independent operations on two different sources to execute in a specific order. It is a specific linearization of two independent operations, with assertions to verify that the independent operations did not influence each other.

Jun 06 '17 15:06 sharwell

To start please merge the relevant commits, after that I will then take a deeper look.

@vancem Can you take a look at merging #274 and then #255 to help out with this?

@jeffschwMSFT Prior to that, I would suggest you focus exclusively on reviewing the tests in TraceEventDispatcherExtensionsTests.cs. They are unique to this pull request and are intended to be valid test cases regardless what, if any, other code changes are made.

The semantics for NeedProcesses and NeedLoadedDotNetRuntimes should be as follows:

On first call - setup callbacks and create state

On subsequent calls - do nothing

After processing and calling - clear state, but do not re-establish callbacks

Currently we are close:

On first call: setup callbacks and create state ✔️
On subsequent calls: do not setup callbacks ✔️ but reestablishes state, which should be idempotent ❓
After processing and calling: do not setup callbacks ✔️ and reestablishes state ✔️

Note that the following test failed when I attempted it, which suggests something is not getting cleared properly even prior to my work:

source.NeedProcesses();
source.Process();

source.NeedProcesses();
source.Process();

processTimes = source.Processes().ToDictionary(p => p.ProcessID, p => p.CPUMSec);

However, the equivalent test for NeedDotNetLoadedRuntimes was working correctly, so the state reset failure appears to be limited to TraceProcesses. I filed #276 for this issue.

As for scenarios to test I would do the following:

Most of these were covered already, and I added additional coverage in 86b70e5df5b9706028c57f20ecbdff45b3967745.

Jun 06 '17 15:06 sharwell

@sharwell merge as in merge the commits, not merge with master. Please collapse the changes so I can see the true deltas from the current source. What realworld scenarios are represented in the test cases?

Jun 06 '17 16:06 jeffschwMSFT

@jeffschwMSFT This has now been updated.

Jun 14 '17 02:06 sharwell

@sharwell as discussed please merge relevant commits and then I will take a deeper look.

Jun 14 '17 17:06 jeffschwMSFT

@jeffschwMSFT I rebased and merged to now have three distinct units of work:

Add tests which demonstrate the observable problems and known use cases
Self-contained corrections to TraceProcessesExtensions
Self-contained corrections to TraceLoadedDotNetRuntimeExtensions

Jun 14 '17 17:06 sharwell

This change looks good. Thanks for adding tests.

Jun 14 '17 17:06 jeffschwMSFT

Updated after repository rewrite

Feb 02 '18 11:02 sharwell

perfview
perfview copied to clipboard

Eliminate static state in extension features

Codecov Report

perfview perfview copied to clipboard

Eliminate static state in extension features

Codecov Report

perfview
perfview copied to clipboard