PdhValidatePath unexpected errors

Open jwenz723 opened this issue 6 years ago • 6 comments

I am building a PDH exporter for Prometheus. Currently, when my application starts, it calls the win.PdhValidatePath() function approximately 3000 times. Most of the calls succeed, but some fail inconsistently.

The errors that occur are the following:

  • "The system could not find the environment option that was entered."
  • "The data area passed to a system call is too small."
  • "There are no more files."
  • "Access is denied."

For every PDH counter that produces one of the errors above, collecting it individually raises no error. The errors only occur when PdhValidatePath() is called for a large number of PDH counters. I am not calling PdhValidatePath() from a goroutine, so concurrency should not be the problem here.

Any ideas on how to fix this? One solution I found online that wasn't officially answered suggested using the PdhValidatePathEx function rather than PdhValidatePath.

Thought I'd ask here before going down that path in case I am doing something wrong. If you want to see my code, it can be seen here.

jwenz723 avatar Apr 18 '18 07:04 jwenz723

I'm the original committer of pdh.go, but it has been quite some time since I last used it myself... Anyway, one thing I can say is that everything is just a wrapper around the DLL. There's a pretty good chance that this is just how the whole thing 'works' :)

Are you able to write up the smallest test case that reproduces it? I could check it tonight (when there's no corporate proxy interfering).

krpors avatar Apr 18 '18 08:04 krpors

Well I wrote up a small test case and put it in this repo.

What I am finding now is that the PdhCollectQueryData function is actually the problem, rather than PdhValidatePath. Like I said before, the issue is very inconsistent.

To run the test program, fill the counters.txt file with a large number of PDH counter paths (one per line). I ran typeperf -q in a command prompt and copied the output into counters.txt.
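As a rough illustration of the "one counter path per line" format described above, here is a minimal Go sketch. The function name parseCounterPaths is hypothetical and is not taken from the test repo. It assumes that any line not starting with a backslash (blank lines, status text) is not a counter path.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCounterPaths returns the counter paths found in text, one per line.
// Assumption: a counter path always starts with a backslash, so blank lines
// and any other text are skipped.
func parseCounterPaths(text string) []string {
	var paths []string
	for _, line := range strings.Split(text, "\n") {
		// TrimSpace also removes the trailing "\r" of Windows line endings.
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, `\`) {
			paths = append(paths, line)
		}
	}
	return paths
}

func main() {
	// Hypothetical sample content standing in for counters.txt.
	sample := "\\Processor(*)\\% Processor Time\r\n\r\n\\Memory\\Available Bytes\r\n"
	for _, p := range parseCounterPaths(sample) {
		fmt.Println(p)
	}
}
```

In a real program the text would come from reading counters.txt (for example with os.ReadFile) rather than from a string literal.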

It seems to me that the problem might start occurring when too many PDH queries are open at the same time, but I'm not sure what 'too many' is, or how to detect it. Any insight would be very helpful.

jwenz723 avatar Apr 18 '18 17:04 jwenz723

OK I tried something. I used typeperf -q > counters.txt and ran your test program and observed that the PdhCollectQueryData indeed looks to be the culprit here. I got a total of 1877 counters. 456 went into error due to the PdhCollectQueryData and 13 due to PdhAddEnglishCounter.

I took one of the groups which failed (\Distributed Routing Table(*)) and tried adding them all using the Windows Performance Monitor (perfmon.exe) instead. They did not show up in the graph, and no error was displayed. I couldn't see anything in the Event Viewer either. I'm unsure what is going on here.

I found this link but that did not seem to work on my machine (Windows 7 Enterprise).

Edit: I also tried putting a sleep of 10, 50 and 100 ms in the loop, but that did not have any effect. The output appears to be deterministic.

krpors avatar Apr 18 '18 19:04 krpors

That article you found is interesting. I tried running the lodctr /r command, but it also seemed to have no effect. It doesn't seem to me like the errors are caused by corruption. Rather, it seems like the number of counters being gathered simultaneously exceeds what PDH is capable of handling.

I wish there was more documentation on the Windows PDH API.

jwenz723 avatar Apr 19 '18 05:04 jwenz723

I think I may have found a solution to this problem. My previous method used one PDH query handle per PDH counter. I changed my code so that a single PDH query handle now contains ALL PDH counters. With this change, all of the errors stopped. It appears that PDH has an undocumented limit on how many PDH queries can be open simultaneously.
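The single-query approach described above could look roughly like the following Go sketch. It is not the code from the test project. The pdhAPI interface and the fakePDH type are hypothetical stand-ins, named after the PDH functions mentioned in this thread, so the shape of the approach is visible without the Windows DLL. A real program would call the actual PDH wrappers instead.

```go
package main

import (
	"errors"
	"fmt"
)

// pdhAPI is a hypothetical abstraction over the PDH calls named in this
// thread; it is NOT the API of any real package.
type pdhAPI interface {
	OpenQuery() (query uintptr, err error)
	AddEnglishCounter(query uintptr, path string) (counter uintptr, err error)
	CollectQueryData(query uintptr) error
	CloseQuery(query uintptr) error
}

// collectAll opens ONE query, adds every counter path to it, and collects
// once. Paths that cannot be added are recorded and skipped.
func collectAll(api pdhAPI, paths []string) (int, map[string]error, error) {
	q, err := api.OpenQuery()
	if err != nil {
		return 0, nil, err
	}
	defer api.CloseQuery(q)

	added := 0
	skipped := make(map[string]error)
	for _, p := range paths {
		if _, err := api.AddEnglishCounter(q, p); err != nil {
			skipped[p] = err
			continue
		}
		added++
	}
	// A single collect call covers every counter attached to the query.
	if err := api.CollectQueryData(q); err != nil {
		return added, skipped, err
	}
	return added, skipped, nil
}

// fakePDH is an in-memory stand-in that counts calls, for illustration only.
type fakePDH struct {
	bad      map[string]bool // paths that should fail to be added
	opened   int
	collects int
}

func (f *fakePDH) OpenQuery() (uintptr, error) { f.opened++; return uintptr(f.opened), nil }
func (f *fakePDH) AddEnglishCounter(q uintptr, path string) (uintptr, error) {
	if f.bad[path] {
		return 0, errors.New("counter could not be added")
	}
	return 1, nil
}
func (f *fakePDH) CollectQueryData(q uintptr) error { f.collects++; return nil }
func (f *fakePDH) CloseQuery(q uintptr) error       { return nil }

func main() {
	api := &fakePDH{bad: map[string]bool{`\Bad\Counter`: true}}
	paths := []string{`\Memory\Available Bytes`, `\Bad\Counter`, `\System\Processes`}
	added, skipped, err := collectAll(api, paths)
	fmt.Println(added, len(skipped), api.opened, api.collects, err)
}
```

However many counter paths are passed in, only one query is opened and one collect call is made, which is the property that made the errors disappear in my case. A long-running exporter would presumably keep the query open between collections instead of closing it right away.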

I updated my small test project here to reflect these changes.

So I believe that this repository can remain unchanged and that the errors I was experiencing were just due to my limited understanding of how to properly use PDH.

jwenz723 avatar Apr 25 '18 21:04 jwenz723

Awesome!

Perhaps it's a good idea to add this behavior to the documentation of pdh.go. I agree that PDH is rather cryptic, which is why I tried to document its usage as thoroughly as possible back in the day. I suppose your finding is a rather useful addition.

krpors avatar Apr 30 '18 14:04 krpors