Wrong Temperatures Shown on Mac Pro 6,1: SP78 Numbers Interpreted Incorrectly

Open LangdonStIves opened this issue 5 years ago • 0 comments

Note: Many or all of the details below may be specific to the Mac Pro 6,1, or to Mac Pros in general, which use Xeons. Core i5/i7/i9 machines may do things similarly or differently; I have no idea.

On my Mac Pro 6,1 (2013) with an E5-2667v2, Fanny shows ridiculously large "Die" temps (sensor: TC0F). After much confusion and doubt, I have finally discovered two reasons for this. First, these numbers are in Apple's weird SP78 data type (1 sign bit, 7 integer bits, 8 fractional bits). Second, at least for these E5 processors, the value is specified as an offset from what Intel calls T_control, which is the temperature at which TCC will kick in, i.e., the CPU will start thermal throttling (see Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 v2 Product Families Datasheet Volume One, sec. 5.1.1 Thermal Specifications).

When I load all cores to heat up the CPU, Fanny reports "Die" temps well above 100°C; at Apple's conservative factory fan settings it goes up to 120°C. I never believed these numbers, because demonstrably no significant thermal throttling was happening. Until today, the only other tool I had found that even shows the TC0F sensor was SMCKit, and it showed the same insane numbers. The first hint that something was off was that SMCKit shows them as negative numbers (with the same absolute value as Fanny).

Today I finally found iStats, a Ruby gem that can also show this sensor and appears to be the first tool that correctly handles the SP78 values used for some of these temperatures. Sure enough, here is how the numbers compare, starting at high load like right now, with Folding@Home running on all cores:

Fanny: 108°C iStats: TC0F CPU 0 ?? temp: -19.97°C

If I stop F@H and let the CPU cool down:

Fanny: 98°C iStats: TC0F CPU 0 ?? temp: -29.94°C

Fanny: 95°C iStats: TC0F CPU 0 ?? temp: -33.34°C

Fanny: 92°C iStats: TC0F CPU 0 ?? temp: -36.16°C

and so forth. So Fanny's number is always the actual value (which is the delta from T_control) plus 128, to within rounding. This looks very much like a signed/unsigned issue, with the sign bit being interpreted as 128 or something like that. I've looked through the code a bit but couldn't work out what exactly goes wrong.
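A minimal sketch of one decoding mistake that would produce exactly this "plus 128" pattern. It is written in Python for illustration and is not Fanny's actual code. It assumes SP78 is a big-endian 16-bit two's-complement value with 8 fractional bits; the function names are hypothetical, and the raw bytes `0xEC 0x08` are reconstructed from the -19.97°C reading above rather than captured from the SMC. Dropping the sign bit instead of honoring it turns a negative value `v` into `128 + v`:

```python
def sp78_to_celsius(raw: bytes) -> float:
    """Decode 2 bytes as big-endian two's complement with 8 fractional bits."""
    if len(raw) != 2:
        raise ValueError("SP78 values are exactly 2 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True) / 256.0


def sp78_sign_bit_dropped(raw: bytes) -> float:
    """Hypothetical buggy decode: mask off bit 15, treat the rest as a magnitude."""
    word = int.from_bytes(raw, byteorder="big", signed=False)
    return (word & 0x7FFF) / 256.0


# Reconstructed example, not a captured reading: -19.96875 * 256 = -5112 = 0xEC08.
raw = bytes([0xEC, 0x08])

print(sp78_to_celsius(raw))        # -19.96875
print(sp78_sign_bit_dropped(raw))  # 108.03125

# Readings quoted in this issue, as (Fanny, iStats) pairs.
readings = [(108, -19.97), (98, -29.94), (95, -33.34), (92, -36.16)]
for fanny, istats in readings:
    # Each pair agrees once the iStats value is shifted by 128 and rounded.
    print(fanny, round(128 + istats))
```

Negating the masked result instead would give -108.03, which would be consistent with SMCKit showing the same absolute value with a minus sign. Whether either tool actually decodes the value this way is a guess.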

Now, to get the real absolute temperature being reported here, one would need the value of T_control, which can differ from SKU to SKU (EDIT: or maybe even from CPU to CPU, not sure). Apparently there is a CPU register that contains this (TEMPERATURE_TARGET, see Datasheet Vol. 1 Sec. 5.2.1), so reading it and adding the offset to it would be the correct way to determine the die temperature.
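The arithmetic would look roughly like the sketch below. Reading the register itself needs privileged access and is not shown. The bit layout (reference temperature in bits 23:16 of the 64-bit register value) is an assumption that should be checked against the datasheet for this processor, and the raw register value is made up purely for illustration. All function and constant names are hypothetical.

```python
def reference_temp_from_register(register_value: int) -> int:
    """Extract bits 23:16 of a raw TEMPERATURE_TARGET value (assumed layout)."""
    return (register_value >> 16) & 0xFF


def absolute_die_temp(offset_c: float, reference_c: float) -> float:
    """Absolute temperature = reference temperature + (normally negative) offset."""
    return reference_c + offset_c


# Made-up register value that encodes 90 in bits 23:16; not read from a real CPU.
HYPOTHETICAL_REGISTER_VALUE = 0x005A0000

reference = reference_temp_from_register(HYPOTHETICAL_REGISTER_VALUE)
print(reference)                                 # 90
print(absolute_die_temp(-19.96875, reference))   # 70.03125
```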

On the other hand, it would actually be insanely useful to also have the offset itself available in a monitoring app. It would in fact be better than any of the absolute temperatures, because when the offset approaches 0 or even becomes positive, that's where it gets dangerous for the CPU, and where it starts protecting itself via TCC.

LangdonStIves, Jun 12 '20 02:06