Wrong temp reading on MT7915_phy0

Open Sandokan71 opened this issue 1 year ago • 25 comments

I did some tests for temp reading. I get the following readings from internal sensors in standby (no devices connected on WiFi) with a room temp of 20C:

phy0 2.4 GHz -> 68C
phy1 5 GHz -> 43C
On the SoC, the internal sensor reads 45C.

The temperatures measured with a thermal scanner (my guess is that it reads 3-4C low) are:
2.4 GHz -> 37.1C
5 GHz -> 40.5C
On the SoC I get 45.5C.

It seems to me that the temperature reported by the sensor on the 2.4 GHz chip is not correct.
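To make the comparison explicit, here is a minimal sketch that subtracts the thermal-scanner values from the internal sensor values listed in this post. The dictionary keys and the function name are illustrative only; the numbers are the ones quoted above.

```python
# Internal sensor vs. thermal-scanner readings from this post, in deg C.
# Names are illustrative and not part of any driver or sysfs API.
sensor = {"phy0 (2.4 GHz)": 68.0, "phy1 (5 GHz)": 43.0, "SoC": 45.0}
scanner = {"phy0 (2.4 GHz)": 37.1, "phy1 (5 GHz)": 40.5, "SoC": 45.5}


def sensor_minus_scanner(sensor, scanner):
    """Return (sensor - scanner) per location, rounded to 0.1 C."""
    return {name: round(sensor[name] - scanner[name], 1) for name in sensor}


offsets = sensor_minus_scanner(sensor, scanner)
for name, delta in offsets.items():
    print(f"{name}: {delta:+.1f} C")
```

Only phy0 stands out here. The other two differences stay within a few degrees, which is in the range of the scanner's own uncertainty.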

Sandokan71 avatar Jan 12 '23 19:01 Sandokan71

As this issue was reported first in BananaPi forum to occur on BPi-R3, let me add some details: This is MT7986A with MT7975PN and MT7975N front-ends. The wrong temperature readings correspond to the MT7975N chip in charge of 2.4 GHz.

dangowrt avatar Jan 12 '23 19:01 dangowrt

Adding some information about a test I made. The attached graph of the first 2 hours from a cold start (measurements were made with a single heatsink covering both chips) shows that the 2.4 GHz front-end MT7975N (phy0) reads about 27C above what is expected. This assumes that the 2.4 GHz and 5 GHz chips behave similarly.

[Attached graph: 2023-01-17]

Sandokan71 avatar Jan 17 '23 20:01 Sandokan71

can confirm the difference

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
66000
45000

measured the chips with infrared thermometer

2g4: 47°C
5G: 44°C
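For comparing such readings against a thermometer, a minimal sketch that reads the same sysfs files and converts them to degrees Celsius. It assumes the standard hwmon convention that `temp1_input` is in millidegrees Celsius; the function names are made up for illustration.

```python
import glob
import os

# Same path pattern as in the cat command above.
HWMON_GLOB = "/sys/class/ieee80211/phy*/hwmon*/temp1_input"


def millideg_to_celsius(raw: str) -> float:
    """Convert a hwmon temp*_input string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_phy_temps(pattern: str = HWMON_GLOB) -> dict:
    """Map each phy directory name (e.g. 'phy0') to its temperature in C."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        # .../phyN/hwmonM/temp1_input -> 'phyN'
        phy = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            temps[phy] = millideg_to_celsius(f.read())
    return temps


# Prints nothing on a machine without these sysfs entries.
for phy, temp in read_phy_temps().items():
    print(f"{phy}: {temp:.1f} C")
```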

frank-w avatar Jan 27 '23 16:01 frank-w

Yes, and looking at the graph, it is impossible that a few seconds after the device starts the 2.4 GHz chip is at 55C while the 5 GHz chip is at 27C.

Sandokan71 avatar Jan 27 '23 19:01 Sandokan71

Here is the output of my MT7986 reference board. Looks normal.

root@OpenWrt:/# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
44000
48000

ryderlee1110 avatar Feb 02 '23 06:02 ryderlee1110

After many tests and measurements I can confirm bad temperature readings from the sensor of the 2.4 GHz chip on my board. Maybe there is a problem with the chip, but only with the temperature reading? The chip's real temperature seems normal and it works fine.

Sandokan71 avatar Feb 02 '23 06:02 Sandokan71

BPI R3?

ryderlee1110 avatar Feb 02 '23 06:02 ryderlee1110

@ryderlee1110 does your ref-board use MT7975N too for 2g4?

frank-w avatar Feb 02 '23 06:02 frank-w

MT7976 for 2/6g

ryderlee1110 avatar Feb 02 '23 06:02 ryderlee1110

So maybe we need a different offset or calculation for this chip:

https://github.com/openwrt/mt76/blob/master/mt7915/init.c#L55

https://github.com/openwrt/mt76/blob/master/mt7915/mcu.c#L3108

Looking at the graph above, the offset/command seems right, but the value itself does not seem to be in millidegrees Celsius, or does it need some other calibration data?

frank-w avatar Feb 02 '23 07:02 frank-w

Here is a graph comparing MT7975N and MT7975PN over three days, with and without fan cooling, to explore a wider temperature range. The calculation seems correct to me. It is probably only an offset issue.

[Attached graph: 2023-02-03]

Sandokan71 avatar Feb 03 '23 18:02 Sandokan71

I guess it is more likely that the EEPROM (which maybe sets the temp value offset) is wrong...

I see the function mt7915_eeprom_name in mt7915/eeprom.c, which selects the EEPROM, but this function does not seem to be called on my R3, as I do not see the printks I added there...

I will try to debug further, but this function seems to be called only if there is no EEPROM... stop, wait... we have added an EEPROM in the dts, both in my repo and in OpenWrt... maybe this is the wrong one for our front-end chips.

frank-w avatar Feb 07 '23 18:02 frank-w

same output with disabled eeprom-data in dts

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
43000
23000

My debug output now shows that the MT7975_DUAL_ADIE (MT7986_EEPROM_MT7975_DUAL_DEFAULT) option is used: in mt7915_eeprom_init, the first EEPROM load (mt7915_eeprom_load) now fails with ret=-22, and the second one (mt7915_eeprom_load_default) returns 0.
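As a side note on ret=-22: kernel functions conventionally return negative errno values, and 22 is EINVAL on Linux. A minimal sketch for decoding such values follows. The helper name is made up for illustration, and it says nothing about why mt7915_eeprom_load rejects the data.

```python
import errno
import os


def describe_kernel_ret(ret: int) -> str:
    """Translate a kernel-style return code (0 or -errno) into text."""
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


print(describe_kernel_ret(-22))  # first load, mt7915_eeprom_load
print(describe_kernel_ret(0))    # fallback, mt7915_eeprom_load_default
```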

https://elixir.bootlin.com/linux/v6.2-rc6/source/drivers/net/wireless/mediatek/mt76/mt7915/eeprom.c#L60

frank-w avatar Feb 07 '23 21:02 frank-w

Maybe this is a bug in the EEPROM data supplied by SinoVoip and we should actually just fix that...

dangowrt avatar Feb 07 '23 22:02 dangowrt

I loaded the eeprom which is available in linux-firmware git

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek

But yes, it can be wrong.

frank-w avatar Feb 07 '23 22:02 frank-w

@ryderlee1110 any idea how to get further here?

frank-w avatar Feb 27 '23 12:02 frank-w

The issue is still there. Any idea how to solve it?

Sandokan71 avatar Mar 25 '23 17:03 Sandokan71

I can confirm the issue on a Banana Pi R3 with OpenWrt r22537-32f134fbdf. With a thermometer gun I measure at most 40C, while the sensor reports 63C.

codingtony avatar Apr 13 '23 19:04 codingtony

I purchased a second BPI-R3, and on this one the detected temperature is correct. Something differs between the two boards.

Sandokan71 avatar Apr 15 '23 09:04 Sandokan71

What is the hardware revision, and can you check on the front-end chip whether it is still an MT7975?

frank-w avatar Apr 15 '23 09:04 frank-w

Both have the same revision v1.1 and the same IC.

Sandokan71 avatar Apr 15 '23 09:04 Sandokan71

board assembly process...

> I purchased a second BPI-R3, and on this one the detected temperature is correct. Something differs between the two boards.

It could be that the efuse inside the MT7975 ICs doesn't come with valid thermal calibration, which should have been done by the board vendor...

dangowrt avatar Apr 15 '23 12:04 dangowrt

I agree with you. It would be useful to know whether the efuse can be set properly.
After long-term monitoring I can confirm that on my original board the 2.4 GHz chip reads with a +27C offset. It is not nice to see temperatures of 60-75C at 20C ambient, but since they are actually 33-48C I am not too worried. However, I hope this will not cause strange behavior, such as thermal protection engaging, if the readings rise further when the ambient temperature reaches 30C or more.
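Until the calibration question is settled, the readings could be corrected in userspace. This is a minimal sketch under the assumption that the error is a constant offset. The 27C value is what I observed on this one board, not a general constant, and all names are hypothetical.

```python
# Per-phy correction in deg C. The 27 C value is the offset observed on a
# single board in this thread; other boards may need a different value
# or none at all. Names are hypothetical.
BOARD_OFFSETS_C = {"phy0": 27.0}


def corrected_celsius(phy: str, raw_millideg: int, offsets=BOARD_OFFSETS_C) -> float:
    """Convert a raw temp1_input value and subtract the per-phy offset."""
    return raw_millideg / 1000.0 - offsets.get(phy, 0.0)


# The reported 60-75 C range on phy0 maps back to the real 33-48 C range.
print(corrected_celsius("phy0", 60000))
print(corrected_celsius("phy0", 75000))
print(corrected_celsius("phy1", 45000))  # no offset configured for phy1
```

This only changes what is displayed. Anything inside the driver or firmware that acts on the raw value would still see the uncorrected reading.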

Sandokan71 avatar Apr 15 '23 13:04 Sandokan71

Or at least detect the problematic firmware (or invalid calibration data) in the driver and handle it there (maybe out-of-tree for affected boards, to keep the mainline driver clean)?

frank-w avatar Apr 15 '23 17:04 frank-w

Sorry to bump this issue again, but I have the opposite of what's posted earlier. Rev 1.1

root@bpi:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
49000 <-- 2g
66000 <-- 5g

skramstad avatar Mar 13 '24 17:03 skramstad