vzlogger icon indicating copy to clipboard operation
vzlogger copied to clipboard

Update MeterW1therm.cpp to catch erroneous temperature values

Open h-eff opened this issue 1 year ago • 3 comments

As announced here: https://www.photovoltaikforum.com/thread/199267-blitzeinschlag-in-der-nachbarschaft-hat-mir-den-vz-lahmgelegt/?postID=3459011#post3459011 , I've written some code to catch

  1. the erroneous 0°C temperature reading,
  2. the 85°C error reading and
  3. the 127.9375°C error reading before they end up in the database. This is based upon work from Chris Petrich: https://github.com/cpetrich/counterfeit_DS18B20

A similar check for conversion success was added to w1_therm kernel module in 2020: https://github.com/torvalds/linux/blame/305230142ae0637213bf6e04f6d9f10bbcb74af8/drivers/w1/slaves/w1_therm.c#L1186-L1197

So when do these readings occur?

  1. can happen when the bus is unstable due to bad design (some sensors in star topology, long wiring, too many sensors, pullup inadequate, EMI etc.). The busmaster will receive only zeroes when it reads the 8 byte scratchpad from a DS18B20. Unfortunately, this will result in a correct CRC checksum (which is also zero). Fortunately, according to the DS18B20 datasheet, it is impossible for a scratchpad to contain only zeroes (i.e. byte 6 won't be zero if byte 0 is zero, the alarm values at byte 2&3 won't be both zero etc.). So this reading can be safely discarded as wrong.
  2. happens when a genuine DS18B20 has to report a temperature before it could finish the temperature conversion. Or, if the first conversion command was never received by the DS18B20 due to an unstable bus. Luckily, scratchpad byte 6 will tell the difference (at least for genuine DS18B20). For counterfeit sensors, I disabled this check, as they mostly do not handle byte 6 as genuine DS18B20 do (see cpetrich). Sensor discrimination is done via sensor address (see cpetrich). This discrimination will work until the serial numbers of genuine DS18B20 reach 0x0001xxxxxxxx. Currently, we are at 0x00000fxxxxxx. I estimate this won't be a problem before 2050.
  3. was never encountered by myself. According to cpetrich, it can happen when a parasitic circuit with some DS18B20 happens to have at least one DS18B20 with Vcc left floating (and not tied to ground as the datasheet demands). Also, temperature readings above 125°C are outside sensor specification and probably always fishy.

In my opinion, all of these readings should not end up as spikes in the temperature database, but preferable as debug error in the vzlogger logfile.

As of yet, I still have little experience writing C++, so probably I did not write the best possible code. Please comment on what could be better.

I did not yet test this code on a RasPi. I plan to set up a test system in the next two weeks.

h-eff avatar Nov 08 '23 14:11 h-eff

thanks for your contribution, good stuff! i think the code could be a little cleaner, can assist if you like. also looking forward to test results, might do some myself.

r00t- avatar Nov 09 '23 18:11 r00t-

@h-eff: also, care to make the formatting check happy, or should we assist? best way is to install clang-format and then run check-formatting.sh locally.

r00t- avatar Nov 14 '23 20:11 r00t-

i suggest we merge this, even with @h-eff not responding anymore, as it's really useful.

can anybody else provide some review?

will try to test also, but it's not possible to test those corner-cases in hardware. (we probably should have automated tests for this.)

r00t- avatar Jun 24 '24 15:06 r00t-