tinytuya

Different / Varied Failure Cases

Open arrmo opened this issue 4 years ago • 15 comments

Hi,

Really liking tinytuya - appreciate it! But I admit, seeing some odd results here. I think that's my specialty though ... 🤣.

I am polling my devices on a regular basis, using them for power monitoring (of some switches providing energy monitoring capability). But ... relatively often (i.e. 10+ times a day), they cause my code to "yell", for a few different reasons. So attaching my code below, as well as some of the error messages. And a few thoughts,

  1. I was going to add code to check for the recently added Error status ... but nothing is there in the case of passing results? Thinking there should always be a return status? Just to avoid try ... catch loops (they just seem nasty, LOL!)
  2. I seem to be seeing different outcomes and failures, some of them not caught (reported) by the Error status. Or am I missing it?

My code, called inside the loop, with except to let me know what kinds of failure I am seeing,

        try:
            device = tinytuya.OutletDevice(currdevice['id'], currdevice['hostname'], currdevice['key'])
            device.set_version(currdevice['ver'])
            device.set_socketPersistent(True)
            device.updatedps()
            currData = device.status()
            if currdevice['ver'] == 3.1:
                time.sleep(0.5)
            dps = currData["dps"]
            # Check for power data - DP 19 on some 3.1/3.3 devices
            W = A = V = 0
            if "19" in dps.keys():
                W = float(dps['19']) / 10.0
                A = float(dps['18']) / 1000.0
                V = float(dps['20']) / 10.0
            # Check for power data - DP 5 for some 3.1 devices
            elif "5" in dps.keys():
                W = float(dps["5"]) / 10.0
                A = float(dps["4"]) / 1000.0
                V = float(dps["6"]) / 10.0
            # Return power measurement results
            return {'Current': A, 'Power': W, 'Voltage': V}
        except OSError:
            print("[OSError] Error reading Tuya power meter, ", currdevice['hostname'])
            return None
        except ValueError:
            print("[ValueError] Error reading Tuya power meter, ", currdevice['hostname'])
            return None
        except KeyError:
            print("[KeyError] Error reading Tuya power meter, ", currdevice['hostname'])
            print("[KeyError] Data, ", currData)
            return None
        except TypeError:
            print("[TypeError] Error reading Tuya power meter, ", currdevice['hostname'])
            print("[TypeError] Data, ", currData)
            return None

And some of the exceptions (some normal / caught, others not):

[KeyError] Error reading Tuya power meter,  emFrontSwitch
[KeyError] Data,  {'Error': 'Network Error: Unable to Connect', 'Err': '901', 'Payload': None}
[TypeError] Error reading Tuya power meter,  emRussell
[TypeError] Data,  None
[KeyError] Error reading Tuya power meter,  emWinServer
[KeyError] Data,  {'dps': {'18': 530, '19': 642}, 't': 1617010003}
[KeyError] Error reading Tuya power meter,  emRussell
[KeyError] Data,  {'Error': 'Network Error: Device Unreachable', 'Err': '905', 'Payload': None}

And this one, when using the data (somehow wrong data type?),

influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field \"value\" on measurement \"Current\" is type integer, already exists as type float dropped=3"}
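The InfluxDB message is a field type conflict: the measurement already holds floats and an integer was written. One possible source, not confirmed anywhere in this thread, is the `W = A = V = 0` default in the code above, which returns Python ints whenever neither DP 19 nor DP 5 is present. A minimal sketch that always returns floats (the helper name `as_reading` is hypothetical):

```python
def as_reading(power=0, current=0, voltage=0):
    """Build the result dict with every field coerced to float."""
    return {
        'Current': float(current),
        'Power': float(power),
        'Voltage': float(voltage),
    }

# The defaults are ints, but every value in the output is a float.
reading = as_reading()
```

Coercing at the point where the dict is built keeps the type consistent no matter which branch produced the values.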

Thanks!

arrmo avatar Mar 29 '21 12:03 arrmo

You can add a check for an error response from the status() call with something like this:

currData = device.status()
if currData is None or "Error" in currData:
    # something went wrong
    if currData is None:
        print("Null response")
    else:
        Error = currData["Error"]
        print(Error)
    # return an error state or perhaps wait and retry?
    return {'Current': -1, 'Power': -1, 'Voltage': -1}
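The check above could also be folded into a small helper that labels each kind of response, which may be handy for logging. This is only a sketch: `classify_status` is a hypothetical name, and the label mapping is an assumption built from the two `Err` codes that appear in the output earlier in this thread.

```python
def classify_status(data):
    """Return (ok, label) for a value returned by device.status()."""
    # Assumed labels for the two codes seen in this thread; any other
    # code falls through to a generic label.
    known = {'901': 'ERR_CONNECT', '905': 'ERR_OFFLINE'}
    if data is None:
        return False, 'NULL_RESPONSE'
    if not isinstance(data, dict):
        return False, 'BAD_TYPE'
    if 'Error' in data:
        return False, known.get(str(data.get('Err')), 'ERR_OTHER')
    if 'dps' not in data:
        return False, 'NO_DPS'
    return True, 'OK'
```

Checking for `None` first avoids the `TypeError` that `"Error" in None` would raise.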

Some of the responses do not include all the keys (18, 19 and 20) which is probably an out of sync response from the device.updatedps() call which I have discovered only returns keys to values that actually changed since the last call. You could update your check to:

            if "18" in dps and "19" in dps and "20" in dps:
                W = float(dps['19']) / 10.0
                A = float(dps['18']) / 1000.0
                V = float(dps['20']) / 10.0

jasonacox avatar Mar 31 '21 03:03 jasonacox

call which I have discovered only returns keys to values that actually changed since the last call.

That makes sense, except ... voltage may not change, even if current and power do - let me add a check for that. Good to know about this, thanks!

Thanks again.

arrmo avatar Apr 02 '21 02:04 arrmo

OK, updated my code to check individually for each returned value - let's see if that does it 😄.
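One way such a per-value check might look, as a sketch rather than the code actually used here: each DP is parsed only if present, and anything missing is carried over from the previous reading. The DP numbers and scale factors come from the code earlier in the thread; `DP_MAP` and `merge_reading` are hypothetical names.

```python
# DP number -> (field name, divisor), as used earlier in this thread.
DP_MAP = {
    '18': ('Current', 1000.0),
    '19': ('Power', 10.0),
    '20': ('Voltage', 10.0),
}

def merge_reading(dps, last=None):
    """Return a copy of `last` updated with whichever DPs are in `dps`."""
    reading = dict(last or {})
    for dp, (name, scale) in DP_MAP.items():
        if dp in dps:
            reading[name] = float(dps[dp]) / scale
    return reading

# A partial payload like the one logged above (no DP 20).
previous = {'Current': 0.5, 'Power': 60.0, 'Voltage': 120.5}
current = merge_reading({'18': 530, '19': 642}, previous)
```

Carrying old values forward is a trade-off: a stale voltage is reported as if it were fresh, so dropping incomplete payloads instead is an equally reasonable choice.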

Thanks again for the info!

arrmo avatar Apr 03 '21 00:04 arrmo

Keep in mind that the status() command should always return all of the values. The payloads you get that have only partial data can be ignored as they are likely out of sequence (delayed) responses from the updatedps() call.

jasonacox avatar Apr 03 '21 00:04 jasonacox

FYI, I am using status(), after updatedps() ... still only partial parameters. Voltage, for example, is often not there. No biggie, I worked around it.

In case you're interested (or not ... LOL!), stats from the last day (or so),

hostname=emFrontSwitch       : Failure Rate = 0.5%, OK Count = 1240, Total Count = 1246
hostname=emMacOS             : Failure Rate = 0.1%, OK Count = 1245, Total Count = 1246
hostname=emLinuxServer       : Failure Rate = 0.0%, OK Count = 1246, Total Count = 1246
hostname=emPoolTuya          : Failure Rate = 2.0%, OK Count = 1221, Total Count = 1246
hostname=emRussell           : Failure Rate = 1.8%, OK Count = 1223, Total Count = 1245
hostname=emWinServer         : Failure Rate = 1.5%, OK Count = 1227, Total Count = 1246

And, from the highest failure rate,

name: Error
tags: errMessage=ERR_CONNECT
time count
---- -----
0    1

name: Error
tags: errMessage=ERR_OFFLINE
time count
---- -----
0    14

name: Error
tags: errMessage=tuyaTypeError
time count
---- -----
0    7

Wondering if perhaps adding more retries may help?

Thanks!

arrmo avatar Apr 04 '21 19:04 arrmo

Thanks Russell. Increasing the retry may help, or adding a retry specifically in your code when you don't get the response you want. Keep in mind that the network ERROR responses you are getting occur only after tinytuya tries 5 times (you can increase the default with set_socketRetryLimit(integer)).

Do you have any other systems polling the devices at the same time? That overlap may be causing some of the network errors.
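A retry in the calling code, as suggested above, could be sketched along these lines. `poll_with_retry` is a hypothetical helper, not part of tinytuya; it accepts any zero-argument callable, so it can wrap something like `lambda: device.status()`.

```python
import time

def poll_with_retry(poll, attempts=3, delay=1.0):
    """Call poll() until it returns a dict containing 'dps', or give up.

    Returns the last response seen, which may be None or an error dict
    if every attempt failed.
    """
    data = None
    for attempt in range(attempts):
        data = poll()
        if isinstance(data, dict) and 'dps' in data:
            return data
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause before the next attempt
    return data
```

Because the last response is returned even on failure, the caller can still log the `Error` text when all attempts are exhausted.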

jasonacox avatar Apr 09 '21 04:04 jasonacox

you can increase the default with set_socketRetryLimit(integer)

Will give that a try, thanks!

FYI, getting very few error warnings now (as I trap most of them, log as needed). But did get this just today, figured I'd let you know.

Traceback (most recent call last):
  File "./eMeter.py", line 89, in <module>
    currData = meterTuya.emeter(currDevice)
  File "/mnt/ProgSSD/eMeter/tuyaMeter.py", line 30, in emeter
    device.updatedps()
  File "/mnt/ProgSSD/eMeter/venv/lib/python3.8/site-packages/tinytuya/__init__.py", line 880, in updatedps
    data = self._send_receive(payload,0)
  File "/mnt/ProgSSD/eMeter/venv/lib/python3.8/site-packages/tinytuya/__init__.py", line 574, in _send_receive
    msg = unpack_message(data)
  File "/mnt/ProgSSD/eMeter/venv/lib/python3.8/site-packages/tinytuya/__init__.py", line 301, in unpack_message
    _, seqno, cmd, _, retcode = struct.unpack(
struct.error: unpack requires a buffer of 20 bytes

Thanks!

arrmo avatar Apr 09 '21 23:04 arrmo

Thanks! Yes, I would love to figure out how that occurred. That's an odd one:

_, seqno, cmd, _, retcode = struct.unpack(
struct.error: unpack requires a buffer of 20 bytes

The confusing thing about that error is that unpack_message() should never be called if the payload is not at least 28 bytes and retry is not True. Do you happen to have anything in your code that would do something like these:

d.retry = 5
d.retry = False

It could just be a weird GC issue in Python. What python version are you using?
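For illustration, a length guard in front of `struct.unpack` would look roughly like this. It is not tinytuya's actual code: the format string `'>5I'` is an assumption inferred from the traceback (five unpacked values, 20 bytes), and `unpack_header` is a hypothetical name.

```python
import struct

HEADER_FMT = '>5I'  # assumed: five big-endian uint32 fields, 20 bytes
HEADER_LEN = struct.calcsize(HEADER_FMT)

def unpack_header(data):
    """Return the five header fields, or None if the buffer is too short."""
    if data is None or len(data) < HEADER_LEN:
        return None
    # Slice to the exact size so trailing payload bytes are ignored.
    return struct.unpack(HEADER_FMT, data[:HEADER_LEN])
```

Returning `None` for a short buffer lets the caller treat a truncated packet like any other failed read instead of raising `struct.error`.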

jasonacox avatar Apr 10 '21 04:04 jasonacox

NP, let me know what I can do to help!

Do you happen to have anything in your code that would do something like these

I don't think so .. LOL! Here is my latest,

        try:
            device = tinytuya.OutletDevice(currdevice['id'], currdevice['hostname'], currdevice['key'])
            device.set_version(currdevice['ver'])
            device.set_socketPersistent(True)
            if 'socketTimeout' in currdevice:
                device.set_socketTimeout(currdevice['socketTimeout'])
            if 'socketRetryLimit' in currdevice:
                device.set_socketRetryLimit(currdevice['socketRetryLimit'])
            device.updatedps()
            currData = device.status()
            if currdevice['ver'] == 3.1:
                time.sleep(0.5)
            dps = currData["dps"]

What python version are you using?

Python 3.8.6

arrmo avatar Apr 10 '21 10:04 arrmo

FYI, I see this issue hit every day or two. Not a biggie by any means, it's sort of a percolating (and interesting) item ... 😄

arrmo avatar Apr 17 '21 15:04 arrmo

Thanks @arrmo - I appreciate the updates. I added some additional error handling for unpack_message() calls to see if we can catch it. It will be available in v1.2.4.

jasonacox avatar Apr 19 '21 05:04 jasonacox

Great, thanks! I updated, will let you know what I see the next time this occurs. Assuming I don't need to add any code to check / trap on errors, just let it keep running?

arrmo avatar Apr 19 '21 19:04 arrmo

You are correct. Let me know how it goes! Thanks, Russell.

jasonacox avatar Apr 20 '21 01:04 jasonacox

Hmmm ... nothing so far 😆. And with v1.2.4, my failure rates are down quite a bit? Did you change anything else?

hostname=emFrontSwitch       : Failure Rate = 0.7%, OK Count = 1428, Total Count = 1438
hostname=emHackintosh        : Failure Rate = 0.0%, OK Count = 1439, Total Count = 1439
hostname=emLinuxServer       : Failure Rate = 0.0%, OK Count = 1438, Total Count = 1438
hostname=emPoolTuya          : Failure Rate = 0.3%, OK Count = 1433, Total Count = 1438
hostname=emRussell           : Failure Rate = 0.0%, OK Count = 1438, Total Count = 1438
hostname=emWinServer         : Failure Rate = 0.3%, OK Count = 1434, Total Count = 1439

Thanks!

arrmo avatar Apr 24 '21 17:04 arrmo

That's great! There were a few other tweaks and bug fixes that went in, but it seems odd that it would have such a profound effect. I'll take it. Hopefully it holds. ;) Thanks for the update.

jasonacox avatar Apr 25 '21 01:04 jasonacox