keripy icon indicating copy to clipboard operation
keripy copied to clipboard

fix: bytes encoding error "string argument without an encoding"

Open kentbull opened this issue 8 months ago • 9 comments

Edited: When I made some changes to the logging I incorrectly assumed that edig was always a memoryview in the processEscrow* functions and thus able to be converted to bytes with bytes(edig). This was incorrect because sometimes the edig argument is a string, so running bytes(edig) without specifying an encoding threw an error, thus creating the "string argument without an encoding" error.

This PR fixes the logging in each processEscrow* function so that each type, memoryview, bytes, or string, is appropriately converted to a printable value before being interpolated in a log message.

kentbull avatar Apr 23 '25 17:04 kentbull

Do we have any example tests that trigger escrows I could start from? All the prior change did was change the way we logged things, meaning there were no unit tests prior to my prior change either.

kentbull avatar Apr 24 '25 14:04 kentbull

Its not a unit test:

But I believe the following will exercise the variations that might cause failures.

import logging

def main():
    logger = logging.getLogger("new")
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    logger.addHandler(console_handler)

    try:
        logger.info("Kevery unescrow error: %s", bytes("edig"))
    except TypeError as e:
        print("existing error", e)

    # because some generators return str | bytes | memoryview | Iterable
    transformations = [
        lambda s: s,                     # string
        lambda s: s.encode("utf-8"),     # bytes
        lambda s: memoryview(s.encode("utf-8"))  # memoryview
    ]

    edigs = []
    for transform in transformations:
        edig = transform("edig")
        print(f"Type: {type(edig).__name__}")
        edigs.append(edig)
        logger.info("msg: %s", f"PDE Stale event escrow at dig = {edig}")

    logger.info("msg: %s", f"PDE Stale event escrow at dig = {edigs}")

if __name__ == "__main__":
    main()

m00sey avatar Apr 24 '25 15:04 m00sey

The LMDB function return Python memoryview objects, which are not usable in f"" strings are in general with most operations like logging. If it is a memoryview than converting it to bytes is accomplished by bytes(mv). But formated f"" want unicode strings not bytes so to to convert to regular strings you want. bytes(mv).decode()

some of the functions do the work of converting to str by checking the type and then decoding if it needed it.

But when trying to somehow optimize the logging or simplify the logic, taking shortcuts becomes counter productive when one does not actually understand the types one is encountering. So the logging worked before. Now its broken. The main problem was it was too verbose but it was correct. So instead of just fixing the verbosity, there was an attempt to somehow make it more 'efficient' without doing the work of understanding it. Better to just do the one thing and not the other.

SmithSamuelM avatar Apr 24 '25 23:04 SmithSamuelM

Sam, thank you for clarifying that the return value for edig in statements like

for ekey, edig in self.db.getOoeItemIter(key=key):

can sometimes be a memoryview. When I dug into the LMDBer.gettopItemIter function to discover the type I saw the ckey = bytes(ckey) in the below code and assumed all returns of edig would be cast to bytes like the one in the below function is.

class LMDBer(filing.Filer):
        ...
        def getTopItemIter(self, db, top=b''):
        ...
        with self.env.begin(db=db, write=False, buffers=True) as txn:
            cursor = txn.cursor()
            if cursor.set_range(top):  # move to val at key >= key if any
                for ckey, cval in cursor.iternext():  # get key, val at cursor
                    ckey = bytes(ckey)        # <----------- referring to this line ---------------
                    if not ckey.startswith(top): #  prev entry if any last in branch
                        break  # done
                    yield (ckey, cval)  # another entry in branch startswith key
            return  # done raises StopIteration

I made the assumption that bytes(edig) in the logger statement was extraneous since it appeared to always be done in the LMDBer function. It sounds like you are saying that in some instances the edig return value is a memoryview. I will go and manually check each iterator function to confirm this just to double check.

My goal wasn't to make it more efficient, I just wanted to get past the "string argument without encoding" TypeError which was confusing. It confused me because I thought that edig would always be a bytes object due to the cast I saw in LMDBer. I didn't take the time at first to look at every LMDBer function that is used in eventing.py in these escrow loops, though now I will.

From what I can tell the problem I am trying to solve occurs when the bytes(edig) function is called with something Python thinks is a string.

kentbull avatar Apr 25 '25 00:04 kentbull

After reading the code just now I found out the following, which will be reflected in the PRs shortly, regarding the value for edig in the various escrow functions.

  • processEscrowOutOfOrders -> memoryview
  • processEscrowPartialSigs -> memoryview
  • processEscrowPartialWigs -> memoryview
  • processEscrowPartialDels -> string
  • processEscrowDelegables -> bytes, yet comes from DB as deserialized string, then cast to bytes
  • processQueryNotFound -> string
  • processEscrowDuplicitous -> memoryview

I will update these escrow functions accordingly. This is the kind of deeper code analysis I should have done before making a bulk change like I did when I wrote this PR. In the future I will be more diligent in validating assumptions I make about data types in the code.

kentbull avatar Apr 25 '25 00:04 kentbull

No, I was wrong again, I swapped the key and value in my mind. I'll correct the above comment. Some of the edig values are memoryview values, not bytes.

kentbull avatar Apr 25 '25 00:04 kentbull

The latest commit fixes the type used in log messages by properly converting memory view types to bytes and then using .decode() to transform it into a string. For consistency I used a dig variable to get it right once and then reuse it in all relevant log statements in each processEscrow* function.

kentbull avatar Apr 25 '25 01:04 kentbull

@pfeairheller I added unit tests proving that the log statements work in the OOO, PSE, PWE, and PDEL escrows. I will have the rest tested later this week. I will add the same tests to 1.1.32 and main as well.

kentbull avatar Apr 29 '25 02:04 kentbull

Resolves https://github.com/WebOfTrust/keripy/issues/973

kentbull avatar Apr 29 '25 23:04 kentbull

Remove dig variables and replace with inline bytes(edig).decode() calls.

Make sure strings passed to bytes() cast have encodings specified.

kentbull avatar May 27 '25 16:05 kentbull

closing and will re-open when ready

kentbull avatar Sep 08 '25 18:09 kentbull