fix: bytes encoding error "string argument without an encoding"
Edited:
When I made some changes to the logging I incorrectly assumed that edig was always a memoryview in the processEscrow* functions and thus able to be converted to bytes with bytes(edig). This was incorrect because sometimes the edig argument is a string, so running bytes(edig) without specifying an encoding threw an error, thus creating the "string argument without an encoding" error.
This PR fixes the logging in each processEscrow* function so that each type, memoryview, bytes, or string, is appropriately converted to a printable value before being interpolated in a log message.
Do we have any example tests that trigger escrows I could start from? All the prior change did was change the way we logged things, meaning there were no unit tests prior to my prior change either.
Its not a unit test:
But I believe the following will exercise the variations that might cause failures.
import logging
def main():
logger = logging.getLogger("new")
logger.setLevel(logging.INFO)
console_handler = logging.StreamHandler()
console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(console_handler)
try:
logger.info("Kevery unescrow error: %s", bytes("edig"))
except TypeError as e:
print("existing error", e)
# because some generators return str | bytes | memoryview | Iterable
transformations = [
lambda s: s, # string
lambda s: s.encode("utf-8"), # bytes
lambda s: memoryview(s.encode("utf-8")) # memoryview
]
edigs = []
for transform in transformations:
edig = transform("edig")
print(f"Type: {type(edig).__name__}")
edigs.append(edig)
logger.info("msg: %s", f"PDE Stale event escrow at dig = {edig}")
logger.info("msg: %s", f"PDE Stale event escrow at dig = {edigs}")
if __name__ == "__main__":
main()
The LMDB function return Python memoryview objects, which are not usable in f"" strings are in general with most operations like logging. If it is a memoryview than converting it to bytes is accomplished by bytes(mv). But formated f"" want unicode strings not bytes so to to convert to regular strings you want. bytes(mv).decode()
some of the functions do the work of converting to str by checking the type and then decoding if it needed it.
But when trying to somehow optimize the logging or simplify the logic, taking shortcuts becomes counter productive when one does not actually understand the types one is encountering. So the logging worked before. Now its broken. The main problem was it was too verbose but it was correct. So instead of just fixing the verbosity, there was an attempt to somehow make it more 'efficient' without doing the work of understanding it. Better to just do the one thing and not the other.
Sam, thank you for clarifying that the return value for edig in statements like
for ekey, edig in self.db.getOoeItemIter(key=key):
can sometimes be a memoryview. When I dug into the LMDBer.gettopItemIter function to discover the type I saw the ckey = bytes(ckey) in the below code and assumed all returns of edig would be cast to bytes like the one in the below function is.
class LMDBer(filing.Filer):
...
def getTopItemIter(self, db, top=b''):
...
with self.env.begin(db=db, write=False, buffers=True) as txn:
cursor = txn.cursor()
if cursor.set_range(top): # move to val at key >= key if any
for ckey, cval in cursor.iternext(): # get key, val at cursor
ckey = bytes(ckey) # <----------- referring to this line ---------------
if not ckey.startswith(top): # prev entry if any last in branch
break # done
yield (ckey, cval) # another entry in branch startswith key
return # done raises StopIteration
I made the assumption that bytes(edig) in the logger statement was extraneous since it appeared to always be done in the LMDBer function. It sounds like you are saying that in some instances the edig return value is a memoryview. I will go and manually check each iterator function to confirm this just to double check.
My goal wasn't to make it more efficient, I just wanted to get past the "string argument without encoding" TypeError which was confusing. It confused me because I thought that edig would always be a bytes object due to the cast I saw in LMDBer. I didn't take the time at first to look at every LMDBer function that is used in eventing.py in these escrow loops, though now I will.
From what I can tell the problem I am trying to solve occurs when the bytes(edig) function is called with something Python thinks is a string.
After reading the code just now I found out the following, which will be reflected in the PRs shortly, regarding the value for edig in the various escrow functions.
processEscrowOutOfOrders-> memoryviewprocessEscrowPartialSigs-> memoryviewprocessEscrowPartialWigs-> memoryviewprocessEscrowPartialDels-> stringprocessEscrowDelegables-> bytes, yet comes from DB as deserialized string, then cast to bytesprocessQueryNotFound-> stringprocessEscrowDuplicitous-> memoryview
I will update these escrow functions accordingly. This is the kind of deeper code analysis I should have done before making a bulk change like I did when I wrote this PR. In the future I will be more diligent in validating assumptions I make about data types in the code.
No, I was wrong again, I swapped the key and value in my mind. I'll correct the above comment. Some of the edig values are memoryview values, not bytes.
The latest commit fixes the type used in log messages by properly converting memory view types to bytes and then using .decode() to transform it into a string. For consistency I used a dig variable to get it right once and then reuse it in all relevant log statements in each processEscrow* function.
@pfeairheller I added unit tests proving that the log statements work in the OOO, PSE, PWE, and PDEL escrows. I will have the rest tested later this week. I will add the same tests to 1.1.32 and main as well.
Resolves https://github.com/WebOfTrust/keripy/issues/973
Remove dig variables and replace with inline bytes(edig).decode() calls.
Make sure strings passed to bytes() cast have encodings specified.
closing and will re-open when ready