asammdf
Converting to dataframe converts values to byte strings
Python version
('python=3.9.12 (tags/v3.9.12:b28265d, Mar 23 2022, 23:52:46) [MSC v.1929 64 '
'bit (AMD64)]')
'os=Windows-10-10.0.19044-SP0'
'numpy=1.23.3'
'asammdf=7.1.1'
Code
MDF version
4.00
Code snippet
I have an MF4 file, and when I convert it to a dataframe using the following code:
mdf_file.to_dataframe(channels, raw=True,
                      time_as_date=True,
                      empty_channels="zeros",
                      reduce_memory_usage=True,
                      ignore_value2text_conversions=False)
I get the following result:
The values are normal integers.
However, when I switch to raw=False, as such:
mdf_file.to_dataframe(channels, raw=False,
                      time_as_date=True,
                      empty_channels="zeros",
                      reduce_memory_usage=True,
                      ignore_value2text_conversions=False)
I get the following result:

Traceback
There's no traceback
Description
I'm not sure I can send the data file, even if scrambled.
The problem seems to be the conversion from raw values to interpreted values. Somewhere along the line the values are converted from integers to byte strings that actually hold a float value (like maybe an extra str() was applied or something).
I can go over all values and attempt to convert them to float (not all of them may be actual floats, some values may actually be real strings) but that takes a looong time.
Debugging this a bit, I can see that the signals this happens for have their conversion_type set to 7 (v4c.CONVERSION_TYPE_TABX).
For example this one:

I believe this was "working" in asammdf 5.x. I'm not sure working is the correct term, as it was probably a bug in that version, but the 5.x version had floats for the actual floats and strings for the rest. Version 7 has strings for everything, but the strings may actually be a float value.
Well, I have found the code where the problem occurs.
Shouldn't it try to convert to float first, and only if that fails convert to bytes? The conversion to bytes will succeed far more often than the conversion to float, no?
Switching the conversions around, i.e. trying to convert to float first and falling back to bytes if that fails, I get the correct (expected) result.

I fail to see how changing the casting order can change the result. Can you send a file for analysis?
Anything can be converted to strings, so it never tries to convert to float. I'm not sure about sending a file, I will need to check and get back to you.
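To illustrate the point about ordering, here is a minimal NumPy sketch. It is not the asammdf code itself: `bytes_first` and `float_first` are hypothetical helper names, the sample arrays are made up, and it assumes the converted values arrive as an object array. Because the cast to bytes also succeeds for numeric values, a float fallback placed after it is never reached:

```python
import numpy as np


def bytes_first(values):
    """Hypothetical helper: try bytes first, fall back to float."""
    try:
        return values.astype(bytes)
    except (TypeError, ValueError):
        # Not reached for plain numbers, since they cast to bytes fine.
        return values.astype("f8")


def float_first(values):
    """Hypothetical helper: try float first, fall back to bytes."""
    try:
        return values.astype("f8")
    except (TypeError, ValueError):
        # Reached when at least one element is not numeric.
        return values.astype(bytes)


# Made-up sample data: one purely numeric column, one with a text entry.
numeric = np.array([1.0, 2.5, 3.0], dtype=object)
mixed = np.array([1.0, b"SNA", 3.0], dtype=object)

for name, column in (("numeric", numeric), ("mixed", mixed)):
    print(name, "bytes first:", bytes_first(column).dtype)
    print(name, "float first:", float_first(column).dtype)
```

With float tried first, the purely numeric column keeps a float dtype and only the mixed column ends up as byte strings, which matches the behaviour I expected.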
As explained in the pull request, we can also use the same check as on line 3400, which, from what I can tell, checks whether all values can be converted to a single type.
Alternatively, I see that another method is used further up in the codebase, starting on line 3400:
all_bytes = True
for v in ret.tolist():
    if not isinstance(v, bytes):
        all_bytes = False
        break

if not all_bytes:
    try:
        ret = ret.astype("f8")
    except:
        if not as_object:
            ret = np.array(
                [np.nan if isinstance(v, bytes) else v for v in ret]
            )
else:
    ret = ret.astype(bytes)
We could use this check to see if we need to convert to float or bytes, but I wouldn't iterate over all items if we can help it.
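For reference, the fragment above can be tried in isolation with a small standalone sketch. The function name `normalize_result` and the sample arrays are assumptions made for illustration; only the logic mirrors the quoted snippet. The check still visits the elements in Python, but `all()` stops at the first non-bytes value, just like the loop with `break`:

```python
import numpy as np


def normalize_result(ret, as_object=False):
    """Hypothetical wrapper around the check quoted above."""
    # Short-circuits on the first element that is not a bytes object.
    all_bytes = all(isinstance(v, bytes) for v in ret.tolist())

    if all_bytes:
        return ret.astype(bytes)

    try:
        # Succeeds when every element can be interpreted as a number.
        return ret.astype("f8")
    except (TypeError, ValueError):
        if as_object:
            # Leave the mixed values untouched.
            return ret
        # Replace the text entries with NaN and keep the numbers.
        return np.array([np.nan if isinstance(v, bytes) else v for v in ret])


# Made-up sample inputs.
only_text = np.array([b"on", b"off"], dtype=object)
only_numbers = np.array([1.0, 2.0], dtype=object)
mixed = np.array([1.0, b"SNA"], dtype=object)

print(normalize_result(only_text).dtype)
print(normalize_result(only_numbers).dtype)
print(normalize_result(mixed))
print(normalize_result(mixed, as_object=True))
```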
@eblis please try release 7.3.16