XPlane2Blender

249: Any string value from 2.49 may result in UnicodeDecodeError

Open tngreene opened this issue 5 years ago • 7 comments

I thought that only some Image Datablock names were haunted, but it turns out other ones can be too.

Observe this print out from colmita's example.

File "C:\Users\Ted\AppData\Roaming\Blender Foundation\Blender\2.79\scripts\addons\io_xplane2blender\xplane_249_converter\xplane_249_material_converter.py", line 515, in <lambda>
    sorted_ll_mats = sorted(filter(lambda m:m.name.startswith(new_name), bpy.data.materials), key=get_ll_index)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 9: invalid start byte

If this can truly happen on access to any datablock name, and not just image and material datablocks, we could have a huge problem. After all, from my basic experimenting, we can't reassign the data with Python because that counts as access!

Maybe we could loop through every datablock's name, test it, and if it fails, warn the user that "the X type datablock after [the last one that worked] has a bad name and should be changed."
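
A minimal sketch of that scan, assuming the error can be caught when the code runs as a script (which matches what I saw later from -P and the Text Editor, but not from the GUI console). The function name find_unreadable_names and the warning wording are hypothetical; inside Blender the argument would be something like {"objects": bpy.data.objects, "materials": bpy.data.materials}.

```python
def find_unreadable_names(datablocks_by_type):
    """Return one warning string per datablock whose name cannot be decoded.

    datablocks_by_type maps a label such as "objects" to an iterable of
    datablocks (anything with a .name attribute).
    """
    warnings = []
    for type_label, datablocks in datablocks_by_type.items():
        last_good = None  # name of the last datablock that decoded cleanly
        for index, datablock in enumerate(datablocks):
            try:
                # The attribute access itself is what raises on bad 2.49 data.
                last_good = datablock.name
            except UnicodeDecodeError as err:
                if last_good is None:
                    where = "at the start of the list"
                else:
                    where = "after {!r}".format(last_good)
                warnings.append(
                    "The {} datablock #{} {} has a bad name and should be "
                    "changed ({})".format(type_label, index, where, err)
                )
    return warnings
```

The name itself can never be printed, so the warning can only point at the position and at the last neighbour that worked. str.format is used instead of f-strings because 2.79 bundles Python 3.5.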

Maybe there will be some kind of pattern we can find, or a ctypes solution to get the name without Python (so many tears).

We can't wrap every name access in a UnicodeDecodeError handler!

tngreene avatar Sep 20 '19 17:09 tngreene

Found another in yusuke's example.

  File "C:\Users\Ted\AppData\Roaming\Blender Foundation\Blender\2.79\scripts\addons\io_xplane2blender\xplane_249_converter\xplane_249_manip_decoder.py", line 356, in _getmanipulator
    manipulator_dict[manipulator_type][real_ondrej_attr_key] = prop.value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 7: invalid start byte

manipulator_type comes from Blender data: manipulator_type = obj.game.properties['manipulator_type'].value
real_ondrej_attr_key comes from our own dictionary from getManipulators().

So, basically, any string in a 2.49 file is dangerous.

tngreene avatar Sep 20 '19 17:09 tngreene

I'm curious what is going to happen on non-Windows builds. Curious = terrified.

tngreene avatar Sep 20 '19 17:09 tngreene

Steps:

Make a new 2.49 file and add a cube named abcdé. Conversion gives:

    for search_obj in sorted(list(filter(lambda obj: obj.type == "MESH", search_objs)), key=lambda x: x.name):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 4: invalid continuation byte

Then, delete the trailing character. It converts fine.
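
For illustration, the same failure can be reproduced in plain Python, assuming the 2.49 file stored the name in a single-byte encoding such as Latin-1. That encoding is a guess; all the traceback confirms is a lone 0xe9 byte at position 4.

```python
# "é" is the single byte 0xe9 in Latin-1, which is not valid UTF-8 on its own.
raw = "abcdé".encode("latin-1")

try:
    raw.decode("utf-8")
    failure = None
except UnicodeDecodeError as err:
    failure = err  # failure.start is the offset of the offending byte

# With the trailing byte removed, only ASCII is left and decoding succeeds.
ascii_only = raw[:-1].decode("utf-8")

# Decoding with the encoding the bytes were written in recovers the name.
recovered = raw.decode("latin-1")
```

If the bytes could be reached without going through Blender's UTF-8 decoding, a fallback decode like the last line would keep the original name. Whether that is possible here is still open.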

When opening the original abcdé file in 2.79, the UI says:

location: <unknown location>:-1
Traceback (most recent call last):
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\startup\bl_ui\space_info.py", line 72, in draw
    row.label(text=scene.statistics(), translate=False)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 80: unexpected end of data

So, it appears that any text outside the text editor that contains non-ASCII characters breaks. Or something. One confusing thing is that the names this fails on don't necessarily (to the naked eye) contain non-ASCII characters, and I'm guessing some of the artists didn't have any kind of international keyboard.

tngreene avatar Sep 20 '19 17:09 tngreene

In the Python console, typing bpy.data.objects[ and then triggering autocomplete gives this monster.

Traceback (most recent call last):
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\modules\console_python.py", line 264, in autocomplete
    private=bpy.app.debug_python)
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\modules\console\intellisense.py", line 129, in expand
    matches, word = complete(line, cursor, namespace, private)
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\modules\console\intellisense.py", line 90, in complete
    matches = complete_namespace.complete(word, namespace, private)
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\modules\console\complete_namespace.py", line 151, in complete
    base=re_incomplete_index.group(1))
  File "C:\Users\Ted\utils\Media\blender-2.79b-windows64\2.79\scripts\modules\console\complete_namespace.py", line 111, in complete_indices
    matches = ['%s[%r]' % (base, key) for key in sorted(obj.keys())]
SystemError: <built-in method keys of bpy_prop_collection object at 0x0000022BF478C870> returned a result with an error set

When the whole expression is typed out by hand, bpy.data.objects["abcdé"] gives a KeyError. When starting with a new 2.79 file, everything works completely fine. Somehow 2.49 data is tainted (who would have guessed!)

Accessing the definition of the name property itself is okay: print(obj.bl_rna.properties["name"].name) prints "Name".

From a script launched from -P or using the Text Editor

One is able to catch UnicodeDecodeError

From GUI's Console

Attempting to access the bad cube with bpy.data.objects[0] or bpy.context.object crashes Blender with Error: EXCEPTION_ACCESS_VIOLATION.

If you open the bad file and give another object the same name, it works fine.

tngreene avatar Sep 20 '19 17:09 tngreene

This bug is so spooky my computer literally Blue Screen'd while working on this. :ghost: I'm tempted to put off the alpha on that bad omen alone.

tngreene avatar Sep 20 '19 18:09 tngreene

There is another idea:

Test each string property of a datablock for UnicodeDecodeError; if it occurs, make a copy of the datablock. Blender automatically copies everything and strips out any bad Unicode characters. Then we delete the original block.

The downside is that the name will change, which could conflict with other names and trigger the .001 suffix, and there would be no way to figure out which names were dropped. This is tragic for bones, but fortunately datarefs are supposed to be ASCII only (I think).

The upside is that I think this will work, we can do it on an as-needed basis, and it does not involve getting the CType for every datablock we need.
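
A rough sketch of the copy-and-delete idea, under the assumption (untested here) that the copy really does come back with a readable name. The helper name replace_if_name_unreadable is hypothetical. In Blender, datablock would be an ID such as an object and collection its owner, such as bpy.data.objects; the only API it relies on is ID.copy() and the collection's remove().

```python
def replace_if_name_unreadable(datablock, collection):
    """Return datablock, or a fresh copy of it if its name cannot be decoded.

    When a copy is made, the original is removed from collection.
    """
    try:
        datablock.name  # the access alone triggers the decode
        return datablock
    except UnicodeDecodeError:
        pass

    # Assumption: copying yields a datablock whose name decodes cleanly.
    clean = datablock.copy()
    # This sketch does not relink anything that still points at the original.
    collection.remove(datablock)
    return clean
```

Because the function returns the original untouched when the name reads fine, it can be called only where needed. Anything that referenced the old datablock (parents, modifiers, scene links) would still have to be pointed at the copy, which this sketch leaves out.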

tngreene avatar Mar 13 '20 15:03 tngreene