kaitai_struct_python_runtime icon indicating copy to clipboard operation
kaitai_struct_python_runtime copied to clipboard

Added __repr__ and a helper function

Open KOLANICH opened this issue 7 years ago • 22 comments

KOLANICH avatar Mar 15 '17 16:03 KOLANICH

Thanks for the contribution!

Unfortunately, this code breaks many things:

  • First and foremost, it fails with an error:
     33 
     34     def __repr__(self):
---> 35         return "".join((self.__class__.__name__, "(", ", ".join(self._reprGeneratorForAllProps(self)), ")"))
     36 
     37     def __exit__(self, *args, **kwargs):

TypeError: _reprGeneratorForAllProps() takes exactly 1 argument (2 given)
  • If it was working, this code dumps only seq items, but it will ignore instances, as they are stored in _m_foo like members, starting with _, which you skip
  • Adding support for instances might be tricky, as you'll need to avoid problems with infinite recursion to be encountered in cases like these:
meta:
  id: rec
seq:
  - id: some
    type: u1
instances:
  self_ref:
    value: _root
  • Naming:
  • Complexity:
    • cramming 4-5-6 statements into a single line is not a good idea
    • Iadding extra function in KaitaiStruct class that will be inherited everywhere is probably not a good idea as well
  • Formatting:

GreyCat avatar Mar 15 '17 20:03 GreyCat

Forgive me for being outright dismissive but... is __repr__ really that much needed? @GreyCat I noticed that you and I run our forums completely differently. In Construct issues, I would have closed this PR after no more than a day of deliberation and half the arguments. You seem to keep issues open forever for everyone to see and participate. I have to admit, and very ahem deliberately, that I appreciate the way you run things. But lets close this one and move on? :thinking:

arekbulski avatar Jan 19 '18 11:01 arekbulski

__repr__ is definitely needed. It is a pain to deal (especially do debug in ipython shell) with anything without __repr__. Feel free to fix that PR any way you like, I don't have time to work on it for now.

KOLANICH avatar Jan 19 '18 20:01 KOLANICH

I just fixed the style/pep8, didnt change how it works. Better?

arekbulski avatar Jan 19 '18 21:01 arekbulski

I have added a guard against recursive objects.

KOLANICH avatar Mar 20 '18 18:03 KOLANICH

Broadened scope to instances. @arekbulski @GreyCat @funkyfuture could you test and review it, it seems it's ready.

KOLANICH avatar Mar 20 '18 19:03 KOLANICH

I dont even understand how this code works, nor why its so lengthy. Also its not providing different str and repr versions. I would suggest considering using Construct as reference:

  • recursion lock only 16 lines of code https://github.com/construct/construct/blob/492b5f93679b6fb21f01558e20420efabc6dac01/construct/lib/containers.py#L29-L44
  • both STR and REPR REPR produces a full one-liner that entirely reproduces the dict. STR produces newline-separated shortened items (truncated to 32 bytes) for viewing. https://github.com/construct/construct/blob/492b5f93679b6fb21f01558e20420efabc6dac01/construct/lib/containers.py#L207-L250

arekbulski avatar Mar 21 '18 14:03 arekbulski

I appreciate the initiative, but honestly I dont like the implementation. If you would be willing to wait, then I will offer alternative implementation based on Construct.

arekbulski avatar Mar 21 '18 14:03 arekbulski

There were some errors, I have fixed them.

KOLANICH avatar Mar 21 '18 17:03 KOLANICH

@GreyCat @arekbulski I was advised with a good replacement for a lock and truncation which are already a part of python stdlib :)

KOLANICH avatar Mar 29 '18 20:03 KOLANICH

reprlib is missing on python 2, but I guess may be backported and put into pip, the name is free https://pypi.python.org/pypi/reprlib

KOLANICH avatar Mar 29 '18 20:03 KOLANICH

I have issues with this implementation:

  • reprlib is Python 3 only but the runtime is supposed to be Python 2/3
  • your code uses different indentation than rest of the script
  • I dont particularly like the usage of dir, we could ask @GreyCat to change the compiler so that generated structs have something similar to __slots__, so you would just iterate that.
  • it evalutates unevaluated (lazy) instances? this could lead to failure.

arekbulski avatar Mar 30 '18 03:03 arekbulski

reprlib is Python 3 only but the runtime is supposed to be Python 2/3

Again: we can try to backport it and put into pip.

your code uses different indentation than rest of the script

fixed. Thank you for bringing it up.

I dont particularly like the usage of dir, we could ask @GreyCat to change the compiler so that generated structs have something similar to slots, so you would just iterate that.

__slots__ cannot be properties

it evalutates unevaluated (lazy) instances? this could lead to failure.

It does, it could. But reprlib provides recursion lock and truncation, so I guess there will be no infinite recursions and rendering too much values of sequences. About reading large files into memory which can cause consuming all the memory in machine - I guess we should map them, not cache them and use references to them instead of copies (I wonder if it is possible in python), this would lead into consuming only a single page. Maybe we need a volatile hint to optimize this.

KOLANICH avatar Mar 30 '18 06:03 KOLANICH

__slots__ cannot be properties

But thats not what I said or meant. We could have the compiler generate KaitaiStruct-s that have something similar to slots, a list of names, to iterate that instead of filtering dir:

class Enum0(KaitaiStruct):
    __slotsalike__ = ["pet_1","pet_2"]
    def _read(self):
        self.pet_1 = self._root.Animal(self._io.read_u4le())
        self.pet_2 = self._root.Animal(self._io.read_u4le())

then

-        for k in dir(self)
-        if k[0] != "_" and not hasattr(KaitaiStruct, k) and not isinstance(getattr(self, k), type)
+    for k in __slotsalike__: 
+    yield getattr(self, k)

This would not only have better performance, if you care about that in this case that is, but it might also be useful to end users as they might have a use for a list of field names.

arekbulski avatar Mar 30 '18 07:03 arekbulski

But reprlib provides recursion lock and truncation,

That does not address the issue I meant. You could have a pointer instance that refers to outside of stream size, which will cause failure if evaluated. Its not about the size of the output.

arekbulski avatar Mar 30 '18 07:03 arekbulski

You could have a pointer instance that refers to outside of stream size, which will cause failure if evaluated. Its not about the size of the output.

Out of scope - its responsibility of a ksy developer to have all the needed ifs.

But thats not what I said or meant. We could have the compiler generate KaitaiStruct-s that have something similar to slots, a list of names, to iterate that instead of filtering dir:

I agree. I guess we need non-volatile fields in _fields, properties in _props and everything possible in __slots__. And tuples should be used, not lists.

And I guess that collection of some of this stuff can be done in runtime without any changes to compiler.

KOLANICH avatar Mar 30 '18 09:03 KOLANICH

@GreyCat Could you share your position with us? How about we have the compiler attach a list of IDs to each generated Struct, similar to slots?

arekbulski avatar Apr 01 '18 08:04 arekbulski

Compiler actually already does something similar when doing --debug, but not for Python. I have no idea what you guys mean by "non-volatile fields in _fields, properties in _props and everything possible in __slots__", i.e. what is "non-volatile fields", "properties", "everything possible".

We generate an array called something like "seq_fields", i.e.:

  • https://github.com/kaitai-io/ci_targets/blob/master/compiled/ruby/debug_0.rb#L11
  • https://github.com/kaitai-io/ci_targets/blob/master/compiled/java/src/io/kaitai/struct/testformats/Debug0.java#L22

Could you explain, what's wrong with current solution - i.e. is it's reflection slow, unreliable, or something else is the matter? As far as I have guessed, __slots__ is some special array in Python?

GreyCat avatar Apr 01 '18 08:04 GreyCat

I have no idea what you guys mean by "non-volatile fields in _fields, properties in _props and everything possible in __slots__", i.e. what is "non-volatile fields", "properties", "everything possible".

Non-volatile fields are the fields (members in seq) without volatile hint. _props is a tuple of names of everything implemented as @propertyes: instances, volatile fields, etc. __slots__ is a special collection listing the names of dumb (non-@property members), if it is present, only the names listed in it are allowed (unless you have some magic method overloading it), __dict__ is unavailable and it is claimed that memory footprint is reduced.

is it's reflection slow, unreliable, or something else is the matter

we process strings to determine if a name to be included, it's ugly, and may be slow

One more considiration: I guess we'll have inheritance somewhen, so the thing should be compatible to it without much overhead.

KOLANICH avatar Apr 01 '18 08:04 KOLANICH

SEQ_FIELDS is exactly what I am talking about. The only issue with it is that it should be included even without debug mode and available in Python.

On Kolanich behalf: non-volatile is just a synonym for cached (like instances are cached).

arekbulski avatar Apr 01 '18 08:04 arekbulski

Hi, what's the state of this PR? I'm interest in this as well as the default string representation is not really useful at the moment <MyBinaryType object at 0xb18605b0> (and I wanted to spare myself writing it myself for a quick prototype :wink:)

z3ntu avatar Nov 04 '20 23:11 z3ntu

The status of this PR is that I personally use it, but in a form of a separate branch on top of master. On large files it may cause problems, since text is generated first, then is output, we don't evaluate size of the text that will be generated before generating it.

KOLANICH avatar Nov 05 '20 07:11 KOLANICH