kaitai_struct_python_runtime
kaitai_struct_python_runtime copied to clipboard
Added __repr__ and a helper function
Thanks for the contribution!
Unfortunately, this code breaks many things:
- First and foremost, it fails with an error:
33
34 def __repr__(self):
---> 35 return "".join((self.__class__.__name__, "(", ", ".join(self._reprGeneratorForAllProps(self)), ")"))
36
37 def __exit__(self, *args, **kwargs):
TypeError: _reprGeneratorForAllProps() takes exactly 1 argument (2 given)
- If it was working, this code dumps only
seq
items, but it will ignore instances, as they are stored in_m_foo
like members, starting with_
, which you skip - Adding support for instances might be tricky, as you'll need to avoid problems with infinite recursion to be encountered in cases like these:
meta:
id: rec
seq:
- id: some
type: u1
instances:
self_ref:
value: _root
- Naming:
- function names must use lower underscore case
- Complexity:
- cramming 4-5-6 statements into a single line is not a good idea
- Iadding extra function in KaitaiStruct class that will be inherited everywhere is probably not a good idea as well
- Formatting:
- PEP8 requires usage of whitespace around some binary operators and recommends to use it around the others
- line length must be up 79 — you have 109 here
- docstring must end with a period
- Codacy already reported trailing whitespace
Forgive me for being outright dismissive but... is __repr__
really that much needed? @GreyCat I noticed that you and I run our forums completely differently. In Construct issues, I would have closed this PR after no more than a day of deliberation and half the arguments. You seem to keep issues open forever for everyone to see and participate. I have to admit, and very ahem deliberately, that I appreciate the way you run things. But lets close this one and move on? :thinking:
__repr__
is definitely needed. It is a pain to deal (especially do debug in ipython shell) with anything without __repr__
. Feel free to fix that PR any way you like, I don't have time to work on it for now.
I just fixed the style/pep8, didnt change how it works. Better?
I have added a guard against recursive objects.
Broadened scope to instances. @arekbulski @GreyCat @funkyfuture could you test and review it, it seems it's ready.
I dont even understand how this code works, nor why its so lengthy. Also its not providing different str and repr versions. I would suggest considering using Construct as reference:
- recursion lock only 16 lines of code https://github.com/construct/construct/blob/492b5f93679b6fb21f01558e20420efabc6dac01/construct/lib/containers.py#L29-L44
- both STR and REPR REPR produces a full one-liner that entirely reproduces the dict. STR produces newline-separated shortened items (truncated to 32 bytes) for viewing. https://github.com/construct/construct/blob/492b5f93679b6fb21f01558e20420efabc6dac01/construct/lib/containers.py#L207-L250
I appreciate the initiative, but honestly I dont like the implementation. If you would be willing to wait, then I will offer alternative implementation based on Construct.
There were some errors, I have fixed them.
@GreyCat @arekbulski I was advised with a good replacement for a lock and truncation which are already a part of python stdlib :)
reprlib is missing on python 2, but I guess may be backported and put into pip, the name is free https://pypi.python.org/pypi/reprlib
I have issues with this implementation:
- reprlib is Python 3 only but the runtime is supposed to be Python 2/3
- your code uses different indentation than rest of the script
- I dont particularly like the usage of dir, we could ask @GreyCat to change the compiler so that generated structs have something similar to
__slots__
, so you would just iterate that. - it evalutates unevaluated (lazy) instances? this could lead to failure.
reprlib is Python 3 only but the runtime is supposed to be Python 2/3
Again: we can try to backport it and put into pip.
your code uses different indentation than rest of the script
fixed. Thank you for bringing it up.
I dont particularly like the usage of dir, we could ask @GreyCat to change the compiler so that generated structs have something similar to slots, so you would just iterate that.
__slots__
cannot be properties
it evalutates unevaluated (lazy) instances? this could lead to failure.
It does, it could. But reprlib provides recursion lock and truncation, so I guess there will be no infinite recursions and rendering too much values of sequences. About reading large files into memory which can cause consuming all the memory in machine - I guess we should map them, not cache them and use references to them instead of copies (I wonder if it is possible in python), this would lead into consuming only a single page. Maybe we need a volatile
hint to optimize this.
__slots__
cannot be properties
But thats not what I said or meant. We could have the compiler generate KaitaiStruct-s that have something similar to slots, a list of names, to iterate that instead of filtering dir:
class Enum0(KaitaiStruct):
__slotsalike__ = ["pet_1","pet_2"]
def _read(self):
self.pet_1 = self._root.Animal(self._io.read_u4le())
self.pet_2 = self._root.Animal(self._io.read_u4le())
then
- for k in dir(self)
- if k[0] != "_" and not hasattr(KaitaiStruct, k) and not isinstance(getattr(self, k), type)
+ for k in __slotsalike__:
+ yield getattr(self, k)
This would not only have better performance, if you care about that in this case that is, but it might also be useful to end users as they might have a use for a list of field names.
But reprlib provides recursion lock and truncation,
That does not address the issue I meant. You could have a pointer instance that refers to outside of stream size, which will cause failure if evaluated. Its not about the size of the output.
You could have a pointer instance that refers to outside of stream size, which will cause failure if evaluated. Its not about the size of the output.
Out of scope - its responsibility of a ksy developer to have all the needed if
s.
But thats not what I said or meant. We could have the compiler generate KaitaiStruct-s that have something similar to slots, a list of names, to iterate that instead of filtering dir:
I agree. I guess we need non-volatile fields in _fields
, properties in _props
and everything possible in __slots__
. And tuples should be used, not lists.
And I guess that collection of some of this stuff can be done in runtime without any changes to compiler.
@GreyCat Could you share your position with us? How about we have the compiler attach a list of IDs to each generated Struct, similar to slots?
Compiler actually already does something similar when doing --debug
, but not for Python. I have no idea what you guys mean by "non-volatile fields in _fields
, properties in _props
and everything possible in __slots__
", i.e. what is "non-volatile fields", "properties", "everything possible".
We generate an array called something like "seq_fields", i.e.:
- https://github.com/kaitai-io/ci_targets/blob/master/compiled/ruby/debug_0.rb#L11
- https://github.com/kaitai-io/ci_targets/blob/master/compiled/java/src/io/kaitai/struct/testformats/Debug0.java#L22
Could you explain, what's wrong with current solution - i.e. is it's reflection slow, unreliable, or something else is the matter? As far as I have guessed, __slots__
is some special array in Python?
I have no idea what you guys mean by "non-volatile fields in _fields, properties in _props and everything possible in
__slots__
", i.e. what is "non-volatile fields", "properties", "everything possible".
Non-volatile fields are the fields (members in seq
) without volatile
hint. _props
is a tuple of names of everything implemented as @property
es: instances, volatile fields, etc. __slots__
is a special collection listing the names of dumb
(non-@property
members), if it is present, only the names listed in it are allowed (unless you have some magic method overloading it), __dict__
is unavailable and it is claimed that memory footprint is reduced.
is it's reflection slow, unreliable, or something else is the matter
we process strings to determine if a name to be included, it's ugly, and may be slow
One more considiration: I guess we'll have inheritance somewhen, so the thing should be compatible to it without much overhead.
SEQ_FIELDS is exactly what I am talking about. The only issue with it is that it should be included even without debug mode and available in Python.
On Kolanich behalf: non-volatile is just a synonym for cached (like instances are cached).
Hi, what's the state of this PR? I'm interest in this as well as the default string representation is not really useful at the moment <MyBinaryType object at 0xb18605b0>
(and I wanted to spare myself writing it myself for a quick prototype :wink:)
The status of this PR is that I personally use it, but in a form of a separate branch on top of master
. On large files it may cause problems, since text is generated first, then is output, we don't evaluate size of the text that will be generated before generating it.