pyecore
pyecore copied to clipboard
Performance issue with big datasets
Hi,
we're facing a performance issue with our application based on PyEcore. After some profiling it seems that there is a big overhead when loading a timeseries into an object.
Here the generated code:
class TimeSeries(Value):
scalingFactor = EAttribute(eType=EInt)
value = EAttribute(eType=EDouble, upper=-1)
unit = EReference(containment=True)
def __init__(self, unit=None, scalingFactor=None, value=None, **kwargs):
super(TimeSeries, self).__init__(**kwargs)
if scalingFactor is not None:
self.scalingFactor = scalingFactor
if value:
self.value.extend(value)
if unit is not None:
self.unit = unit
value
will be a big list of floats (we can't pass a numpy array because of if value
but thats another minor issue).
When we're calling the extend function we're in fact triggering a logic which is adding the elements of the list one by one to an OrderedSet and also doing checks on every element.
One possible solution i was figuring out to handle this is to replace that
self.value.extend(value)
with
self.value.items = value
although I don't know if I'm missing something important on the logic behind.
Thanks
Hi @filippometacell
Indeed, as you saw, what PyEcore does in the case you add multiple elements to a collection, it checks at runtime if each element as the right type in order to ensure that the collection content will respect the structural constraints from the metamodel. Also, behind the scene, the collection tries to set opposite collections and containers.
I will check if there is a dirty trick I can use to improve performances in this case (I have an idea, but it's not very beautiful...)
EDIT> By the way, your value
collection is an OrderedSet
, meaning that the same value cannot be added twice to your collection. Is that really what you want in Geppetto?
Hi @aranega, thanks for your fast response.
That's really a good point, I was just thinking about the overhead but we're possibly losing data we definitely don't want to lose.
@filippometacell
Sorry for the delay, I had some thing to fix before I could work on the performance issue. I made an experimental modification on pyegeppetto in the experimental/no_check
branch. You can try it and tell me if the performances are better. On my bench, the perfs are better, but you never know (I only tested with 10 millions values).
Hi @aranega, thanks for the patch
I'm trying make some tests but I'm getting the following error:
File "/home/user/testnwb/dependencies/pygeppetto/pygeppetto/model/values/values.py", line 139, in no_check_eattribute_extend
self.owner.notify(Notification(new=sublist,
NameError: name 'Notification' is not defined
also made a pip install after the checkout. Any idea?
@filippometacell My bad... I forgot to add some imports when I copied the code... Sorry about that, it should be ok now in the same branch...
Hi @aranega, I ran some tests and it's working perfectly. Performances are much better and we are no more losing data in the timeseries. Will this be ported into pyecore?
@filippometacell Regarding the data losses in the series, it depends on the way the value
meta-attribute is declared in the .ecore
, for this one, I can perform the modification by hand in the .ecore
and in the code (it's only one property to set to False
). For the other, I will try to find a good way of integrated in PyEcore. At the moment, the solution I pushed just removes the container/opposite check as well as the runtime check. If I remove it for all elements, it could lead to models that does not conform to their metamodel anymore, so I should be careful.
On top of some performances improvement I will try to add, one thing I could do is to add either an annotation that you could place on some meta-attribute to manually disable some checks, or a context manager that could temporarily remove the checks.
I will try to add all of this quickly, but I need to do more profiling work on the check
method that consumes a lot of time and that could probably be improved.
Hey @aranega thanks for looking at this! Unique is shown as false in Eclipse for value, is this a problem of how PyEcore treats the default value for a property? cc @filippometacell
@tarelli outch, actually, I checked, the .ecore
is well defining the property as it should... I perhaps introduced this error manually at some point. I will propose a new pull request on master to correct it. At the same time, I will go again on each unique=false
properties to be sure that they are well set in pygeppetto.