pyecore icon indicating copy to clipboard operation
pyecore copied to clipboard

Performance issue with big datasets

Open filippomc opened this issue 5 years ago • 9 comments

Hi,

we're facing a performance issue with our application based on PyEcore. After some profiling it seems that there is a big overhead when loading a timeseries into an object.

Here the generated code:

class TimeSeries(Value):
    scalingFactor = EAttribute(eType=EInt)
    value = EAttribute(eType=EDouble, upper=-1)
    unit = EReference(containment=True)

    def __init__(self, unit=None, scalingFactor=None, value=None, **kwargs):
        super(TimeSeries, self).__init__(**kwargs)
        if scalingFactor is not None:
            self.scalingFactor = scalingFactor
        if value:
            self.value.extend(value)
        if unit is not None:
            self.unit = unit

value will be a big list of floats (we can't pass a numpy array because of if value but thats another minor issue). When we're calling the extend function we're in fact triggering a logic which is adding the elements of the list one by one to an OrderedSet and also doing checks on every element.

One possible solution i was figuring out to handle this is to replace that self.value.extend(value) with self.value.items = value although I don't know if I'm missing something important on the logic behind.

Thanks

filippomc avatar Mar 13 '19 14:03 filippomc

Hi @filippometacell

Indeed, as you saw, what PyEcore does in the case you add multiple elements to a collection, it checks at runtime if each element as the right type in order to ensure that the collection content will respect the structural constraints from the metamodel. Also, behind the scene, the collection tries to set opposite collections and containers.

I will check if there is a dirty trick I can use to improve performances in this case (I have an idea, but it's not very beautiful...)

EDIT> By the way, your value collection is an OrderedSet, meaning that the same value cannot be added twice to your collection. Is that really what you want in Geppetto?

aranega avatar Mar 13 '19 15:03 aranega

Hi @aranega, thanks for your fast response.

That's really a good point, I was just thinking about the overhead but we're possibly losing data we definitely don't want to lose.

filippomc avatar Mar 13 '19 15:03 filippomc

@filippometacell

Sorry for the delay, I had some thing to fix before I could work on the performance issue. I made an experimental modification on pyegeppetto in the experimental/no_check branch. You can try it and tell me if the performances are better. On my bench, the perfs are better, but you never know (I only tested with 10 millions values).

aranega avatar Mar 16 '19 21:03 aranega

Hi @aranega, thanks for the patch

I'm trying make some tests but I'm getting the following error:

  File "/home/user/testnwb/dependencies/pygeppetto/pygeppetto/model/values/values.py", line 139, in no_check_eattribute_extend
    self.owner.notify(Notification(new=sublist,
NameError: name 'Notification' is not defined

also made a pip install after the checkout. Any idea?

filippomc avatar Mar 18 '19 15:03 filippomc

@filippometacell My bad... I forgot to add some imports when I copied the code... Sorry about that, it should be ok now in the same branch...

aranega avatar Mar 18 '19 16:03 aranega

Hi @aranega, I ran some tests and it's working perfectly. Performances are much better and we are no more losing data in the timeseries. Will this be ported into pyecore?

filippomc avatar Mar 18 '19 19:03 filippomc

@filippometacell Regarding the data losses in the series, it depends on the way the value meta-attribute is declared in the .ecore, for this one, I can perform the modification by hand in the .ecore and in the code (it's only one property to set to False). For the other, I will try to find a good way of integrated in PyEcore. At the moment, the solution I pushed just removes the container/opposite check as well as the runtime check. If I remove it for all elements, it could lead to models that does not conform to their metamodel anymore, so I should be careful.

On top of some performances improvement I will try to add, one thing I could do is to add either an annotation that you could place on some meta-attribute to manually disable some checks, or a context manager that could temporarily remove the checks.

I will try to add all of this quickly, but I need to do more profiling work on the check method that consumes a lot of time and that could probably be improved.

aranega avatar Mar 19 '19 13:03 aranega

Hey @aranega thanks for looking at this! Unique is shown as false in Eclipse for value, is this a problem of how PyEcore treats the default value for a property? cc @filippometacell Screenshot 2019-03-19 at 12 47 08

tarelli avatar Mar 19 '19 16:03 tarelli

@tarelli outch, actually, I checked, the .ecore is well defining the property as it should... I perhaps introduced this error manually at some point. I will propose a new pull request on master to correct it. At the same time, I will go again on each unique=false properties to be sure that they are well set in pygeppetto.

aranega avatar Mar 20 '19 09:03 aranega