
JSON deserializers performance improvement

Open sylvlecl opened this issue 2 years ago • 2 comments

  • Do you want to request a feature or report a bug?

Performance improvement

  • What is the current behavior?

Following the merge of https://github.com/powsybl/powsybl-core/pull/2445, I ran some measurements with JMH to assess its impact on serialization and parsing performance.

The test case is a contingency list with around 10000 branch contingencies, defined as a DefaultContingencyList.

Before:

Benchmark                                                   (contingencyListPath)  Mode  Cnt   Score   Error  Units
ParsingBenchmark.parsing           /home/leclercsyl/tmp/branch-contingencies.json    ss  500  11,748 ± 0,469  ms/op
ParsingBenchmark.writing           /home/leclercsyl/tmp/branch-contingencies.json    ss  500  44,966 ± 0,528  ms/op

After:

Benchmark                                                   (contingencyListPath)  Mode  Cnt   Score   Error  Units
ParsingBenchmark.parsing           /home/leclercsyl/tmp/branch-contingencies.json    ss  500  20,757 ± 1,777  ms/op
ParsingBenchmark.writing           /home/leclercsyl/tmp/branch-contingencies.json    ss  500  13,827 ± 0,484  ms/op

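We can see that serialization performance has greatly improved (writing went from about 45 ms to about 14 ms per operation, roughly 3.3 times faster), while parsing has unexpectedly become slower (from about 11.7 ms to about 20.8 ms per operation, roughly 77% slower).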

After some digging, it seems that jsonParser.readValueAs and ctxt.readValue have slightly different initialization paths, which may explain the difference. The main finding, however, is that the costly part in this use case is resolving the deserializers again every time a new Contingency object is parsed.

Jackson actually offers mechanisms to perform this resolution only once, through the ResolvableDeserializer and ContextualDeserializer interfaces. See for example CollectionDeserializer, which holds a reference to the underlying deserializer for the values inside the collection.

I tried implementing ResolvableDeserializer in ContingencyDeserializer so that the deserializer for contingency elements is resolved only once:

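    // Look up the deserializer for the elements list up front and keep it in the
    // elementsDeser field, so that deserialize() does not resolve it for every Contingency.
    @Override
    public void resolve(DeserializationContext ctxt) throws JsonMappingException {
        JavaType elementsType = ctxt.getConfig().constructType(new TypeReference<ArrayList<ContingencyElement>>() {
        });
        elementsDeser = super.findDeserializer(ctxt, elementsType, null);
    }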

Indeed, parsing time improves significantly:

Benchmark                                          (contingencyListPath)  Mode  Cnt  Score   Error  Units
ParsingBenchmark.parsing  /home/leclercsyl/tmp/branch-contingencies.json    ss  500  9,276 ± 0,343  ms/op

Similar performance is achieved by implementing ContextualDeserializer:

    // Same lookup as above, but the resolved deserializer is handed to a new
    // ContingencyDeserializer instance instead of being stored in the existing one.
    @Override
    public JsonDeserializer<?> createContextual(DeserializationContext ctxt, BeanProperty property) throws JsonMappingException {
        JavaType elementsType = ctxt.getConfig().constructType(new TypeReference<ArrayList<ContingencyElement>>() {
        });
        return new ContingencyDeserializer(super.findDeserializer(ctxt, elementsType, null));
    }

This gives:

Benchmark                                          (contingencyListPath)  Mode  Cnt  Score   Error  Units
ParsingBenchmark.parsing  /home/leclercsyl/tmp/branch-contingencies.json    ss  500  9,169 ± 0,361  ms/op

Note: implementing ContextualDeserializer may be the better option, since it creates a new instance instead of modifying the existing one.
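To make the ContextualDeserializer scheme more concrete, here is a minimal, self-contained sketch of how the resolved list deserializer could then be used inside deserialize(). It is only an illustration: Element, Outage, OutageDeserializer and ContextualSketch are hypothetical stand-ins for ContingencyElement, Contingency and ContingencyDeserializer, and the JSON layout (an "id" field plus an "elements" array) is an assumption, not the actual PowSyBl format.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.BeanProperty;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.deser.ContextualDeserializer;
import com.fasterxml.jackson.databind.deser.std.StdDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ContingencyElement: a plain bean read by Jackson's default deserializer.
class Element {
    public String id;
}

// Hypothetical stand-in for Contingency.
class Outage {
    final String id;
    final List<Element> elements;

    Outage(String id, List<Element> elements) {
        this.id = id;
        this.elements = elements;
    }
}

class OutageDeserializer extends StdDeserializer<Outage> implements ContextualDeserializer {

    // Deserializer for the "elements" array; null only on the prototype registered in the module.
    private final JsonDeserializer<Object> elementsDeser;

    OutageDeserializer() {
        this(null);
    }

    OutageDeserializer(JsonDeserializer<Object> elementsDeser) {
        super(Outage.class);
        this.elementsDeser = elementsDeser;
    }

    @Override
    public JsonDeserializer<?> createContextual(DeserializationContext ctxt, BeanProperty property) throws JsonMappingException {
        // Resolve the list deserializer here and hand it to a fresh instance.
        JavaType elementsType = ctxt.getConfig().constructType(new TypeReference<ArrayList<Element>>() { });
        return new OutageDeserializer(findDeserializer(ctxt, elementsType, null));
    }

    @Override
    @SuppressWarnings("unchecked")
    public Outage deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        String id = null;
        List<Element> elements = new ArrayList<>();
        // The parser is positioned on START_OBJECT when this method is called.
        while (p.nextToken() != JsonToken.END_OBJECT) {
            String name = p.currentName();
            p.nextToken(); // move to the field value
            if ("id".equals(name)) {
                id = p.getText();
            } else if ("elements".equals(name)) {
                // Reuse the deserializer resolved in createContextual: no lookup per object.
                elements = (List<Element>) elementsDeser.deserialize(p, ctxt);
            } else {
                p.skipChildren(); // ignore unknown fields
            }
        }
        return new Outage(id, elements);
    }
}

final class ContextualSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper()
            .registerModule(new SimpleModule().addDeserializer(Outage.class, new OutageDeserializer()));

    static List<Outage> parse(String json) {
        try {
            return MAPPER.readValue(json, new TypeReference<List<Outage>>() { });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The instance registered in the module only acts as a prototype here; Jackson replaces it with the instance returned by createContextual, which is why the elementsDeser field can stay final.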

Conclusion: our deserializers could be made much faster by following one of the two schemes above. However, applying this to all fields of all classes will require some work!

  • What is the motivation / use case for changing the behavior?

Performance

  • Please tell us about your environment:
    • PowSyBl Version: 5.2.0-SNAPSHOT
    • OS Version: Ubuntu 20.04

sylvlecl, Feb 01 '23 08:02

Another, simpler approach to be tested: the main bottleneck in the current implementation seems to be that CollectionDeserializer is not cacheable. We could replace its use with a raw parser-based implementation of list deserialization, in JsonUtil.readList, which would have the advantage of benefiting all our custom deserializers.
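As a rough idea of what such a parser-based list reader could look like, here is a minimal sketch built on Jackson's streaming API. The names ReadListSketch, ElementReader and parseIds are hypothetical, and the signature is only an assumption for illustration; it is not the actual JsonUtil.readList.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ReadListSketch {

    // Hypothetical callback that reads one element starting at the parser's current token.
    @FunctionalInterface
    interface ElementReader<T> {
        T read(JsonParser parser) throws IOException;
    }

    // Reads a JSON array token by token, without going through CollectionDeserializer.
    // The parser must be positioned on START_ARRAY; it is left on END_ARRAY.
    static <T> List<T> readList(JsonParser parser, ElementReader<T> reader) throws IOException {
        if (parser.currentToken() != JsonToken.START_ARRAY) {
            throw new IOException("Expected START_ARRAY but got " + parser.currentToken());
        }
        List<T> result = new ArrayList<>();
        while (parser.nextToken() != JsonToken.END_ARRAY) {
            result.add(reader.read(parser));
        }
        return result;
    }

    // Usage example: read an array of strings.
    static List<String> parseIds(String json) {
        try (JsonParser parser = new JsonFactory().createParser(json)) {
            parser.nextToken(); // move to START_ARRAY
            return readList(parser, JsonParser::getText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a custom deserializer, the element reader would be whatever reads a single element from the parser, so the per-list deserializer lookup disappears from the hot path.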

sylvlecl, Feb 02 '23 07:02

See current benchmark code on branch parsing-benchmark, for further testing.

sylvlecl, Feb 02 '23 07:02