
feat!: orjson instead of simplejson to load and save JSON objects

Open Martin1887 opened this issue 1 year ago • 5 comments

Hello.

This pull request replaces simplejson with orjson.

JSON files previously generated by simplejson are compatible with orjson and, in fact, the output looks identical to me, so there are no breaking changes in this respect.

However, the orjson API is not compatible with Python's built-in json module, so orjson becomes a required dependency of Lab after this change, whereas simplejson was optional.
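To make the incompatibility concrete, here is a minimal sketch. It is not Lab's actual code, and `dump_properties`/`load_properties` are hypothetical helper names. `json.dumps()` takes `indent=`/`sort_keys=` keywords and returns `str`, while `orjson.dumps()` takes bit flags through `option=` and returns `bytes`, so the file has to be written in binary mode. The stdlib fallback is there only so the sketch runs without orjson installed.

```python
import json
from pathlib import Path

try:
    import orjson  # third-party package
except ImportError:  # fallback only so this sketch runs with the stdlib alone
    orjson = None


def dump_properties(data: dict, path: Path) -> None:
    """Write data as key-sorted, 2-space-indented JSON (hypothetical helper)."""
    if orjson is not None:
        # orjson: options are bit flags, and the result is bytes.
        path.write_bytes(orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS))
    else:
        # stdlib json: options are keyword arguments, and the result is str.
        path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")


def load_properties(path: Path) -> dict:
    """Read a properties file written by either library (hypothetical helper)."""
    raw = path.read_bytes()
    # Both orjson.loads() and json.loads() accept UTF-8 encoded bytes.
    return orjson.loads(raw) if orjson is not None else json.loads(raw)
```

Both branches produce the same layout for plain dicts, lists, strings and numbers. That matches the observation above that existing files stay readable.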

In terms of performance, orjson is around 4x faster (when no swap is used) but uses more RAM (roughly twice as much, which is not a problem in most use cases). However, more benchmarks on your side would be needed before merging this pull request.
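For such benchmarks, here is a minimal timing sketch. It assumes the properties file has already been read into memory as bytes. `time_loads` is a hypothetical name. Libraries that are not installed are skipped.

```python
import importlib
import timeit


def time_loads(raw: bytes, repeat: int = 5) -> dict:
    """Best wall-clock time (seconds) to parse `raw` with each installed JSON library."""
    text = raw.decode("utf-8")
    results = {}
    for module_name in ("json", "simplejson", "orjson"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # library not installed; skip it
        # number=1 parses once per trial; min() over trials filters out background noise.
        trials = timeit.repeat(lambda: module.loads(text), number=1, repeat=repeat)
        results[module_name] = min(trials)
    return results
```

The document is already in memory before timing starts, so this measures parsing only. Disk I/O and the swap effects mentioned above are outside its scope.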

Thanks.

Martin1887, Jul 25 '24 17:07

Great, thanks! I'll test when I find the time. Maybe you can try fixing the tests in the meantime.

jendrikseipp, Jul 25 '24 19:07

Ah, I see now that you already measured a 4x speedup.

jendrikseipp, Dec 19 '24 21:12

Thanks! Code looks good now. I tested it locally: for a 500 MiB properties file, simplejson takes 2s to read it, while orjson takes 1.9s. Do you have the logs for a properties file where the switch to orjson pays off more? How much more memory is used in that case?
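One way to answer the memory question is a sketch that loads the file in a fresh interpreter for each library and reports that process's peak resident set size. A separate process is needed because `ru_maxrss` only ever grows, so measuring two libraries in one process would hide the smaller peak. `peak_rss_after_load` is a hypothetical name. The `resource` module is Unix-only, and `ru_maxrss` is reported in KiB on Linux but in bytes on macOS.

```python
import subprocess
import sys

# Child script: parse the file with the requested module, then print the peak RSS.
_CHILD = """
import importlib, resource, sys
module = importlib.import_module(sys.argv[1])
with open(sys.argv[2], encoding="utf-8") as f:
    data = module.loads(f.read())  # keep the parsed object alive until measured
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss_after_load(module_name: str, path: str) -> int:
    """Peak RSS of a fresh interpreter that loads `path` with `module_name`."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, module_name, path],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

The reported value includes the interpreter's own baseline, so comparing libraries on the same file is more meaningful than any single absolute number.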

jendrikseipp, Dec 21 '24 21:12

I don't have any logs, but the properties file inside additive/reports.tar.gz in the following archive should reproduce this behaviour:

https://zenodo.org/records/13378665/files/experiments_scripts_and_results.zip?download=1
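Here is a sketch for pulling the properties files out of the reports tarball without unpacking everything to disk. It assumes the relevant members are regular files named `properties`. The exact layout of that tarball is not stated above. `read_properties_members` is a hypothetical name.

```python
import tarfile
from pathlib import PurePosixPath


def read_properties_members(archive: str) -> dict:
    """Return {member name: raw bytes} for every regular file named 'properties' in a .tar.gz."""
    found = {}
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and PurePosixPath(member.name).name == "properties":
                # extractfile() streams the member without writing it to disk.
                with tar.extractfile(member) as fh:
                    found[member.name] = fh.read()
    return found
```

The returned bytes can then be passed to whichever parser is being timed.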

I observed a 4x speedup with 2x RAM consumption.

These properties files contain many lists of numbers; maybe orjson is much better with this type of data.

Also, the speedup is higher when sorting and formatting are disabled, but then the properties files are not human-readable. Maybe a parameter to disable sorting and formatting in fetchers would be a good idea for large experiments, where the properties files will not be inspected manually anyway.
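Here is a minimal sketch of what such a switch might look like. `pretty` is a hypothetical parameter, not an existing Lab or fetcher option. In orjson, sorting and indentation are opt-in flags, so the compact path simply omits them. The stdlib fallback only keeps the sketch runnable without orjson.

```python
import json

try:
    import orjson  # third-party package
except ImportError:  # fallback only so this sketch runs with the stdlib alone
    orjson = None


def serialize_properties(data: dict, pretty: bool = True) -> bytes:
    """Serialize properties; `pretty` is a hypothetical switch for sorting and indentation."""
    if orjson is not None:
        # No flags (0) means no key sorting and no indentation: the fast path.
        option = (orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS) if pretty else 0
        return orjson.dumps(data, option=option)
    if pretty:
        return json.dumps(data, indent=2, sort_keys=True).encode("utf-8")
    # Compact stdlib output: no whitespace after separators, insertion order kept.
    return json.dumps(data, separators=(",", ":")).encode("utf-8")
```

Both modes parse back to the same data. Only readability and file size differ, which fits the case where nobody reads the file by hand.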

Martin1887, Dec 22 '24 10:12

Thanks! I'll look into this after the break.

jendrikseipp, Dec 22 '24 20:12