mt_metadata

Make validation more efficient

Open kujaku11 opened this issue 2 years ago • 3 comments

Validating metadata objects from a dictionary or other input carries a fair amount of overhead. We should look at ways to optimize this process, perhaps through multithreading or some other form of parallelization.

We might also consider using established packages such as jsonschema or pydantic for the validation.

kujaku11 avatar Mar 05 '23 19:03 kujaku11

Looks like pydantic is the logical way to go since it is a mature package and widely used.

Instead of loading the attribute JSON each time, the code should ship with all the classes already defined with their attributes. The JSON should only be used to build class objects during development. This might help with issue/PR #251. So we need to build some functionality for reading the JSON files to create a class object, or to add to an existing one.

All validation can be done by pydantic so that should be quicker.
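
As a rough illustration of what pydantic-based validation could look like, here is a minimal sketch. The `Location` class, its field names, and the `validate_location` helper are hypothetical and are not taken from mt_metadata; only `BaseModel`, `Field`, and `ValidationError` are pydantic's own names.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Location(BaseModel):
    """Hypothetical metadata object; the field names are illustrative only."""

    # Range constraints are declared once on the class and checked by pydantic.
    latitude: float = Field(default=0.0, ge=-90, le=90)
    longitude: float = Field(default=0.0, ge=-180, le=180)
    elevation: Optional[float] = None


def validate_location(data: dict) -> Location:
    """Build a validated Location from a plain dictionary."""
    return Location(**data)


# A numeric string is coerced to float in pydantic's default (lax) mode.
loc = validate_location({"latitude": "40.5", "longitude": -112.0})

# An out-of-range value raises ValidationError instead of being stored.
try:
    validate_location({"latitude": 120.0})
    error_count = 0
except ValidationError as err:
    error_count = len(err.errors())
```

The point of the sketch is that types, defaults, and bounds live on the class definition, so no attribute JSON has to be read at validation time.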

This will be a major rewrite and will take some time, but probably better in the long run.

@gabelepoudre do you have any suggestions?

kujaku11 avatar Mar 21 '25 18:03 kujaku11

@kujaku11 I don't have any experience with pydantic unfortunately, but from what I have heard from yourself and @kkappler it sounds like a completely reasonable fit.

I imagine some of the pain will be related to my last point in this comment, in that you may not be able to faithfully massage the data in the same way as before, which may mean the underlying JSON needs to change.

I don't know if pydantic handles lazy loading of class objects, but that would be nice if it is done at runtime. If it is a static process (i.e. it outputs .py contents), even better.
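
For reference, the runtime "lazy" option could be sketched with the standard library alone: build each class the first time it is requested and cache it. Everything here is an assumption for illustration: the `ATTRIBUTE_JSON` layout, `TYPE_MAP`, and `get_metadata_class` are hypothetical, and plain dataclasses are used instead of pydantic, so nothing is validated. (pydantic does provide `create_model` for building models dynamically, which could play the role of `make_dataclass` here.)

```python
import json
from dataclasses import field, make_dataclass
from functools import lru_cache

# Hypothetical attribute definitions; in practice these would be JSON files.
ATTRIBUTE_JSON = {
    "Location": (
        '{"latitude": {"type": "float", "default": 0.0},'
        ' "longitude": {"type": "float", "default": 0.0}}'
    ),
}

# Assumed mapping from JSON type names to Python types.
TYPE_MAP = {"float": float, "integer": int, "string": str, "boolean": bool}


@lru_cache(maxsize=None)
def get_metadata_class(name: str) -> type:
    """Parse the JSON and build the class on first use, then reuse it."""
    spec = json.loads(ATTRIBUTE_JSON[name])
    dc_fields = [
        (attr, TYPE_MAP[info["type"]], field(default=info["default"]))
        for attr, info in spec.items()
    ]
    return make_dataclass(name, dc_fields)


Location = get_metadata_class("Location")
# The second call hits the cache, so the JSON is parsed only once.
same_class = get_metadata_class("Location") is Location
```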

Looking forward to it, even if it's a slow process! #251 should hopefully ease the pain until then :)

gabelepoudre avatar Mar 21 '25 20:03 gabelepoudre

@gabelepoudre Thanks. I was thinking of removing the read-JSON part from the objects and just setting up each class as a static object. The read from JSON could be moved to a development-only step. For instance, if you come up with a new object, you'd create the JSON and run some function that translates it into a class object with all the correct attributes and types. You do that once in development; then on deployment all the user gets are "static" objects that are validated using Pydantic. Though this removes some of the flexibility in development, it should be faster in the end, and the validators should be faster as well. Pydantic also supports many more types than the basics currently supported. I'll keep you in the loop. This will probably take a while.
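
A minimal sketch of that development-only step might look like the following. It assumes a simplified JSON layout with only `type` and `default` keys; `render_model_source`, `TYPE_MAP`, and the `Station` example are hypothetical names, not existing mt_metadata functions. The generator uses only the standard library and emits the text of a pydantic model, which would then be written to a `.py` file and shipped.

```python
import json

# Assumed mapping from JSON type names to Python annotation strings.
TYPE_MAP = {"string": "str", "float": "float", "integer": "int", "boolean": "bool"}


def render_model_source(class_name: str, attribute_json: str) -> str:
    """Translate an attribute JSON string into pydantic model source code."""
    spec = json.loads(attribute_json)
    lines = [
        "from pydantic import BaseModel",
        "",
        "",
        f"class {class_name}(BaseModel):",
    ]
    if not spec:
        # Keep the generated class syntactically valid when there are no attributes.
        lines.append("    pass")
    for name, info in spec.items():
        py_type = TYPE_MAP[info["type"]]
        lines.append(f"    {name}: {py_type} = {info['default']!r}")
    return "\n".join(lines) + "\n"


EXAMPLE_JSON = (
    '{"id": {"type": "string", "default": ""},'
    ' "latitude": {"type": "float", "default": 0.0}}'
)
source = render_model_source("Station", EXAMPLE_JSON)

# Syntax check only; the generated module is not executed here.
compile(source, "<generated>", "exec")
```

Because the output is ordinary source code, it can be reviewed and version-controlled like any hand-written class, and the JSON never has to be read at runtime.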

kujaku11 avatar Mar 21 '25 21:03 kujaku11