Parallel/sharded offline batch inference
Hi, I'm new to all things MLOps, so I apologize if this question is too general or vague...
Basically, I would like to apply one or more models to a huge amount of static data on S3. It takes too long for a single machine to process the data, so I would like to apply the model in (data) parallel, i.e. have many duplicates of the same (containerized) model, where each is applied to a different shard of the data.
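To make the idea concrete, here is roughly the pattern I have in mind. It's only a sketch: `load_model` and `predict` are placeholders for whatever the containerized model actually exposes, and the bucket/prefix are made up.

```python
# Sketch of "sharded" inference: each replica gets a shard index and only
# processes the S3 keys assigned to it. load_model/predict are placeholders.
import boto3

def list_keys(bucket: str, prefix: str):
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

def run_shard(shard_index: int, num_shards: int, bucket: str, prefix: str):
    s3 = boto3.client("s3")
    model = load_model()  # placeholder: however the model is loaded inside the container
    for i, key in enumerate(list_keys(bucket, prefix)):
        if i % num_shards != shard_index:
            continue  # this key belongs to another replica
        payload = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        predict(model, payload)  # placeholder scoring call
```

Each replica would run `run_shard(i, N, ...)` with its own index, so the shared listing order acts as the sharding scheme.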
Is this possible with mlem?
Hi @nrlugg! No problem - the question is really good and we'd like to support this in MLEM, but right now it's not implemented. We recently discussed something similar with @mike0sv for batch scoring in #11. I don't think we'll support it earlier than 2023 though, but let's see if @mike0sv has something to add.
We probably won't implement something like this in MLEM directly, since parallel computation is a big problem in its own right. However, we'll try to add integrations so you can get this functionality via other services, e.g. via Spark. I also created this issue: https://github.com/iterative/mlem/issues/502. Once it's done, you can try to deploy your models to SageMaker and then run batch inference from there.
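To give an idea of what the Spark route could look like, here is a minimal sketch (not a MLEM integration; the paths, column names and the `model.predict` call are assumptions about your data and model, and I'm assuming `mlem.api.load` for loading):

```python
# Sketch of data-parallel scoring with PySpark. Spark handles the sharding:
# each partition is scored independently by score_partition.
from pyspark.sql import SparkSession
import mlem

def score_partition(rows):
    # load one copy of the model per partition (i.e. per data shard)
    model = mlem.api.load("s3://my-bucket/models/my-model")  # assumed path
    for row in rows:
        # assuming the model exposes a sklearn-style predict()
        yield (row["id"], float(model.predict([row["features"]])[0]))

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()
df = spark.read.parquet("s3a://my-bucket/data/")  # the static data; Spark splits it into partitions
preds = df.rdd.mapPartitions(score_partition).toDF(["id", "prediction"])
preds.write.parquet("s3a://my-bucket/predictions/")
```

Spark does the sharding and scheduling for you here, which is exactly why we'd rather integrate with it than reimplement that part in MLEM.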
Thanks @aguschin @mike0sv! I'll just keep watching mlem development (including @mike0sv's issue) as I think the design is beautiful and would love to be able to use it :)
I realize this is out of scope for this issue, but do either of you know of any other tools for this kind of offline batch inference?