Support Big Data Technologies

Open mommi84 opened this issue 8 years ago • 6 comments

Can the current workflow deal with big datasets (i.e., when it's impossible to store them in-memory)?

mommi84 avatar Nov 05 '15 22:11 mommi84

Yes, see the memory management package. The Mapping class needs to be updated, though: we need a file-backed mapping that supports writing mappings to the hard drive.

ngonga avatar Nov 06 '15 00:11 ngonga
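
For illustration, a file-backed mapping of the kind described in the comment above could look roughly like the following. This is only a minimal sketch and not LIMES code: the class name `FileMapping`, its methods, and the TSV layout (source URI, target URI, similarity) are all hypothetical, and it is written in Python for brevity rather than in the language of the project.

```python
import csv
from pathlib import Path
from typing import Iterator, Tuple


class FileMapping:
    """Hypothetical disk-backed mapping (not the LIMES Mapping class).

    Links are appended to a TSV file instead of being collected in an
    in-memory map, so memory use stays constant as the mapping grows.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.path.write_text("", encoding="utf-8")  # start with an empty file
        self.size = 0

    def add(self, source: str, target: str, similarity: float) -> None:
        # Append one link; only the counter is kept in memory.
        with self.path.open("a", newline="", encoding="utf-8") as handle:
            csv.writer(handle, delimiter="\t").writerow([source, target, similarity])
        self.size += 1

    def __iter__(self) -> Iterator[Tuple[str, str, float]]:
        # Stream the links back one row at a time.
        with self.path.open(newline="", encoding="utf-8") as handle:
            for source, target, similarity in csv.reader(handle, delimiter="\t"):
                yield source, target, float(similarity)
```

Reopening the file on every `add` keeps the sketch short; a real implementation would more likely hold a buffered writer open and would also need lookups by source URI, which a flat file does not provide efficiently.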

Okay. I would keep this issue open until the new Mapping class is updated.

mommi84 avatar Nov 06 '15 11:11 mommi84

How is this thing going?

Kleanthi avatar Mar 02 '18 15:03 Kleanthi

Kevin and I are currently working on porting HR3 to either Flink or Spark. Though this task is certainly smaller than the scope of the original question, it might be reasonable to aim for such frameworks rather than a new Mapping class, i.e. to have a LIMES-Flink or LIMES-Spark implementation that can be run in a cluster.

dobraczka avatar Mar 02 '18 15:03 dobraczka

I like the LIMES-Spark idea.

Kleanthi avatar Mar 02 '18 15:03 Kleanthi

I did some research on this lately, and it seems like Apache Beam is what we'd want for a complete LIMES port to big data technology. It will be considered as part of the upcoming rewrite.

kvndrsslr avatar Jul 02 '20 10:07 kvndrsslr