LIMES
Support Big Data Technologies
Can the current workflow deal with big datasets (i.e., datasets that cannot be held in memory)?
Yes. See the memory management package. The Mapping class needs to be updated, though: we need a file-backed mapping that supports writing mappings to the hard drive.
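To make the file-mapping idea concrete, here is a minimal sketch in Python (chosen for brevity; LIMES itself is Java). Everything in it is an assumption for illustration: the class name `FileBackedMapping`, the TSV layout, and the `buffer_size` parameter are hypothetical and do not reflect the actual LIMES Mapping API. It buffers a bounded number of links in memory, appends them to a file once the buffer fills up, and streams them back on iteration, so the full mapping never has to fit in RAM.

```python
import csv
import os
import tempfile
from typing import Iterator, List, Tuple

# A link is (source URI, target URI, similarity score).
Link = Tuple[str, str, float]


class FileBackedMapping:
    """Hypothetical append-only mapping that spills links to a TSV file."""

    def __init__(self, path: str, buffer_size: int = 1000) -> None:
        self.path = path
        self.buffer_size = buffer_size
        self._buffer: List[Link] = []
        self._count = 0
        # Start from an empty file so repeated runs do not mix results.
        open(self.path, "w", encoding="utf-8").close()

    def add(self, source: str, target: str, similarity: float) -> None:
        """Record one link; write the buffer to disk once it is full."""
        self._buffer.append((source, target, similarity))
        self._count += 1
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        """Append all buffered links to the file and clear the buffer."""
        if not self._buffer:
            return
        with open(self.path, "a", encoding="utf-8", newline="") as handle:
            csv.writer(handle, delimiter="\t").writerows(self._buffer)
        self._buffer.clear()

    def __len__(self) -> int:
        return self._count

    def __iter__(self) -> Iterator[Link]:
        """Stream links back from disk one row at a time."""
        self.flush()
        with open(self.path, "r", encoding="utf-8", newline="") as handle:
            for source, target, similarity in csv.reader(handle, delimiter="\t"):
                yield source, target, float(similarity)


def demo() -> List[Link]:
    """Write three links with a tiny buffer and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        mapping = FileBackedMapping(os.path.join(tmp, "links.tsv"), buffer_size=2)
        mapping.add("http://example.org/a1", "http://example.org/b1", 1.0)
        mapping.add("http://example.org/a2", "http://example.org/b2", 0.85)
        mapping.add("http://example.org/a3", "http://example.org/b3", 0.5)
        return list(mapping)
```

Note that this sketch does not deduplicate (source, target) pairs or support random lookups; a real implementation would need an on-disk index or a sorted merge step for that.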
Okay. I would keep this issue open until the new Mapping class is updated.
How is this thing going?
Kevin and I are currently working on porting HR3 to either Flink or Spark. Though this task is certainly smaller than the scope of the original question, it might be reasonable to aim for such frameworks rather than a new Mapping class, i.e. to have a LIMES-Flink or LIMES-Spark implementation that can be run in a cluster.
I like the LIMES-Spark idea.
I did some research on this lately, and it seems like Apache Beam is what we'd want for a complete LIMES port to big data technology. This will be considered as part of the upcoming rewrite.