feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
The organization of test files in FlinkProcessor's tests needs to be improved after Flink's resource leak problem is resolved. Blocked by Flink ticket: https://issues.apache.org/jira/browse/FLINK-30258
Add instructions to install "./python[all]" after the dependency conflict between PyFlink and PySpark is resolved.
feathub_it_test_base.py:

```python
# TODO: only invoke the corresponding base class's setUpClass()
# method to reduce resource consumption.
@classmethod
def invoke_all_base_class_setupclass(cls):
    for base_class in cls.__bases__:
        if issubclass(base_class, unittest.TestCase):
            base_class.setUpClass()
```
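One possible direction for this TODO, shown only as a minimal sketch: pass in the base classes whose `setUpClass()` is actually needed and skip the rest. The class names `HypotheticalFlinkBase`, `HypotheticalSparkBase`, and `HypotheticalCombinedITTest`, the helper `invoke_selected_base_class_setupclass`, and the `SETUP_CALLS` list are all assumptions made for illustration and are not part of FeatHub.

```python
import unittest
from typing import List, Tuple, Type

# Records which base classes had setUpClass() invoked (illustration only).
SETUP_CALLS: List[str] = []


class HypotheticalFlinkBase(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        SETUP_CALLS.append("flink")


class HypotheticalSparkBase(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        SETUP_CALLS.append("spark")


def invoke_selected_base_class_setupclass(
    cls: Type[unittest.TestCase],
    selected: Tuple[Type[unittest.TestCase], ...],
) -> None:
    """Call setUpClass() only on the direct bases listed in `selected`."""
    for base_class in cls.__bases__:
        if issubclass(base_class, unittest.TestCase) and base_class in selected:
            base_class.setUpClass()


class HypotheticalCombinedITTest(HypotheticalFlinkBase, HypotheticalSparkBase):
    pass


# Only the Spark base is set up; the Flink base is skipped.
invoke_selected_base_class_setupclass(
    HypotheticalCombinedITTest, selected=(HypotheticalSparkBase,)
)
```

How the "corresponding" base class would be chosen in the real test suite is left open here; the explicit `selected` tuple is just one way to express it.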
Currently, `Sink#to_json` is not used in production or test code. We need to determine whether this method is useful. If so, we need to add test cases to verify...
The `props` parameter in `Registry#build_features` should be moved elsewhere so that job-specific global properties do not affect the feature descriptors saved in the Registry.
Optimize `SparkProcessor#materialize_features`'s performance by reusing intermediate results.
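As a rough illustration of what "reusing intermediate results" can mean, the sketch below computes an expensive intermediate once per distinct source and shares it across the features that depend on it. It uses plain Python rather than Spark, and every name in it (`expensive_source_transform`, `materialize_with_reuse`, the feature and source names) is hypothetical; it does not reflect how `SparkProcessor#materialize_features` is implemented. In Spark itself, a similar effect is commonly achieved with `DataFrame.cache()` or `DataFrame.persist()`.

```python
from typing import Dict, List

# Counts how often the expensive step actually runs (illustration only).
compute_calls = 0


def expensive_source_transform(source_name: str) -> List[int]:
    """Stand-in for an expensive intermediate computation on a source table."""
    global compute_calls
    compute_calls += 1
    return [len(source_name) * i for i in range(3)]


def materialize_with_reuse(feature_to_source: Dict[str, str]) -> Dict[str, List[int]]:
    """Compute each distinct source's intermediate result once and share it."""
    cache: Dict[str, List[int]] = {}
    results: Dict[str, List[int]] = {}
    for feature, source in feature_to_source.items():
        if source not in cache:
            # First feature that needs this source pays the computation cost.
            cache[source] = expensive_source_transform(source)
        # Later features on the same source reuse the cached result.
        results[feature] = cache[source]
    return results


# Three features over two distinct sources: the transform runs only twice.
results = materialize_with_reuse({"f1": "orders", "f2": "orders", "f3": "users"})
```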
```python
class SparkJob(ProcessorJob):
    """Represent a Spark job."""

    def __init__(
        self,
        job_future: Future,
    ) -> None:
        super().__init__()
        self._job_future = job_future

    # TODO: Add test case to verify this method's behavior when...
```