feathub
feathub copied to clipboard
FeatHub - A stream-batch unified feature store for real-time machine learning
Currently, we can construct a Source with timestamp_field set but not contained in the schema like the following test. We should raise an exception in Feathub instead of failing until...
When running `nyc_taxi.py`, a `.parser.out` file would be generated in the same directory. Feathub should provide configuration options to avoid generating such log files in non-debug situations.
Feathub does not check the existence of license contents in its files. If we forget to add a license to a newly added file, the maven/python compilation process can still...
Currently, some tests in feathub verify a pandas DataFrame as follows. ```python self.assertTrue(expected_df.equals(result_df) ``` Such implementation cannot work in certain cases. For example, if an element in the two DataFrames...
Currently, the framework code of FeatHub is coupled together with concrete connector plugins. This code structure affects FeatHub extensibility and improves the workload when adding support for different connectors to...
In test_sliding_window_transform._test_late_data, when the input dataframe contains 5 records, the UDF `sleep` would be invoked 15 times. Figure out why and fix this bug to reduce performance overhead.