HoloClean-Legacy-deprecated
A Machine Learning System for Data Enrichment.
All the attributes in the database should be used to compute the probability that is inserted into the feature. In cooccurrencefeaturizer.py, get_query function, line 91, only attributes in dirty_cells_attributes are recorded...
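As a minimal sketch, assume the co-occurrence feature is a conditional probability P(A = v | B = w) estimated from value counts. This is an assumption, not the code in cooccurrencefeaturizer.py. Under that assumption, computing it over every ordered pair of attributes, rather than only those in dirty_cells_attributes, could look like this. The function name and row layout are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str, str, str]  # (attr_a, value_a, attr_b, value_b)


def cooccurrence_probs(rows: List[Row]) -> Dict[Key, float]:
    """Hypothetical helper: P(attr_a = value_a | attr_b = value_b) for ALL attribute pairs."""
    single: Counter = Counter()  # (attr, value) -> number of rows containing it
    pair: Counter = Counter()    # (attr_a, value_a, attr_b, value_b) -> joint count
    for row in rows:
        for attr, val in row.items():
            single[(attr, val)] += 1
        # Every ordered pair of attributes, not just the ones that contain dirty cells
        for a, b in permutations(row.keys(), 2):
            pair[(a, row[a], b, row[b])] += 1
    # Normalize each joint count by the count of the conditioning value
    return {k: c / single[(k[2], k[3])] for k, c in pair.items()}
```

Because the loop walks every attribute of every row, a clean attribute such as a zip code still contributes evidence when scoring candidates for a dirty cell.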
If the detected dirty value is null, then in pruning.py, function _find_dk_domain(), line 202, the candidates for that cell will contain "()". In the _create_dataframe() function, line 504 suggests that...
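One way to guard against a null dirty value leaking into the domain is to skip nulls and blanks while collecting candidates. The sketch below assumes the domain is built from an iterable of observed values; `candidate_domain` is a hypothetical name, not a function in pruning.py.

```python
from typing import Iterable, List, Optional, Set


def candidate_domain(values: Iterable[Optional[str]]) -> List[str]:
    """Hypothetical helper: distinct, non-null candidates in first-seen order."""
    seen: Set[str] = set()
    domain: List[str] = []
    for v in values:
        # Skip None and empty/whitespace-only strings so a null dirty value
        # never turns into a literal candidate such as "()"
        if v is None or not v.strip() or v in seen:
            continue
        seen.add(v)
        domain.append(v)
    return domain
```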
Setting the `SPARK_HOME` env variable on the latest version of Spark (2.3.1) will cause Spark to crash in a Jupyter notebook setting.
When specifying a denial constraint, adding whitespace after the comma like so `t1.EQ(t1.petal_width, t1.sepal_width)` leads to this error `DCFormatException: Tuple name t1 not defined in EQ(t1.petal_width, t1.sepal_width)` and the DC...
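One plausible reading of the error is that the predicate's arguments are split on the comma with the leading space kept, so the second operand's tuple name is read as " t1" rather than "t1". A minimal sketch of normalizing the predicate before extracting tuple names follows. Both function names are hypothetical and are not HoloClean's DC parser.

```python
import re
from typing import List


def normalize_predicate(pred: str) -> str:
    """Hypothetical helper: drop whitespace around commas and just inside parentheses."""
    pred = re.sub(r"\s*,\s*", ",", pred.strip())
    pred = re.sub(r"\(\s+", "(", pred)
    return re.sub(r"\s+\)", ")", pred)


def tuple_names(pred: str) -> List[str]:
    """Return the tuple name (text before the first '.') of each operand."""
    inner = normalize_predicate(pred)
    args = inner[inner.index("(") + 1 : inner.rindex(")")]
    return [arg.split(".", 1)[0] for arg in args.split(",")]
```

With normalization, `EQ(t1.petal_width, t1.sepal_width)` yields the tuple names `t1` and `t1`, so both would match a declared `t1`.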
When following the tutorials, there are problems loading blank values, i.e., "" in the CSV dataset. After `session.load_data(data_path)`, the original blank values are shown as "null"...
This PR exposes some parameters of pyspark's CSV reader (sep, escape, and multiLine) through `session.load_data`. This makes it possible to load files written in various CSV dialects.
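Forwarding these parameters amounts to building Spark's CSV option map from keyword arguments. In the minimal sketch below, `csv_reader_options` is a hypothetical helper, not the PR's code. The keys `sep`, `escape`, and `multiLine` are Spark's own CSV option names, and the defaults mirror Spark's documented CSV defaults.

```python
from typing import Dict


def csv_reader_options(sep: str = ",", escape: str = "\\",
                       multi_line: bool = False) -> Dict[str, str]:
    """Hypothetical helper: map load_data-style kwargs onto Spark CSV option names."""
    return {
        "sep": sep,        # field delimiter
        "escape": escape,  # character used to escape quotes inside quoted values
        # Spark's option syntax uses lowercase "true"/"false" strings
        "multiLine": "true" if multi_line else "false",
    }
```

With an existing SparkSession `spark`, a semicolon-separated file with quoted newlines could then be read with `spark.read.options(**csv_reader_options(sep=";", multi_line=True)).csv(path)`.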
In `setup.py`, the packages are pinned to specific versions. Since the latest updates still work with HoloClean and it is still in development mode, we can pass the setuptools...
We need to add holoclean to PyPI so people can install it using `pip install holoclean`.
If a column name contains '-', '/', '.', or other special characters, the dataframe_to_table method does not wrap it in quotes, so we have a problem creating a table in...
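Quoting each identifier when generating the DDL avoids this. The minimal sketch below assumes a backend that accepts ANSI SQL double-quoted identifiers. PostgreSQL and SQLite do; MySQL needs backticks unless `ANSI_QUOTES` mode is on. Both function names are hypothetical, and every column is typed `TEXT` purely for illustration.

```python
import sqlite3
from typing import Iterable


def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded quotes (ANSI SQL)."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, columns: Iterable[str]) -> str:
    """Build a CREATE TABLE statement with every identifier quoted."""
    cols = ", ".join(f"{quote_identifier(c)} TEXT" for c in columns)
    return f"CREATE TABLE {quote_identifier(table)} ({cols})"


def create_table(conn: sqlite3.Connection, table: str, columns: Iterable[str]) -> None:
    """Execute the generated DDL on an open connection."""
    conn.execute(create_table_sql(table, columns))
```

For example, `create_table(sqlite3.connect(":memory:"), "dataset", ["petal-width", "a/b", "x.y"])` succeeds, whereas the same statement with unquoted names is a syntax error.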
Removed the setup.py in the root directory. I'm not sure how it works, and there are no instructions in the README. Either deleting it with this PR or someone who...