HoloClean-Legacy-deprecated issues

Clean attributes should contribute to the probability in the tensor.

All attributes in the database should be used to compute the probability and be inserted into the feature. In cooccurrencefeaturizer.py, in the get_query function (line 91), only attributes in dirty_cells_attributes are recorded...

zaqthss
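
To make the request concrete, here is a minimal sketch of co-occurrence probabilities computed over every attribute rather than only the dirty ones. It is not HoloClean's code: `cooccurrence_probs`, the row format, and the sample data are all hypothetical, and the real featurizer builds these statistics with SQL queries.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probs(rows, attributes):
    """Estimate P(target=v | other=w) for every ordered pair of distinct attributes.

    Hypothetical helper: every attribute in `attributes` participates,
    whether or not any of its cells were flagged as dirty.
    """
    pair_counts = Counter()   # (target, v, other, w) -> joint count
    value_counts = Counter()  # (attribute, value) -> marginal count
    for row in rows:
        for attr in attributes:
            value_counts[(attr, row[attr])] += 1
        for target, other in permutations(attributes, 2):
            pair_counts[(target, row[target], other, row[other])] += 1
    return {
        key: count / value_counts[(key[2], key[3])]
        for key, count in pair_counts.items()
    }


# Made-up sample data for illustration only.
rows = [
    {"city": "Chicago", "state": "IL", "zip": "60601"},
    {"city": "Chicago", "state": "IL", "zip": "60602"},
    {"city": "Chicago", "state": "IN", "zip": "60601"},
]
probs = cooccurrence_probs(rows, ["city", "state", "zip"])
```

Here `probs[("state", "IL", "city", "Chicago")]` is the estimate of P(state=IL | city=Chicago), and clean attributes such as `zip` contribute entries in the same way as the others.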

Inconsistency of domain id in database and the one in feature tensor

If the detected dirty value is null, then in pruning.py, in _find_dk_domain() (line 202), the candidates for such a cell will contain "()". In _create_dataframe(), line 504 suggests that...

zaqthss

Update instructions for Spark on macOS

1

Setting the `SPARK_HOME` environment variable with the latest version of Spark (2.3.1) causes Spark to crash when run in a Jupyter notebook.

richardwu

Whitespace in denial constraints leads to DCFormatException

When specifying a denial constraint, adding whitespace after the comma, as in `t1.EQ(t1.petal_width, t1.sepal_width)`, leads to the error `DCFormatException: Tuple name t1 not defined in EQ(t1.petal_width, t1.sepal_width)` and the DC...

pmaetzig
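
One possible workaround, sketched under assumptions, is to normalize the constraint string before handing it to the parser. `normalize_dc` is a hypothetical helper, not part of HoloClean, and it only strips whitespace around commas; it would also alter a quoted constant that contains a comma followed by a space, so it is a stopgap rather than a parser fix.

```python
import re


def normalize_dc(dc: str) -> str:
    """Strip whitespace around commas in a denial-constraint string.

    Hypothetical pre-processing step; the DC syntax itself is left untouched.
    """
    return re.sub(r"\s*,\s*", ",", dc.strip())


normalized = normalize_dc("t1.EQ(t1.petal_width, t1.sepal_width)")
```

After this step the string no longer contains the space after the comma that the issue identifies as the trigger for the exception.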

Tutorials show that loading data may turn blank values into "null" values

When following the tutorials, blank values (i.e., "" in the CSV dataset) are not loaded correctly. After `session.load_data(data_path)`, the original blank values are shown as "null"...

zaqthss

ENH Load CSV files from different dialects

1

This PR exposes some parameters of pyspark's CSV reader (sep, escape and multiLine) up to `session.load_data`. This makes it possible to load files in various CSV dialects.

moreymat
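
To illustrate what these three knobs control, the sketch below parses a small non-default dialect with Python's standard `csv` module. This is a stand-in for illustration only: it does not use pyspark or `session.load_data`, and the sample text is made up. The `delimiter` argument plays the role of `sep`, `escapechar` the role of `escape`, and the quoted field spanning two lines is the case that `multiLine` addresses.

```python
import csv
import io

# Semicolon-separated, backslash-escaped quotes, one field spanning two lines.
raw = 'id;comment\n1;"first line\nsecond line"\n2;"say \\"hi\\""\n'

reader = csv.reader(
    io.StringIO(raw, newline=""),
    delimiter=";",      # analogous to sep
    quotechar='"',
    escapechar="\\",    # analogous to escape
    doublequote=False,  # quotes are escaped with a backslash, not doubled
)
rows = list(reader)
```

With these settings the second record keeps its embedded newline and the third record yields the text `say "hi"` as a single field.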

bug

Urgent

Removed setup.py

1

Removed the setup.py in the root directory. I'm not sure how it works and there are no instructions in the README. Either deleting it with this PR or someone who...

j48zheng

Fixing 'setup.py'

Upload Holoclean to Python Packaging Index

Problem in table creating in dataengine
