Boris Lublinsky
I do not think it is a bug. The input folder in this case is not configured correctly. Execution thinks that: ``` data factory data_ is using local data access: input_folder...
@daw3rd. Agreed, but this is not a bug. We can ask for an enhancement for better error handling, but let's not qualify it as a bug.
According to the log, MinIO started correctly. The error occurs during copying; the timeout is at `read tcp 127.0.0.1:50180`. Was the data copied to the VM?
@H322, @roytman can we close this one?
Sorry, I am not convinced. `ray_client_server_23000.err` is a Ray error, not an API server error. See https://github.com/ray-project/ray/issues/19792 for an explanation.
@roytman do we still need this?
I am sorry, but I have no idea how to do this in pure Python. It would be a completely different implementation. The current code leverages Ray capabilities that...
I am sorry, but an implementation in pure Python is not going to be compatible with the current Ray implementation.
This is already supported. See https://github.com/IBM/data-prep-kit/blob/dev/data-processing-lib/python/src/data_processing/runtime/pure_python/transform_runtime.py for Python and https://github.com/IBM/data-prep-kit/blob/dev/data-processing-lib/spark/src/data_processing_spark/runtime/spark/transform_runtime.py for Spark. For Python it works exactly the way you hacked it. For Spark, the runtime runs for every partition -...
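A rough sketch of the difference described above (the class and function names here are illustrative, not the actual data-prep-kit API): the Python runtime holds one runtime object for the whole job, while the Spark path would instantiate a fresh runtime for each partition.

```python
# Hypothetical sketch; names are illustrative, not the data-prep-kit API.
class TransformRuntime:
    """Runtime hook: holds parameters the framework hands to a transform."""

    def __init__(self, params: dict):
        self.params = dict(params)

    def get_transform_config(self) -> dict:
        # Pure-Python runtime: this would be invoked once for the whole job.
        return self.params


class PartitionRuntime(TransformRuntime):
    """Illustrates the Spark behavior: one runtime instance per partition."""

    def __init__(self, params: dict, partition: int):
        super().__init__(params)
        self.partition = partition


def create_per_partition(params: dict, num_partitions: int) -> list:
    # A Spark-style driver would create a fresh runtime for every partition.
    return [PartitionRuntime(params, p) for p in range(num_partitions)]


runtimes = create_per_partition({"threshold": 0.5}, 3)
print([r.partition for r in runtimes])  # → [0, 1, 2]
```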
After looking carefully at this issue, there are two options for its resolution: 1. Extend the file processor and transform API as suggested. This is a very invasive change, which...