csv-schema-inference

Allowing different multiprocessing engines

Open orellabac opened this issue 2 years ago • 4 comments

**Is your feature request related to a problem? Please describe.** I have environments where I cannot use this library because I cannot leverage multiprocessing, only threading.

**Describe the solution you'd like** Using a library that allows multiple backends would be great.

**Describe alternatives you've considered** Joblib

orellabac avatar Feb 01 '23 17:02 orellabac

Hi @Wittline, do you think it would be possible to consider this change?

sfc-gh-mrojas avatar Apr 27 '23 12:04 sfc-gh-mrojas

Hi @sfc-gh-mrojas, could you please provide more technical details? I did not see your request before because I am not receiving notifications about new issues.

Wittline avatar Apr 27 '23 15:04 Wittline

Sure. Currently the code depends on the multiprocessing library. The problem is that in some environments I cannot spawn new processes. I think there is a PR that uses joblib; that way the backend is configurable, which covers several scenarios (process-based or thread-based execution). We would like to allow that. What are your thoughts?
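To make the idea concrete, here is a minimal sketch of a configurable backend. It uses the standard library's `concurrent.futures` as a stand-in rather than joblib, and it is not the code from the PR. The names `BACKENDS`, `infer_chunk_types`, and `run_chunks` are hypothetical and do not exist in csv-schema-inference. joblib offers the same kind of switch through the `backend` argument of `joblib.Parallel`.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical backend registry; the keys are illustrative, not the library's API.
BACKENDS = {
    "threading": ThreadPoolExecutor,
    "multiprocessing": ProcessPoolExecutor,
}


def infer_chunk_types(chunk):
    """Toy stand-in for per-chunk work: label each value as 'int' or 'str'."""
    return ["int" if value.lstrip("-").isdigit() else "str" for value in chunk]


def run_chunks(chunks, backend="threading", max_workers=2):
    """Map infer_chunk_types over chunks using the selected executor class."""
    executor_cls = BACKENDS[backend]
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(infer_chunk_types, chunks))


if __name__ == "__main__":
    chunks = [["1", "2", "x"], ["-7", "abc"]]
    # Thread-based execution works where spawning processes is not allowed.
    print(run_chunks(chunks, backend="threading"))
```

With something like this, an environment that cannot spawn processes would pass `backend="threading"`, while everyone else could keep the process-based default.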

sfc-gh-mrojas avatar Apr 27 '23 17:04 sfc-gh-mrojas

Hi @sfc-gh-mrojas @orellabac, have you had a chance to test the performance of the code? If so, could you please share some details about the performance with different sizes of CSV files?

I would be interested in knowing the processing time and memory consumption for files of varying sizes. It would also be helpful to understand if there were any particular bottlenecks or challenges that you encountered during your testing.
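One possible way to collect those numbers is sketched below, using only the standard library. `make_csv`, `measure`, and `count_rows` are hypothetical helpers, and `count_rows` is only a placeholder workload to be swapped for the real inference call. Note that `tracemalloc` only tracks Python allocations in the current process, so memory used by worker processes would not be included.

```python
import csv
import io
import time
import tracemalloc


def make_csv(n_rows):
    """Build an in-memory CSV with n_rows synthetic rows (illustrative data only)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "name", "score"])
    for i in range(n_rows):
        writer.writerow([i, f"name{i}", i * 0.5])
    return buffer.getvalue()


def measure(func, *args):
    """Return (result, elapsed_seconds, peak_bytes) for one call to func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def count_rows(text):
    """Placeholder workload; swap in the real schema-inference call."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    for n_rows in (1_000, 10_000, 100_000):
        rows, elapsed, peak = measure(count_rows, make_csv(n_rows))
        print(f"{n_rows} rows: {elapsed:.4f}s, peak {peak / 1e6:.2f} MB")
```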

Any additional insights you can provide on the performance of the code would be greatly appreciated. Thank you!

Wittline avatar Apr 30 '23 04:04 Wittline