Disable log directory?
Is there a way to disable logging to a directory? I have a "load" use case where I'd like to run dsbulk programmatically from a Python process, and as it stands I need to specify an execution ID and then remove the log directory after the run.
It would be helpful to be able to disable logging to a directory so there is nothing left behind to clean up.
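For reference, the workaround looks roughly like this from Python (a sketch only; the `--log.directory` and `--engine.executionId` settings are taken from the DSBulk options, while the paths, keyspace/table names, and helper functions here are just illustrative):

```python
import shutil
import subprocess
import tempfile

def build_load_command(keyspace, table, csv_url, log_dir, execution_id="pyload"):
    """Assemble a dsbulk load invocation with an explicit log directory
    and a fixed execution ID so the log path is predictable."""
    return [
        "dsbulk", "load",
        "-k", keyspace,
        "-t", table,
        "-url", csv_url,
        "--log.directory", log_dir,            # send logs somewhere disposable
        "--engine.executionId", execution_id,  # fixed ID instead of a timestamped one
    ]

def run_load(keyspace, table, csv_url):
    """Run dsbulk, then delete the log directory it leaves behind."""
    log_dir = tempfile.mkdtemp(prefix="dsbulk-logs-")
    try:
        return subprocess.run(
            build_load_command(keyspace, table, csv_url, log_dir),
            capture_output=True, text=True,
        )
    finally:
        # This cleanup step is exactly what I'd like to be able to drop.
        shutil.rmtree(log_dir, ignore_errors=True)
```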
Thanks
Hi, that could be a nice enhancement indeed. But just curious: if there is an error, how would you investigate the cause if you can't see the log files?
Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed. The content of operations.log should go to stdout for the same reason (maybe it already does).
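To sketch what I mean by "redirect it to its own logging" (the function and names here are my own illustration, not anything DSBulk provides): the consuming program would capture the two streams and re-emit them at the matching levels.

```python
import logging

def relay_dsbulk_output(stdout_text, stderr_text, logger=None):
    """Forward captured dsbulk output into the caller's own logging:
    operational output as INFO, error output as ERROR."""
    log = logger or logging.getLogger("dsbulk")
    for line in stdout_text.splitlines():
        if line.strip():
            log.info(line)
    for line in stderr_text.splitlines():
        if line.strip():
            log.error(line)
```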
I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded. It's not a huge problem since they are empty, but it seems like the program should clean them up?
Thanks
> Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed.
That would be an option, but DSBulk creates many similar files for different kinds of errors. It would be challenging to redirect everything to stderr without the various contents getting garbled together.
> I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded.
Now that's a first. Could you please give me a simple reproduction case? This is definitely not normal, DSBulk should not write to the home directory at all.
I haven't been able to reproduce the empty-files issue, so it may have been related to some intermittent problems I was having with the process getting killed by out-of-memory errors. Now that I have that resolved, I'm not seeing any files left behind.
One unrelated question: the project description mentions "2-4x faster" than other bulk tools; is there any way to know what that should translate into in real numbers? I'm seeing between 1,000 and 2,000 rows/sec, and I don't know whether that's slow or fast. I imagine it's related to my Cassandra cluster's performance.
Thanks