delta-sharing

Performance improvement

Open: ckayay opened this issue 3 years ago • 1 comment

Currently, a table with 200K rows and 100 columns takes 20-40 minutes to load fully using load_as_pandas from the Python package. In addition, filtering (predicate hints, etc.) does not work properly with this option. At the moment, the only way to improve performance is to partition the Delta tables monthly rather than daily.

ckayay, May 09 '22 17:05

@ckayay Sorry for the late response.

Have you tried to set predicateHints? What do you mean by it not working properly?
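
For reference, here is a minimal sketch of how predicateHints travel in a Delta Sharing table query, using only the standard library. The Delta Sharing protocol defines a `query` endpoint whose JSON body may carry `predicateHints` and `limitHint`. The endpoint URL, share, schema, table, and column names below are hypothetical, and the helper `build_query_request` is illustrative, not part of the delta-sharing package. The protocol describes these hints as best effort, so a server may still return files that do not match, and the client should be prepared to filter again after loading.

```python
import json
from urllib.parse import quote


def build_query_request(endpoint, share, schema, table,
                        predicate_hints=None, limit_hint=None):
    """Build the URL and JSON body for a Delta Sharing table query.

    Nothing is sent over the network; this only shows the request shape.
    """
    url = "{}/shares/{}/schemas/{}/tables/{}/query".format(
        endpoint.rstrip("/"),
        quote(share, safe=""),
        quote(schema, safe=""),
        quote(table, safe=""),
    )
    body = {}
    if predicate_hints:
        # SQL-like boolean expressions, ideally on partition columns.
        body["predicateHints"] = list(predicate_hints)
    if limit_hint is not None:
        # A hint for how many rows the client intends to read.
        body["limitHint"] = int(limit_hint)
    return url, json.dumps(body)


# Hypothetical endpoint and table coordinates, for illustration only.
url, payload = build_query_request(
    "https://sharing.example.com/delta-sharing/",
    "my_share", "my_schema", "my_table",
    predicate_hints=["date >= '2022-01-01'"],
    limit_hint=1000,
)
print(url)
print(payload)
```

Because the hints only prune files on the server side, they tend to help most when the hinted column is a partition column of the shared table.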

Have you tried running OPTIMIZE on the original table, so that it has fewer files and is easier for the client to load?
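
As a rough sketch of what that could look like on the provider side: Delta Lake's `OPTIMIZE` SQL command compacts small files, optionally restricted to some partitions with a `WHERE` clause. The table name `sales.events` and column `event_date` are hypothetical, and both helper functions are illustrative only. `run_optimize` assumes you pass in an existing SparkSession that is configured for Delta Lake.

```python
def build_optimize_sql(table_name, partition_filter=None):
    """Compose a Delta Lake OPTIMIZE statement.

    partition_filter, if given, should reference partition columns only.
    """
    sql = "OPTIMIZE {}".format(table_name)
    if partition_filter:
        sql += " WHERE {}".format(partition_filter)
    return sql


def run_optimize(spark, table_name, partition_filter=None):
    """Run the statement on an existing, Delta-enabled SparkSession."""
    return spark.sql(build_optimize_sql(table_name, partition_filter))


# Hypothetical table and partition column, for illustration only.
statement = build_optimize_sql("sales.events", "event_date >= '2022-01-01'")
print(statement)
```

Fewer, larger files mean fewer pre-signed URLs for the sharing client to fetch, which is where much of the load time can go for a table partitioned daily.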

linzhou-db, Oct 24 '22 23:10