
Reading from Azure Storage is extremely slow

Open djouallah opened this issue 3 years ago • 3 comments

I have the same Delta table saved both on my local disk and in Azure Storage; it contains around 600 files. When I run a query against the local copy, I get results in about 2 seconds; against Azure Storage it takes more like 6 minutes. Is this expected?

djouallah, Jul 10 '22 03:07

Generally speaking, how quickly you can load data from a table depends heavily on your network connection, and remote requests are highly unlikely to outperform local access.
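To make the network dependence concrete, here is a rough back-of-the-envelope sketch. Only the figure of 600 files comes from this thread; the round-trip time, file size, bandwidth, and the `estimate_fetch_seconds` helper are hypothetical values chosen for illustration, not measurements of Azure Storage or of delta-rs.

```python
import math


def estimate_fetch_seconds(num_files, round_trip_s, avg_file_mb,
                           bandwidth_mb_s, concurrency=1):
    """Crude lower bound on the time to fetch `num_files` remote objects.

    Assumes one request per file, a fixed round-trip latency per request,
    and a single shared link of `bandwidth_mb_s` megabytes per second.
    """
    # Requests issued `concurrency` at a time overlap their round trips.
    latency_s = math.ceil(num_files / concurrency) * round_trip_s
    # The raw bytes still have to cross the same link, however many
    # requests are in flight.
    transfer_s = num_files * avg_file_mb / bandwidth_mb_s
    return latency_s + transfer_s


# Hypothetical inputs: 100 ms round trip, 5 MB files, 12.5 MB/s link.
sequential = estimate_fetch_seconds(600, 0.1, 5, 12.5)
parallel = estimate_fetch_seconds(600, 0.1, 5, 12.5, concurrency=16)
print(f"sequential: {sequential:.0f} s, 16 concurrent requests: {parallel:.0f} s")
```

Under these assumptions, concurrency only hides the per-request latency; once bandwidth is the limit, the remaining lever is reading fewer bytes.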

That being said, are you using delta-rs from Python or Rust? Loading the table using the pyarrow datasets API together with an fsspec file system is probably as fast as it gets for now.
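For the Python route, a minimal sketch of loading through the pyarrow datasets API could look like the following. It assumes the `deltalake` package is installed; `load_columns` is a hypothetical helper name, and the table URI scheme and the `storage_options` keys needed for Azure depend on the `deltalake` version, so they are left to the caller here.

```python
def load_columns(table_uri, columns=None, storage_options=None):
    """Read a Delta table through a pyarrow dataset, optionally projecting columns.

    `table_uri` and `storage_options` are passed straight to deltalake;
    their exact form for Azure is an assumption the caller must check
    against the installed version.
    """
    # Imported lazily so the helper can be defined without the package.
    from deltalake import DeltaTable  # third-party: pip install deltalake

    table = DeltaTable(table_uri, storage_options=storage_options)
    dataset = table.to_pyarrow_dataset()
    # Projecting columns lets the Parquet reader skip column chunks that
    # are not requested, which reduces the bytes pulled over the network.
    return dataset.to_table(columns=columns)


# Usage (placeholders, not real values):
# result = load_columns("<table-uri>", columns=["<column-name>"],
#                       storage_options={...})
```

Selecting only the columns the query needs is the main lever in this sketch, since a full `to_table()` still downloads every file in the table.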

On the Rust side, I am hoping to add a load function that takes full advantage of the optimizations we have at our disposal. However, some things need to get done first.

roeap, Jul 10 '22 12:07

I am using the Python binding. I will try running it later from a VM in Azure, just to reduce network latency. Thanks again for the work.

djouallah, Jul 11 '22 02:07

> I am using the Python binding. I will try running it later from a VM in Azure, just to reduce network latency. Thanks again for the work.

@djouallah, did you try this? How did it go?

yusuf-jkhan1, Aug 07 '22 22:08