Reading from Azure Storage is extremely slow
I have the same Delta table saved both on my local disk and in Azure Storage. It contains around 600 files. When I run a query against the local copy, I get results in about 2 seconds; against Azure Storage it takes more like 6 minutes. Is this expected?
Generally speaking, how quickly you can load data from a table depends heavily on your network connection, and remote requests are highly unlikely to outperform local access.
That being said, are you using delta-rs from Python or Rust? Loading the table using the pyarrow datasets API together with an fsspec file system is probably as fast as it gets for now.
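A rough sketch of that Python route, in case it helps. It assumes the `adlfs` package supplies the fsspec file system for Azure Blob Storage and that authentication uses an account key. The helper names `to_blob_paths` and `open_remote_dataset` are made up for illustration, and the exact `deltalake` calls and `storage_options` keys may differ between releases, so check them against the version you have installed.

```python
from posixpath import join as path_join


def to_blob_paths(container, table_prefix, relative_files):
    """Turn table-relative file names into 'container/prefix/file' paths for fsspec."""
    prefix = table_prefix.strip("/")
    return [path_join(container, prefix, f.lstrip("/")) for f in relative_files]


def open_remote_dataset(account_name, account_key, container, table_prefix):
    """Build a pyarrow dataset over the files currently listed in the Delta log."""
    # Imports are local so the path helper above works without these packages.
    import pyarrow.dataset as ds
    from adlfs import AzureBlobFileSystem  # assumption: adlfs is the fsspec backend
    from deltalake import DeltaTable

    # Assumption: account-key auth; these option keys may vary by deltalake version.
    storage_options = {"account_name": account_name, "account_key": account_key}
    table = DeltaTable(
        f"abfs://{container}/{table_prefix.strip('/')}",
        storage_options=storage_options,
    )

    # fsspec file system that pyarrow will use to fetch the parquet files.
    fs = AzureBlobFileSystem(account_name=account_name, account_key=account_key)

    # files() returns paths relative to the table root; newer releases may
    # expose file_uris() instead.
    paths = to_blob_paths(container, table_prefix, table.files())
    return ds.dataset(paths, format="parquet", filesystem=fs)
```

With a dataset in hand, `dataset.to_table(columns=[...], filter=...)` lets pyarrow fetch only the columns and row groups a query needs, which matters more over the network than it does on local disk.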
On the Rust side, I am hoping to add a load function that takes full advantage of the optimizations we have at our disposal. However, there are some things that need to get done first.
I am using the Python binding. I will try running it later from a VM in Azure, just to reduce network latency. Thanks again for the work.
@djouallah, did you try this? How did it go?