Big increase in classA operations when opening array after tiledb upgrade

Open vincentschut opened this issue 2 years ago • 3 comments

Apologies if gitlab issues is not the right place for this. But I posted it on the forum almost 2 weeks ago and have not received a single reply. For us, this is a serious issue, and I'd really like to get the view of the tiledb developers on it.

--- pasted text from forum topic ---

Hi, recently we upgraded tiledb-py in our application from 0.9.1 to 0.13.3 (and thus tiledb-embedded from 2.3 to 2.7.2). Apparently, somewhere in between, something changed in how tiledb opens (dense) arrays on GCS, because after that upgrade we saw a sudden increase in cost, due to a steep increase in what Google calls “class A operations”: listObject, ReadObjectMetadata, and ReadObject calls.

A little testing shows that: previously (libtiledb 2.3), opening an array (also created with that version) would issue 2 GetObjectMetadata and 2 ListObjects requests. With libtiledb 2.7.2, simply opening an array (created with 2.7.2) issues 10 GetObjectMetadata and 12 ListObjects requests.

That is a serious increase.
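
To put those numbers in perspective, here is a rough back-of-the-envelope sketch. The request counts are the ones measured above; the function names are made up for illustration, and the price per 1,000 requests is a placeholder assumption (it also treats both request types as billed at the same rate, which may not match actual pricing).

```python
# Request counts per array open, as measured above.
REQUESTS_PER_OPEN = {
    "2.3": {"GetObjectMetadata": 2, "ListObjects": 2},
    "2.7.2": {"GetObjectMetadata": 10, "ListObjects": 12},
}


def requests_per_open(version: str) -> int:
    """Total metadata/list requests issued by one array open."""
    return sum(REQUESTS_PER_OPEN[version].values())


def estimated_cost(version: str, opens: int, price_per_1000: float) -> float:
    """Estimated cost of `opens` array opens.

    `price_per_1000` is a hypothetical price per 1,000 requests;
    check the provider's current pricing before relying on it.
    """
    return requests_per_open(version) * opens * price_per_1000 / 1000


old = requests_per_open("2.3")
new = requests_per_open("2.7.2")
print(f"{old} -> {new} requests per open ({new / old:.1f}x)")
# prints: 4 -> 22 requests per open (5.5x)
```

So each open goes from 4 to 22 requests, a 5.5x increase, and the cost scales linearly with the number of opens.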

I understand that for most users, the usage pattern is opening once, then reading a lot. Unfortunately, our implementation means many separate jobs (in k8s) need to open many different arrays (100s to 1000s). So in our case, this increase in requests means a serious increase in monthly cost (because one pays for classA requests). We already tried to mitigate this as much as possible by caching the open arrays in a single job, but it still feels like a waste of cost (and network traffic, thus latency, thus time, thus cost again).
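
The per-job caching mentioned above could look roughly like the following sketch. The `ArrayCache` class and its methods are hypothetical and not part of tiledb-py; the `opener` callable is a stand-in that, in a real job, might wrap `tiledb.open`.

```python
from typing import Any, Callable, Dict


class ArrayCache:
    """Open each array URI at most once per job and reuse the handle."""

    def __init__(self, opener: Callable[[str], Any]) -> None:
        # Assumed usage: opener = lambda uri: tiledb.open(uri, mode="r")
        self._opener = opener
        self._handles: Dict[str, Any] = {}

    def get(self, uri: str) -> Any:
        """Return a cached handle, opening the array only on first use."""
        if uri not in self._handles:
            self._handles[uri] = self._opener(uri)
        return self._handles[uri]

    def close_all(self) -> None:
        """Close every cached handle that exposes a close() method."""
        for handle in self._handles.values():
            close = getattr(handle, "close", None)
            if callable(close):
                close()
        self._handles.clear()
```

With this shape, the open-time requests are paid once per array per job rather than once per read. The trade-off is that a cached handle likely reflects the array as it was when first opened, so jobs that need to see newer writes would have to reopen.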

Is there a way to avoid that simply opening an array results in this many List/Metadata requests?

Thanks, Vincent.

--- end pasted text from forum topic ---

vincentschut avatar Jul 07 '22 11:07 vincentschut

Hi @vincentschut, apologies for the delayed response -- we've implemented one improvement for this issue here: https://github.com/TileDB-Inc/TileDB/pull/3323

@KiterLuc will respond soon with more details.

ihnorton avatar Jul 07 '22 13:07 ihnorton

Hi Vincent,

Thanks for the feedback. This fix (https://github.com/TileDB-Inc/TileDB/pull/3323), which reduces the class A calls to a bare minimum, will be included in the next release (2.11), scheduled for 27 July. First, it removes an initial check for whether the array directory exists, which required a ListObject request. We list the contents of that directory shortly afterwards and can error out at that point instead. Second, two ListObject requests were made inside the array schema folder, and it was very simple to reduce them to one. Finally, we added some directories in our new array directory structure (more on that in the next paragraph), and we removed some unnecessary ListObject requests that checked whether those directories contained a fragment.

More on the new array directory: after 2.8, we changed the folder layout by adding __commits, __fragments, and __fragment_meta folders. This was done to limit the size of listings for large arrays and improve open times for larger arrays.

Finally, have you tried consolidating the fragment metadata? This could reduce some class A operations significantly depending on how many fragments are in your array. You can consolidate the fragment metadata using the consolidate API on the array object (https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#array): tiledb.consolidate(path, config=tiledb.Config({"sm.consolidation.mode": "fragment_meta"}))
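
As a slightly fuller sketch of the call above, the snippet below wraps it in two small helpers. The helper names and the example URI are hypothetical; only `tiledb.consolidate`, `tiledb.Config`, and the `sm.consolidation.mode` setting come from the suggestion itself.

```python
def fragment_meta_config() -> dict:
    """Config entries selecting fragment-metadata consolidation."""
    return {"sm.consolidation.mode": "fragment_meta"}


def consolidate_fragment_meta(uri: str) -> None:
    """Consolidate the fragment metadata of the array at `uri`."""
    # Imported lazily so the config helper is usable without tiledb-py.
    import tiledb

    tiledb.consolidate(uri, config=tiledb.Config(fragment_meta_config()))


# Example call (placeholder URI, requires tiledb-py and bucket access):
# consolidate_fragment_meta("gs://my-bucket/my-array")
```

This would typically be run after a batch of writes, by whichever job owns the array, rather than by every reader.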

Thanks, Luc

KiterLuc avatar Jul 07 '22 13:07 KiterLuc

Hi @vincentschut, we've released TileDB 2.10 (and now 2.11) with the change referenced above. Please let us know if this issue is not resolved!

ihnorton avatar Aug 15 '22 17:08 ihnorton

Please ping if further follow-up is needed, and we can reopen.

ihnorton avatar Dec 01 '22 13:12 ihnorton