TileDB-Py

Support for openstack s3?

Open mangecoeur opened this issue 1 year ago • 9 comments

I was wondering whether the TileDB S3 backend should also work with OpenStack Swift's S3-compatible API. I had it working for a while, but it has stopped working and I can't figure out whether that's due to a change in TileDB or something else.

mangecoeur avatar Aug 10 '22 16:08 mangecoeur

Hi @mangecoeur.

We have not made any changes to the S3 backend recently, so this is unlikely to be caused by a change in TileDB. Can you give us more information about the issue you're seeing, such as any error messages and when you were last able to get it to work?

Thanks.

nguyenv avatar Aug 10 '22 18:08 nguyenv

Adding a note that we test against the MinIO S3 implementation regularly, and we have tested against other S3-compatible endpoints in the past (or are aware of active users). But please let us know what you are seeing and we'll take a look.

ihnorton avatar Aug 11 '22 13:08 ihnorton

So the error I'm getting is:

Failed to read S3 object s3://tessa/clusters_distances/distances.tiledb/__meta/__1660148217721_1660148217721_e642f1cfaaad472bb074a15d6c1ef99a.vac
Exception:  
Error message:

The exception and error message are blank. If I use VFS I can correctly list the files. The dataset was created locally then uploaded to S3, I don't know if that might be the cause? The dataset works fine when accessed locally.
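For reference, the VFS listing mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact setup used here: the endpoint `swift.example.com` and the helper names are hypothetical, credentials are assumed to come from the environment, and path-style addressing is assumed because many S3-compatible services do not support virtual-hosted buckets.

```python
def s3_compatible_settings(endpoint, region="us-east-1", use_https=True):
    """Build TileDB config settings for an S3-compatible endpoint.

    `endpoint` is host[:port] without a scheme (hypothetical value below).
    All values are strings because TileDB config values are strings.
    """
    return {
        "vfs.s3.endpoint_override": endpoint,
        "vfs.s3.scheme": "https" if use_https else "http",
        "vfs.s3.region": region,
        # Assumption: the service expects path-style bucket addressing.
        "vfs.s3.use_virtual_addressing": "false",
    }


def list_prefix(uri, settings):
    """List the objects under `uri` through TileDB's VFS layer."""
    import tiledb  # imported lazily so the settings helper works without it

    ctx = tiledb.Ctx(tiledb.Config(settings))
    vfs = tiledb.VFS(ctx=ctx)
    return vfs.ls(uri)


settings = s3_compatible_settings("swift.example.com")
# Example call (requires network access and valid credentials):
# print(list_prefix("s3://tessa/clusters_distances/distances.tiledb/__meta", settings))
```

Listing through VFS only exercises the object-listing path, so it can succeed even when reading an individual object fails, which matches what I'm seeing.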

mangecoeur avatar Aug 11 '22 13:08 mangecoeur

Deleting the .vac file manually from the __meta folder seems to have cured S3 reading. I'm still getting intermittent segfaults, but seemingly only with tiledb installed via pip on macOS; the conda-forge version is fine.

mangecoeur avatar Aug 13 '22 13:08 mangecoeur

Thanks for the update! Just for my clarification, does this sequence of events seem correct? At first you were able to use OpenStack S3 with TileDB, and at some point you wrote metadata to the array. Then you consolidated your array metadata with "sm.consolidation.mode": "array_meta" -- I'm assuming this because you have *vac files in __meta. Was the array metadata vacuumed afterwards? And was it after consolidation or vacuuming that you started getting the Failed to read S3 object error?
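To make the sequence concrete, here is a minimal sketch of what consolidating and then vacuuming array metadata might look like in TileDB-Py. The helper names are hypothetical, and this is an assumption about the steps involved rather than a record of what was actually run.

```python
def array_meta_settings():
    """Config settings that restrict consolidation and vacuuming to array metadata."""
    return {
        "sm.consolidation.mode": "array_meta",
        "sm.vacuum.mode": "array_meta",
    }


def consolidate_and_vacuum_metadata(uri, ctx=None):
    """Consolidate the array metadata at `uri`, then vacuum it.

    Consolidation writes a merged metadata file plus a .vac file listing the
    files it supersedes; vacuuming is then expected to delete those files.
    """
    import tiledb  # imported lazily so the settings helper works without it

    config = tiledb.Config(array_meta_settings())
    tiledb.consolidate(uri, config=config, ctx=ctx)
    tiledb.vacuum(uri, config=config, ctx=ctx)


# Example call (hypothetical local path):
# consolidate_and_vacuum_metadata("distances.tiledb")
```

If .vac files remain in __meta after the vacuum step, that would be worth noting in your reply.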

Regarding the intermittent segfaults, can you provide a reproducible example?

nguyenv avatar Aug 15 '22 19:08 nguyenv

Yes, that was the sequence. I ran both consolidation and vacuum, but after vacuuming the .vac files still existed. Deleting them manually seemed to fix S3.

I'm still getting segfaults, but I can't figure out what conditions are causing them. I literally just ran the same test script 5 times: the 3rd attempt worked and all the others segfaulted. This seems to be specifically related to the PyPI version of tiledb; on the same machine using a conda environment I don't get this problem.

The segfault happens in the following code, which extracts multiple square slices from a 2D sparse distance matrix:

        # Open the sparse distance matrix read-only and read one square block per slice.
        with tiledb.SparseArray(self._distances_array_file, 'r', ctx=self.tiledb_ctx) as distances:
            for s in slices:
                data = distances[s, s]
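A standalone version of that loop might look like the sketch below. It assumes each `s` is a Python `slice` over a contiguous index range, which may not match the real code; `square_blocks` and `read_square_blocks` are hypothetical helpers introduced only for illustration.

```python
def square_blocks(n, block):
    """Return slices covering range(n) in consecutive chunks of at most `block` indices."""
    return [slice(start, min(start + block, n)) for start in range(0, n, block)]


def read_square_blocks(uri, blocks, ctx=None):
    """Read the square sub-matrix [s, s] for every slice `s` in `blocks`."""
    import tiledb  # imported lazily so square_blocks works without it

    results = []
    with tiledb.open(uri, mode="r", ctx=ctx) as distances:
        for s in blocks:
            # Same slice on both dimensions gives a square block on the diagonal.
            results.append(distances[s, s])
    return results


blocks = square_blocks(10, 4)
# Example call (hypothetical local path):
# data = read_square_blocks("distances.tiledb", blocks)
```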

The segfault message:

sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)'

mangecoeur avatar Aug 17 '22 08:08 mangecoeur

Hi @mangecoeur, it looks like that error message comes from deep inside libc in some malloc-related routine.

  • can you confirm whether the same code succeeds against an identical array stored locally?
  • are you able to share an array which reproduces this issue?

Thanks

ihnorton avatar Aug 17 '22 13:08 ihnorton

So I found that I have the same issue with a local array, and it seems to relate to the PyPI version of tiledb. Running the exact same setup, I get a segfault when using a Python venv (created using hatch), but it ~~works fine when using the version installed with conda~~ actually, on a new conda env I have the problem again.

It might be something to do with how dependencies are resolved by pip within hatch that is causing some incompatible combination of packages to be installed.

I can't make public the array I'm working on and it's a bit big to share (about 14GB), but if you want I could dump some of the array metadata if you let me know what you need.

mangecoeur avatar Aug 22 '22 15:08 mangecoeur

Did some more exploring, managed to get this error message from gdb:

0x00007ffef57b3a5e in std::_Function_handler<tiledb::common::Status (unsigned long, unsigned long), tiledb::sm::parallel_for<tiledb::sm::SparseGlobalOrderReader<unsigned char>::compute_hilbert_values(std::vector<tiledb::sm::ResultTile*, std::allocator<tiledb::sm::ResultTile*> >&)::{lambda(unsigned long)#1}>(tiledb::common::ThreadPool*, unsigned long, unsigned long, tiledb::sm::SparseGlobalOrderReader<unsigned char>::compute_hilbert_values(std::vector<tiledb::sm::ResultTile*, std::allocator<tiledb::sm::ResultTile*> >&)::{lambda(unsigned long)#1} const&)::{lambda(unsigned long, unsigned long)#1}>::_M_invoke(std::_Any_data const&, unsigned long&&, std::_Any_data const) ()

from ~/.conda/envs/tessa-1/lib/python3.10/site-packages/tiledb/../../../libtiledb.so.2.11

mangecoeur avatar Aug 24 '22 12:08 mangecoeur

Happy to take a further look if there's any way to reproduce. We've fixed a number of issues in this code path, so please let us know if this is not resolved in newer TileDB versions.

ihnorton avatar Mar 20 '23 01:03 ihnorton