TileDB Integration with Local Minio doesn't work.

Open mohd109 opened this issue 8 months ago • 5 comments

What is the bug?

After setting the MinIO access key and secret along with the endpoint, GDAL uses the local endpoint for some requests, but when I try to write to TileDB, it tries to connect to AWS servers. This is the CPL debug output:

app-1 | /app/main.py:36: MovedIn20Warning: The declarative_base() function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
app-1 | Base = declarative_base()
app-1 | /app/main.py:219: DeprecationWarning:
app-1 | on_event is deprecated, use lifespan event handlers instead.
app-1 |
app-1 | Read more about it in the
app-1 | FastAPI docs for Lifespan Events.
app-1 |
app-1 | @app.on_event("startup")
app-1 | INFO: Started server process [1]
app-1 | INFO: Waiting for application startup.
app-1 | INFO:main:Initializing MinIO bucket
app-1 | INFO:main:MinIO bucket ready
app-1 | INFO: Application startup complete.
app-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
app-1 | /opt/anaconda3/envs/geo-processing/lib/python3.10/site-packages/osgeo/gdal.py:311: FutureWarning: Neither gdal.UseExceptions() nor gdal.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
app-1 | warnings.warn(
app-1 | /app/main.py:352: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
app-1 | **fp_params.dict()
app-1 | /app/main.py:371: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
app-1 | create_tiledb_array(input_path, vsi_path, tdb_params.dict())
app-1 | ------------------------------------------------
app-1 | /tmp/tmpkeida12t/sample.tif
app-1 | ------------------------------------------------
app-1 | GDAL: GDALOpen(/tmp/tmpkeida12t/sample.tif, this=0x5558df85f560) succeeds as GTiff.
app-1 | CPLError: Filename should be of the form /vsis3/bucket/key
app-1 | HTTP: libcurl/8.13.0 OpenSSL/3.5.0 zlib/1.3.1 zstd/1.5.7 libssh2/1.11.1 nghttp2/1.64.0
app-1 | CURL_INFO_TEXT: Host minio:9000 was resolved.
app-1 | CURL_INFO_TEXT: IPv6: (none)
app-1 | CURL_INFO_TEXT: IPv4: 172.20.0.3
app-1 | CURL_INFO_TEXT: Trying 172.20.0.3:9000...
app-1 | CURL_INFO_TEXT: Connected to minio (172.20.0.3) port 9000
app-1 | CURL_INFO_TEXT: using HTTP/1.x
app-1 | CURL_INFO_HEADER_OUT: GET /testortho/ HTTP/1.1
app-1 | Host: minio:9000
app-1 | User-Agent: GDAL/3.10.3
app-1 | Accept: */*
app-1 | Range: bytes=0-16383
app-1 | x-amz-date: 20250420T060446Z
app-1 | x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
app-1 | Authorization: AWS4-HMAC-SHA256 Credential=KxSXSvEkAuUFa1HRa74Z/20250420/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=3394a3fdabdb9c8fdb481f5ad2f693884217986ed9d818b55ae1c1ab17e200ee
app-1 |
app-1 | CURL_INFO_TEXT: Request completely sent off
app-1 | CURL_INFO_HEADER_IN: HTTP/1.1 200 OK
app-1 | CURL_INFO_HEADER_IN: Accept-Ranges: bytes
app-1 | CURL_INFO_HEADER_IN: Content-Length: 235
app-1 | CURL_INFO_HEADER_IN: Content-Type: application/xml
app-1 | CURL_INFO_HEADER_IN: Server: MinIO
app-1 | CURL_INFO_HEADER_IN: Strict-Transport-Security: max-age=31536000; includeSubDomains
app-1 | CURL_INFO_HEADER_IN: Vary: Origin
app-1 | CURL_INFO_HEADER_IN: Vary: Accept-Encoding
app-1 | CURL_INFO_HEADER_IN: X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
app-1 | CURL_INFO_HEADER_IN: X-Amz-Request-Id: 1837F1A8634BDF02
app-1 | CURL_INFO_HEADER_IN: X-Content-Type-Options: nosniff
app-1 | CURL_INFO_HEADER_IN: X-Ratelimit-Limit: 3844
app-1 | CURL_INFO_HEADER_IN: X-Ratelimit-Remaining: 3844
app-1 | CURL_INFO_HEADER_IN: X-Xss-Protection: 1; mode=block
app-1 | CURL_INFO_HEADER_IN: Date: Sun, 20 Apr 2025 06:04:46 GMT
app-1 | CURL_INFO_HEADER_IN:
app-1 | CURL_INFO_TEXT: Connection #0 to host minio left intact
app-1 | S3: GetFileSize(http://minio:9000/testortho/)=235 response_code=200
app-1 | GDAL: On-demand registering /opt/anaconda3/envs/geo-processing/lib/gdalplugins/gdal_TileDB.so using GDALRegister_TileDB.
app-1 | CPLError: S3: Error while listing with prefix 's3://testortho/__schema/' and delimiter '/'[Error Type: 23] [HTTP Response Code: 403] [Exception: InvalidAccessKeyId] [Remote IP: 50.7.85.34] [Request ID: 83MZRMVXVTYR2Y2C] [Headers: 'content-type' = 'application/xml' 'date' = 'Sun, 20 Apr 2025 06:04:46 GMT' 'server' = 'AmazonS3' 'transfer-encoding' = 'chunked' 'x-amz-id-2' = 'y3VQzS9uYB+m7RmG0nJJV1wpnav1iD2ZEhSBJEOMNKnVWrllA8qlm7dIB94g60yEn4wkYK4sfhI=' 'x-amz-request-id' = '83MZRMVXVTYR2Y2C'] : The AWS Access Key Id you provided does not exist in our records.
app-1 | GTiff: ScanDirectories()
app-1 | GDAL: GDALDefaultOverviews::OverviewScan()
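One way to see which requests left the local network is to count the server names that appear in the responses. The sketch below is only an illustration: `servers_seen` is a hypothetical helper, not part of GDAL or TileDB, and it assumes the two header layouts visible in the output above (curl's `Server: MinIO` lines and the `'server' = 'AmazonS3'` pairs inside the TileDB error).

```python
import re
from collections import Counter

# Header layout printed by GDAL's curl debugging (CPL_CURL_VERBOSE=YES).
CURL_SERVER = re.compile(r"CURL_INFO_HEADER_IN: Server: (\S+)")
# Header layout embedded in the TileDB S3 error message.
TILEDB_SERVER = re.compile(r"'server' = '([^']+)'")


def servers_seen(log_text: str) -> Counter:
    """Count how often each responding server name appears in the log."""
    names = CURL_SERVER.findall(log_text) + TILEDB_SERVER.findall(log_text)
    return Counter(names)


# Two excerpts taken from the output above (the second one is truncated).
sample = (
    "app-1 | CURL_INFO_HEADER_IN: Server: MinIO\n"
    "app-1 | CPLError: S3: Error while listing with prefix "
    "'s3://testortho/__schema/' and delimiter '/'[Error Type: 23] "
    "[HTTP Response Code: 403] [Headers: 'server' = 'AmazonS3'\n"
)
for name, count in servers_seen(sample).items():
    print(f"{name}: {count} response(s)")
```

Any name other than `MinIO` in the result points at a request that did not use the configured endpoint.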

Steps to reproduce the issue

```python
# Imports were not part of the original snippet; pydantic_settings is assumed
# because the log above shows Pydantic 2.11.
import os
import tempfile

from osgeo import gdal
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # PostGIS settings
    pg_host: str = "db"
    pg_port: int = 5432
    pg_db: str = "gisdb"
    pg_user: str = "postgres"
    pg_password: str = "postgres"

    # MinIO settings
    minio_endpoint: str = "minio:9000"
    minio_access: str = "KxSXSvEkAuUFa1HRa74Z"
    minio_secret: str = "mVUjhBJCCPBAZhRCH7noqtiwk3bNxHNIcbq5387K"
    minio_bucket: str = "tiledb-data"


settings = Settings()


def create_tiledb_array(input_path: str, vsi_path: str, options: dict):
    # GDAL debugging and /vsis3/ options pointing at the local MinIO server
    gdal.SetConfigOption("CPL_DEBUG", "ON")
    gdal.SetConfigOption("CPL_LOG_ERRORS", "ON")
    gdal.SetConfigOption("CPL_CURL_VERBOSE", "YES")
    gdal.SetConfigOption("GDAL_HTTP_NETRC", "NO")
    gdal.SetConfigOption("AWS_ACCESS_KEY_ID", settings.minio_access)
    gdal.SetConfigOption("AWS_SECRET_ACCESS_KEY", settings.minio_secret)
    gdal.SetConfigOption("AWS_S3_ENDPOINT", settings.minio_endpoint)
    gdal.SetConfigOption("AWS_HTTPS", "NO")
    gdal.SetConfigOption("AWS_VIRTUAL_HOSTING", "FALSE")
    gdal.SetConfigOption("VSI_CACHE", "TRUE")
    gdal.SetConfigOption("GDAL_HTTP_TIMEOUT", "300")
    gdal.SetConfigOption('CPL_VSIS3_CREATE_DIR_OBJECT', 'YES')
    gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN', 'YES')

    # Create a configuration object
    config = dict()

    # Set configuration parameters
    config["vfs.s3.scheme"] = "http"
    config["vfs.s3.region"] = ""
    config["vfs.s3.endpoint_override"] = "minio:9000"
    config["vfs.s3.use_virtual_addressing"] = "false"
    config["vfs.s3.aws_access_key_id"] = settings.minio_access
    config["vfs.s3.aws_secret_access_key"] = settings.minio_secret

    # Write the TileDB parameters to a temporary config file
    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.cfg') as cfg_file:
        for key, value in config.items():
            cfg_file.write(f"{key}={value}\n")
        cfg_path = cfg_file.name

    try:
        # GDAL Translate options for TileDB
        translate_options = gdal.TranslateOptions(
            format="TileDB",
            creationOptions=[
                f"TILEDB_CONFIG={cfg_path}",
                "BLOCKXSIZE=256",
                "BLOCKYSIZE=256"
            ]
        )

        print("------------------------------------------------")
        print(input_path)
        print("------------------------------------------------")

        src_ds = gdal.Open(input_path)
        if not src_ds:
            raise RuntimeError(f"Failed to open {input_path}")

        result = gdal.Translate(vsi_path, src_ds, options=translate_options)
        if not result:
            raise RuntimeError("TileDB creation failed")
        result.FlushCache()
    finally:
        # Always remove the temporary config file
        os.remove(cfg_path)
```

Versions and provenance

I'm using the gdal-ubuntu-latest Docker image, which ships GDAL 3.10.3.

Additional context

No response

mohd109 avatar Apr 20 '25 06:04 mohd109

I have run TileDB with FastAPI and SQLAlchemy (I used to work at TileDB Inc). The easiest way to debug this is to start with gdal_translate.

I created a tiledb.config file with the following contents:

vfs.s3.scheme http
vfs.s3.region
vfs.s3.endpoint_override http://127.0.0.1:9000
vfs.s3.use_virtual_addressing false
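Note that this file separates each key from its value with whitespace, while the Python snippet in the report writes `key=value` lines. Whether that difference matters for this bug is not established in the thread. The following is a minimal sketch of generating the same whitespace-separated file from Python; `write_tiledb_config` is a hypothetical helper and the output location is an arbitrary temporary directory.

```python
import tempfile
from pathlib import Path


def write_tiledb_config(params: dict, path: Path) -> Path:
    """Write one whitespace-separated `key value` pair per line."""
    # rstrip() keeps a key with an empty value as a bare key, as in the file above.
    lines = [f"{key} {value}".rstrip() for key, value in params.items()]
    path.write_text("\n".join(lines) + "\n")
    return path


# Same parameters as the tiledb.config shown above.
params = {
    "vfs.s3.scheme": "http",
    "vfs.s3.region": "",
    "vfs.s3.endpoint_override": "http://127.0.0.1:9000",
    "vfs.s3.use_virtual_addressing": "false",
}
cfg_path = write_tiledb_config(params, Path(tempfile.mkdtemp()) / "tiledb.config")
print(cfg_path.read_text())
```

The resulting path can then be passed as `TILEDB_CONFIG=<path>` in the same way as the hand-written file.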

And set the following environment variables:

export AWS_ACCESS_KEY_ID=gdaltesting
export AWS_SECRET_ACCESS_KEY=gdaltesting

I ran MinIO:

docker run -p 9000:9000 -p 9001:9001 \
  quay.io/minio/minio server /data --console-address ":9001"

Any GDAL build that includes TileDB with AWS support should work; support for S3 is stable. If you are concerned, build TileDB and GDAL from source. The TileDB bootstrap command is:

../bootstrap --enable-s3 --prefix=/usr/local

And follow the instructions.

From there I did the following:

gdal_translate -of TileDB -co TILEDB_CONFIG=tiledb.config UTM2GTIF.TIF /vsis3/test/UTM2GTIF

And then:

gdalinfo -oo TILEDB_CONFIG=tiledb.config /vsis3/test/UTM2GTIF

Both worked as expected.

Can you confirm these work for you?
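To run the same two checks from inside the application container, the commands can be assembled and launched from Python. This is a sketch only: `tiledb_commands` and `run_all` are hypothetical helpers, the file names are the ones used above, and the default is a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def tiledb_commands(src: str, dst: str, config: str = "tiledb.config") -> list:
    """Build the gdal_translate and gdalinfo invocations shown above."""
    translate = ["gdal_translate", "-of", "TileDB",
                 "-co", f"TILEDB_CONFIG={config}", src, dst]
    info = ["gdalinfo", "-oo", f"TILEDB_CONFIG={config}", dst]
    return [translate, info]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)


commands = tiledb_commands("UTM2GTIF.TIF", "/vsis3/test/UTM2GTIF")
run_all(commands)  # dry run: only prints the two command lines
```

Calling `run_all(commands, dry_run=False)` requires the GDAL command-line tools on the `PATH` and the AWS environment variables set as above.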

From your logs it seems you are doing some ortho processing. TileDB is a good choice for parallel raster processing, particularly with GDAL.

With gdal_create or its Python equivalents, it is possible to initialize an array and then fill the tiles in the array from many processes at once.
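Filling an array from many processes needs the raster split into non-overlapping windows that workers can claim independently. Below is a minimal sketch of that partitioning step only, assuming the 256x256 block size from the creation options in the report. `block_windows` is a hypothetical helper, and the actual per-window reading and writing in each worker is left out.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (xoff, yoff, xsize, ysize) in pixels


def block_windows(width: int, height: int, block: int = 256) -> Iterator[Window]:
    """Yield block-aligned windows covering a width x height raster."""
    for yoff in range(0, height, block):
        ysize = min(block, height - yoff)  # last row of blocks may be shorter
        for xoff in range(0, width, block):
            xsize = min(block, width - xoff)  # last column may be narrower
            yield (xoff, yoff, xsize, ysize)


# Example: a 600 x 300 raster split into 256 x 256 blocks.
windows = list(block_windows(600, 300))
for window in windows:
    print(window)
```

Each tuple could be handed to a separate worker (for example through `multiprocessing.Pool.map`), with every worker opening the target array itself.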

normanb avatar Apr 21 '25 23:04 normanb

Hi again, Thanks a lot for your response.

Actually, no, it doesn't work. The command-line version you gave fails with a segmentation fault on two TIF files that work everywhere else. The Python version I posted before has a problem finding the MinIO server in only one of its internal requests: it creates the bucket and gets the bucket size with GDAL, but fails to write the schema into it. I tested both with the aws.config file and with environment variables, and the result is exactly the same. I ran the gdal_translate command in the same Docker image in which I tested the Python version. I can zip the whole package and send it to you, or post it here as a Google Drive link.

Image

mohd109 avatar Apr 22 '25 06:04 mohd109

Yes, I can look into this for you. Zipping the whole package is probably best. Either send me the link to my email (norman.barker<at>gmail.com) or drop it here. Any issues we can track here for visibility.

normanb avatar Apr 22 '25 14:04 normanb

Done, thanks in advance.

https://drive.google.com/file/d/1TtSkDdjkFZGSsIsOpLmKcX9gDa3KiNzo/view?usp=drivesdk

mohd109 avatar Apr 22 '25 15:04 mohd109

Hi again, hope you're doing well. Any news on this topic?

mohd109 avatar May 13 '25 04:05 mohd109