Bucket name getting prefixed to MinIO service name
Question
I am running Iceberg in a Dockerized environment with a REST catalog, storing tables as Parquet files via PyArrow on a local MinIO server under the bucket "iceberg-bucket".
When I use IP addresses, everything works fine:
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "iceberg_rest_catalog",
    **{
        "uri": "http://0.0.0.0:8228",
        "s3.endpoint": "http://0.0.0.0:9033",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "user",
        "s3.secret-access-key": "password",
        "s3.region": "us-east-1",
    }
)
However, when I use the service names (as defined in the docker-compose.yaml files for both Iceberg and MinIO), I get this error:
pyiceberg.exceptions.ServerError: SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.
This is how I initialize the catalog with service names:
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "iceberg_rest_catalog",
    **{
        "uri": "http://rest-catalog:8252",
        "s3.endpoint": "http://iceberg-minio:9044",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "user",
        "s3.secret-access-key": "password",
        "s3.region": "us-east-1",
    }
)
On further investigation, this happens because it tries to reach the MinIO server with the bucket name prefixed to the hostname, i.e., iceberg-bucket.iceberg-minio, which should not be the case.
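To make the symptom concrete: S3 clients can address a bucket in two ways. In path-style addressing the bucket is part of the URL path (http://host:port/bucket/key); in virtual-hosted-style addressing it becomes a subdomain (http://bucket.host:port/key), which matches the iceberg-bucket.iceberg-minio hostname seen here. The minimal sketch below only builds both URL forms with the standard library; `object_url` and the object key are hypothetical and not part of PyIceberg or the AWS SDK.

```python
from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Hypothetical helper: build the URL an S3 client would request.

    path-style:           scheme://host:port/bucket/key
    virtual-hosted-style: scheme://bucket.host:port/key
    """
    parts = urlsplit(endpoint)
    if path_style:
        # Bucket stays in the path; only the service name must resolve.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Bucket moves into the hostname, so "<bucket>.<host>" must resolve via DNS.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


endpoint = "http://iceberg-minio:9044"  # endpoint taken from the question
key = "metadata/v1.metadata.json"       # hypothetical object key
print(object_url(endpoint, "iceberg-bucket", key, path_style=True))
# http://iceberg-minio:9044/iceberg-bucket/metadata/v1.metadata.json
print(object_url(endpoint, "iceberg-bucket", key, path_style=False))
# http://iceberg-bucket.iceberg-minio:9044/metadata/v1.metadata.json
```

Only the virtual-hosted form needs Docker's DNS to resolve iceberg-bucket.iceberg-minio, which a plain service name does not provide on its own.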
What does your docker-compose.yaml look like? It's likely a configuration issue. I'd suggest starting with a known working Docker configuration (such as https://github.com/apache/iceberg-python/blob/main/dev/docker-compose-integration.yml#L57) and working from there.
Based on the docker-compose-integration.yml above, there are several MinIO-specific settings needed.
Hi @kevinjqliu, the example you shared is very insightful, but my issue is that I already have a MinIO service serving other tasks, and I want to use that MinIO service rather than create a new one. All my containers are defined in different compose files, but all of them, including the PyIceberg container, run on the same network. I just can't find the reason why it is prefixing the bucket name to the service name here.
Here is the docker-compose.yaml file I am using:
version: "2"
services:
  local-pyiceberg:
    build: .
    container_name: local-pyiceberg
    ports:
      - "8046:80"
    volumes:
      - /opt/local/:/opt/local
networks:
  default:
    external:
      name: local-zone_default
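Since the failing lookup happens inside the REST catalog container, one way to confirm the diagnosis is to test name resolution from a container attached to the same local-zone_default network, because containers on one user-defined Docker network share its embedded DNS. The sketch below uses only Python's `socket` module; `resolves` and `check_hosts` are hypothetical helpers, and it assumes Python is available in the container you run it from (for example local-pyiceberg).

```python
import socket


def resolves(host: str) -> bool:
    """Return True if `host` resolves from the container this runs in."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Name-resolution failure: roughly the Python-side counterpart of the
        # UnknownHostException reported by the REST server.
        return False


def check_hosts(hosts: list[str]) -> dict[str, bool]:
    """Map each hostname to whether it currently resolves."""
    return {host: resolves(host) for host in hosts}
```

For example, if `check_hosts(["iceberg-minio", "rest-catalog", "iceberg-bucket.iceberg-minio"])` returns True for the first two names and False for the third, that would be consistent with the REST server using virtual-hosted-style addressing against a plain service name.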
@ArijitSinghEDA Something I noticed about the error message:
pyiceberg.exceptions.ServerError: SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.
Specifically, the ServerError suggests that this is an issue with the REST server:
https://github.com/search?q=repo%3Aapache%2Ficeberg-python%20ServerError&type=code
The load_catalog code above is considered the "client" code. The real issue might be with the REST server.
@kevinjqliu Yes, I agree. As I said before, it is the REST server that prefixes the bucket name to the MinIO service name, which is why it cannot connect to the MinIO server.
This is likely an issue with path-style vs. virtual-hosted-style S3 access: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access
It may be an S3 addressing-style config on the server side: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#addressing-style
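If the REST server is a Java Iceberg REST image backed by S3FileIO (the SdkClientException comes from the AWS SDK), the server-side switch would be the Iceberg catalog property `s3.path-style-access`, which makes S3FileIO put the bucket in the path instead of the hostname. Images such as tabulario/iceberg-rest read catalog properties from `CATALOG_`-prefixed environment variables; the sketch below assumes that naming convention ("__" for "-", "_" for ".") and only prints the variable names you would add to the REST service's `environment` section. `rest_env_var` is a hypothetical helper; check your server image's documentation before relying on the mapping.

```python
def rest_env_var(prop: str) -> str:
    """Hypothetical helper: Iceberg catalog property -> CATALOG_* env var name.

    ASSUMPTION: the REST server image decodes CATALOG_* variables by turning
    "__" into "-" and "_" into "." (the tabulario/iceberg-rest convention).
    """
    # Inverse of that decoding: "-" -> "__" first, then "." -> "_".
    return "CATALOG_" + prop.upper().replace("-", "__").replace(".", "_")


# Server-side settings for the REST catalog container. The endpoint is the
# service name from the question; path-style access keeps the bucket out of
# the hostname, so only "iceberg-minio" has to resolve.
server_props = {
    "s3.endpoint": "http://iceberg-minio:9044",
    "s3.path-style-access": "true",
}

for prop, value in server_props.items():
    print(f"{rest_env_var(prop)}={value}")
# CATALOG_S3_ENDPOINT=http://iceberg-minio:9044
# CATALOG_S3_PATH__STYLE__ACCESS=true
```

The linked docker-compose-integration.yml appears to take the other route: it keeps virtual-hosted style and makes it resolvable by setting `MINIO_DOMAIN` on the MinIO service and giving it a network alias of the form `<bucket>.<service>`. With a shared MinIO instance that other tasks depend on, enabling path-style access on the REST server is likely the less invasive change.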
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.