pymilvus
[Bug]: `RemoteBulkWriter` doesn't work in Windows environment.
Is there an existing issue for this?
- [X] I have searched the existing issues
Describe the bug
I deployed milvus-standalone on my laptop and tried to use RemoteBulkWriter.
It writes the data directly under the bucket root as \\1.json (see the printed batch_files in the reproduction code).
When I deploy to Docker instead, it writes to the /data folder correctly.
Expected Behavior
The data should be uploaded to a MinIO path such as s3://a-bucket/data/<uuid>/1.json
Steps/Code To Reproduce behavior
from pymilvus import CollectionSchema, FieldSchema, DataType
from pymilvus.bulk_writer import RemoteBulkWriter

if __name__ == "__main__":
    ACCESS_KEY = "minioadmin"
    SECRET_KEY = "minioadmin"
    BUCKET_NAME = "a-bucket"
    ENDPOINT = "localhost:9000"

    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint=ENDPOINT,
        access_key=ACCESS_KEY,
        secret_key=SECRET_KEY,
        bucket_name=BUCKET_NAME,
    )

    from pymilvus.bulk_writer import BulkFileType

    schema = CollectionSchema(
        fields=[
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=4096),
        ],
        description="Test collection",
        enable_dynamic_field=True,
    )

    writer = RemoteBulkWriter(
        schema=schema,
        remote_path="/data",
        connect_param=conn,
        file_type=BulkFileType.JSON,
    )

    for i in range(1000):
        writer.append_row({"embedding": [1.0] * 768, "text": "hello world"})

    writer.commit()
    print(writer.batch_files)  # [['\\1.json']]
    print(writer.data_path)  # \data\8a67749b-bb60-4369-bdb4-b5e5f5853a6e
    print(writer.uuid)  # 8a67749b-bb60-4369-bdb4-b5e5f5853a6e
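One plausible cause, not confirmed against the pymilvus source, is that the remote path is built with OS-native path handling, which uses backslashes on Windows. The sketch below reproduces the printed data_path with only the standard library and shows a forward-slash alternative. The helper name `remote_key` is hypothetical and is not part of pymilvus.

```python
from pathlib import PurePosixPath, PureWindowsPath


def remote_key(remote_path: str, *parts: str) -> str:
    """Hypothetical helper: join path parts with forward slashes on any OS."""
    return PurePosixPath(remote_path).joinpath(*parts).as_posix()


uuid = "8a67749b-bb60-4369-bdb4-b5e5f5853a6e"

# Windows path semantics turn "/data" into a backslash-separated path,
# which matches the data_path printed in the reproduction above.
windows_style = str(PureWindowsPath("/data") / uuid)

# POSIX path semantics keep forward slashes, which is what object keys need.
posix_style = remote_key("/data", uuid, "1.json")

print(windows_style)
print(posix_style)
```

Because `PureWindowsPath` and `PurePosixPath` do no filesystem access, this runs identically on Windows, Linux, and macOS, so the Windows behavior can be inspected without a Windows machine.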
Environment details
- Hardware/Software conditions
- OS: Windows
- CPU: 13th Gen Intel(R) Core(TM) i7-1365U
- Method of installation: docker-compose, standalone
- Milvus version : 2.4.15
- Milvus configuration :
inside docker-compose.yaml
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: milvus-attu
    image: zilliz/attu:v2.4
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - "standalone"
    networks:
      - default

networks:
  default:
    name: milvus
Anything else?
I have not tested this behavior with other SDKs. Should we add tests targeting the Windows platform to the CI/CD stage?
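As a rough illustration of what such a test could look like, the sketch below checks object-key construction under both POSIX and Windows path semantics using only the standard library, so it could run on any CI runner. The function `to_object_key` is a hypothetical stand-in for whatever pymilvus uses internally; it is an assumption, not the library's API.

```python
import unittest
from pathlib import PurePosixPath, PureWindowsPath


def to_object_key(path) -> str:
    """Hypothetical normalizer: render a pure path as a forward-slash object key."""
    return path.as_posix()


class ObjectKeyTest(unittest.TestCase):
    def test_key_uses_forward_slashes(self):
        # Exercise both path flavours regardless of the host OS.
        for flavour in (PurePosixPath, PureWindowsPath):
            with self.subTest(flavour=flavour.__name__):
                key = to_object_key(flavour("/data") / "some-uuid" / "1.json")
                self.assertEqual(key, "/data/some-uuid/1.json")


# Run the test case explicitly so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ObjectKeyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test like this only covers path construction. Verifying the actual upload location would still need a Windows runner with a reachable MinIO instance.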