
[Bug]: `RemoteBulkWriter` doesn't work in Windows environment.

Open counter2015 opened this issue 11 months ago • 0 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Describe the bug

I deployed milvus-standalone on my laptop and tried to use RemoteBulkWriter.

It writes the data directly under the bucket root, at the path \\1.json.

When I deploy to Docker instead, it writes to the /data folder correctly.

Expected Behavior

The data should be uploaded to a MinIO path such as s3://a-bucket/data/<uuid>/1.json

Steps/Code To Reproduce behavior

from pymilvus import CollectionSchema, FieldSchema, DataType
from pymilvus.bulk_writer import BulkFileType, RemoteBulkWriter

if __name__ == "__main__":
    ACCESS_KEY = "minioadmin"
    SECRET_KEY = "minioadmin"
    BUCKET_NAME = "a-bucket"
    ENDPOINT = "localhost:9000"

    conn = RemoteBulkWriter.S3ConnectParam(
        endpoint=ENDPOINT,
        access_key=ACCESS_KEY,
        secret_key=SECRET_KEY,
        bucket_name=BUCKET_NAME,
    )

    schema = CollectionSchema(
        fields=[
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=4096),
        ],
        description="Test collection",
        enable_dynamic_field=True,
    )

    writer = RemoteBulkWriter(
        schema=schema,
        remote_path="/data",
        connect_param=conn,
        file_type=BulkFileType.JSON,
    )

    for i in range(1000):
        writer.append_row({"embedding": [1.0] * 768, "text": "hello world"})

    writer.commit()
    print(writer.batch_files) # [['\\1.json']]
    print(writer.data_path) # \data\8a67749b-bb60-4369-bdb4-b5e5f5853a6e
    print(writer.uuid) # 8a67749b-bb60-4369-bdb4-b5e5f5853a6e
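
The backslashes in the output above are what Windows-style path joining would produce. The following is a minimal sketch of that behavior using only the standard library; it assumes (not confirmed from the pymilvus source) that the remote path is joined with OS-native path handling. `PureWindowsPath` and `PurePosixPath` are used so the comparison runs on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath

remote_path = "/data"
run_id = "8a67749b-bb60-4369-bdb4-b5e5f5853a6e"  # uuid copied from the output above

# Joining with Windows path semantics rewrites "/" as "\".
windows_joined = str(PureWindowsPath(remote_path) / run_id)

# Joining with POSIX semantics keeps "/" separators, which is what an
# S3/MinIO object key needs.
posix_joined = str(PurePosixPath(remote_path) / run_id)

# A Windows-style path can be converted back to forward slashes.
normalized = PureWindowsPath(remote_path, run_id).as_posix()

print(windows_joined)
print(posix_joined)
print(normalized)
```

The first printed value has the same shape as the `writer.data_path` value reported above, while the other two match the layout described under Expected Behavior.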

Environment details

  • Hardware/Software conditions
    • OS: Windows
    • CPU: 13th Gen Intel(R) Core(TM) i7-1365U
  • Method of installation: docker-compose, standalone
  • Milvus version : 2.4.15
  • Milvus configuration :

Inside docker-compose.yaml:

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.15
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: milvus-attu
    image: zilliz/attu:v2.4
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - "standalone"
    networks:
      - default

networks:
  default:
    name: milvus

Anything else?

I have not tested this behavior on other SDKs. Should we add tests for the Windows platform to the CI/CD stage?
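
A platform-independent regression check could look roughly like the sketch below. `build_object_key` is a hypothetical helper written for this example, not a pymilvus function; the idea is only that object keys should contain forward slashes regardless of the OS the writer runs on.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_object_key(remote_path: str, *parts: str) -> str:
    """Hypothetical helper: join key segments with '/' on every platform."""
    # Normalize any backslashes first, then join with POSIX semantics.
    return str(PurePosixPath(remote_path.replace("\\", "/"), *parts))


def test_object_key_is_platform_independent() -> None:
    expected = "/data/some-uuid/1.json"

    # POSIX-style input stays unchanged.
    assert build_object_key("/data", "some-uuid", "1.json") == expected

    # Windows-style input (simulated with PureWindowsPath) is normalized.
    windows_input = str(PureWindowsPath("/data"))
    assert build_object_key(windows_input, "some-uuid", "1.json") == expected

    # No backslash should ever reach the object store.
    assert "\\" not in build_object_key(windows_input, "some-uuid", "1.json")
```

Because the Windows case is simulated with `PureWindowsPath`, a check like this can run on Linux CI runners as well as on a Windows runner.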

counter2015 · Dec 12 '24 04:12