nebula-importer

Importer reads multiple files at once when the file path contains a wildcard

Open porscheme opened this issue 2 years ago • 1 comments

@wey-gu

Using the config file below:

  • When multiple CSV data files match the ./students/*.CSV path, Importer tries to read all of them at once
  • Each CSV data file is 4 GB in size
  • Why not read one file at a time?

Thanks in advance

version: v2
description: example
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 1 # number of graph clients
  channelBufferSize: 1
  space: StudentCentral
  connection:
    user: root
    password: nebula
    address: rp-nebula-graphd-svc:9669
  postStart:
    commands: |
      DROP SPACE IF EXISTS StudentCentral;    
      CREATE SPACE IF NOT EXISTS StudentCentral(partition_num=6, replica_factor=2, vid_type=FIXED_STRING(80));
      USE StudentCentral;
      CREATE TAG IF NOT EXISTS Student(sudentId string, hcs string, docInstance string);
    afterPeriod: 8s
logPath: /csv_data/err/test.log
files:
  - path: ./students/*.CSV
    batchSize: 10000
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: ","
    schema:
      type: vertex
      vertex:
        vid:
          type: string
          index: 0
        tags:
          - name: Student
            props:
              - name: sudentId
                type: string
              - name: hcs
                type: string
              - name: docInstance
                type: string

porscheme, Mar 31 '22 20:03

Sorry for the late response; I didn't manage to clear the notifications in my mailbox.

Yes, this should be done on demand, yielding one file at a time instead of loading them all into RAM in one go.

wey-gu, Apr 21 '22 06:04
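
For illustration only, the on-demand approach described in the reply could look roughly like the sketch below. It is written in Python rather than in the importer's own code base, and it is not how nebula-importer is implemented. The function name `iter_batches` is hypothetical, and the choice to end a batch at each file boundary is an assumption made to keep the example short. The point it shows is that the wildcard is expanded to a list of paths only, and each file is opened, streamed row by row, and closed before the next one is touched.

```python
import csv
import glob
from typing import Iterator, List


def iter_batches(pattern: str, batch_size: int) -> Iterator[List[List[str]]]:
    """Yield batches of CSV rows, keeping only one matched file open at a time.

    Hypothetical helper: a sketch of lazy, per-file reading, not importer code.
    """
    # glob.glob() returns matching *paths* only; no file content is read here.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            batch: List[List[str]] = []
            # csv.reader streams rows lazily from the open file handle.
            for row in csv.reader(handle, delimiter=","):
                batch.append(row)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:
                # Assumption: flush the partial batch at the end of each file
                # instead of carrying rows over into the next file.
                yield batch
        # The current file is closed here, before the next path is opened.
```

With the settings from the config above, a caller would loop over `iter_batches("./students/*.CSV", 10000)` and hand each batch to a client. Memory use then stays bounded by a single batch rather than by the combined size of the 4 GB files.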