
Add a streaming-based CSV parser to k6

oleiade opened this issue 2 years ago · 1 comment

Users with large CSV files (upwards of, say, 500MB) who run a large number of VUs are directly affected by our issues with handling large files.

As a result, we would like k6 to offer an alternative way to handle CSV files. Ideally, it would be streaming-based, holding only a subset of the data in memory at any given time, so that k6's memory footprint remains sustainable for such users.
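Loosely speaking, streaming-based means parsing rows out of fixed-size chunks as they are read, instead of materializing the whole file first. Here is a minimal sketch of that idea in plain JavaScript, where readChunk is a hypothetical stand-in for whatever chunked reader the underlying file-handling work would provide (assumed to return one string per chunk and null at EOF):

// Keep only the current chunk plus one partial line in memory,
// never the whole file, regardless of its size.
function* csvRecords(readChunk, delimiter = ',') {
  let leftover = '';
  let chunk;
  while ((chunk = readChunk()) !== null) {
    const lines = (leftover + chunk).split('\n');
    leftover = lines.pop(); // a line may straddle two chunks; carry it over
    for (const line of lines) {
      yield line.split(delimiter); // naive split: ignores quoted fields
    }
  }
  if (leftover !== '') {
    yield leftover.split(delimiter); // last line without a trailing newline
  }
}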

Non-final target API

import http from 'k6/http';
import { csv } from 'k6/files';

let filename = '10M-rows.csv';

// username, password, email
// pawel, test123, [email protected]
// ...

// not using the old open() api.
// let fileContents = open(filename);

let fileHandle = streamingOpenFileHandler(filename);

const csvHandler = csv.objectReader(fileHandle.stream, {
  delimiter: ',',
  consumptionStrategy: 'uniqueSequential', // VU-safe, non-repeating.
  endOfFileHandling: 'startFromBeginning', // what to do when we run out of rows
});

export default function () {
  let object = csvHandler.next(); // unique row across all VUs
  // parsed fields are exposed as properties: object.username, object.password, ...

  const res = http.post('http://test.k6.io/login', {
    user: object.username,
    pass: object.password
  });
}
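For context, the closest approximation available today is to parse the entire file once into a SharedArray and index it with the globally unique iteration number. This gives the same "unique row across all VUs" behaviour as consumptionStrategy: 'uniqueSequential', but still keeps the whole dataset in memory, which is exactly what this proposal aims to avoid. A rough sketch using existing k6 APIs (the naive line-splitting parse is illustrative only):

import exec from 'k6/execution';
import { SharedArray } from 'k6/data';

// The whole file is read and parsed once, then shared across VUs --
// memory-friendly across VUs, but still proportional to the file size.
const rows = new SharedArray('csv rows', function () {
  return open('10M-rows.csv') // loads the entire file into memory
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => {
      const [username, password, email] = line.split(','); // naive parse
      return { username, password, email };
    });
});

export default function () {
  // iterationInTest is unique across all VUs, so each row is handed out
  // once before wrapping around, much like 'uniqueSequential' above.
  const object = rows[exec.scenario.iterationInTest % rows.length];
}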

Prerequisites

Being able to provide such an alternative CSV parser implementation, working for both open-source and cloud users, is currently blocked by the issues listed in "improving the handling of large files in k6".

Namely, we would first need the ability to access such files, seek through them, and stream their content, without having to decompress them to disk first and without loading their entire content into memory. Another prerequisite is an API that allows files to be opened and read as separate operations, as opposed to storing their content in memory. A hypothetical sketch of what that could look like follows.
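The sketch below reuses the k6/files namespace from the proposal above; every name and signature here is illustrative, not an existing k6 API:

// All names and signatures are hypothetical illustrations.
import { open, SeekMode } from 'k6/files';

export default async function () {
  const file = await open('10M-rows.csv'); // a handle, not the file contents
  const buffer = new Uint8Array(4096);     // bounded read buffer

  let bytesRead = await file.read(buffer); // assumed to return 0 at EOF
  while (bytesRead > 0) {
    // feed buffer.subarray(0, bytesRead) to the streaming CSV parser...
    bytesRead = await file.read(buffer);
  }

  await file.seek(0, SeekMode.Start); // rewind, e.g. for 'startFromBeginning'
}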

oleiade · Mar 13 '23, 12:03