
CPU intensive processing and writing result to files

Open jfoclpf opened this issue 3 years ago • 2 comments

Imagine I have an array with 1 million filenames: ['1.txt', '2.txt', ..., '1000000.txt'], and for each file I need to do heavy processing and then write the result to the respective file.

What would be an efficient way to use all the CPU cores, spreading the CPU-intensive processing of different files across different cores?

Normally I would use this:

const fs = require('fs')
const async = require('async')
const heavyProcessing = require('./heavyProcessing.js')

const files = ['1.txt', '2.txt', ..., '1000000.txt']

async.each(files, function (file, cb) {
  fs.writeFile(file, heavyProcessing(file), function (err) {
    // err is already an Error (or null); pass it straight through
    if (err) cb(err); else cb()
  })
}, function (err) {
  // completion callback: runs once all files are done (or on first error)
  if (err) console.error(err)
})

How can I use Piscina here? Can I just submit 1 million tasks with no concern for memory?
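One way to keep memory bounded, regardless of the task runner, is to cap how many tasks are in flight at once. Below is a minimal sketch of such a limiter; `processFile` and `runWithLimit` are hypothetical stand-ins (not part of Piscina's API), and in a real setup the task function would call `piscina.run(file)` instead.

```javascript
// Hedged sketch: process a large item list with at most `limit`
// tasks in flight, so 1 million filenames never flood memory.
// `processFile` stands in for the real CPU-bound call (assumption).
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each "lane" pulls the next index; JS is single-threaded, so
  // reading and incrementing `next` without an await in between is safe.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Stand-in for piscina.run(file) / heavy processing.
const processFile = async (file) => `processed ${file}`;

runWithLimit(['1.txt', '2.txt', '3.txt'], 2, processFile)
  .then((out) => console.log(out));
```

Note that Piscina also maintains its own internal task queue, so submitting everything up front works too; the limiter above just keeps the number of pending promises (and their captured data) small.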

What do you think about this approach?

main.js

const path = require('path');
const async = require('async')
const Piscina = require('piscina');

const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js')
});


const files = ['1.txt', '2.txt', ..., '1000000.txt']

async.each(files, function (file, cb) {
  (async function () {
    try {
      // the worker returns null on success or an Error on failure
      const err = await piscina.run(file)
      if (err) cb(err); else cb()
    } catch (err) {
      cb(err)
    }
  })()
}, function (err) {
  // completion callback: runs once all files are done (or on first error)
  if (err) console.error(err)
})

worker.js

const fs = require('fs')
const heavyProcessing = require('./heavyProcessing.js')

module.exports = (file) => {
  try {
    fs.writeFileSync(file, heavyProcessing(file))
    return null
  } catch (e) {
    // e is already an Error; return it as-is rather than re-wrapping
    return e
  }
}
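An alternative to returning an Error from the worker is to simply let it throw: Piscina then rejects the promise returned by `run()`, so the caller handles success and failure in a single try/catch instead of also checking the resolved value. A minimal sketch of the caller-side difference, with `fakeRun` as a hypothetical stand-in for `piscina.run`:

```javascript
// Hedged sketch: `fakeRun` simulates a worker call that throws on
// failure, the way piscina.run rejects when the worker function throws.
const fakeRun = async (file) => {
  if (!file.endsWith('.txt')) throw new Error(`bad file: ${file}`);
  return `wrote ${file}`;
};

async function handle(file) {
  try {
    // one path: a throw in the worker surfaces here as a rejection
    return await fakeRun(file);
  } catch (err) {
    return `failed: ${err.message}`;
  }
}
```

With that pattern, the `const err = await piscina.run(file); if (err) ...` check in main.js becomes unnecessary.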

jfoclpf avatar Sep 01 '22 18:09 jfoclpf

Just to let everyone know: this indeed works.

jfoclpf avatar Sep 02 '22 15:09 jfoclpf

I created a gist explaining everything: https://gist.github.com/jfoclpf/325bb925fedf50a9cf96bd00d99e2243

jfoclpf avatar Sep 02 '22 15:09 jfoclpf