
Worker mode should have better strategies to GC workers that consume a lot of memory, and options for configuring the maximum number of workers.

Open · bhavitsharma opened this issue 2 years ago • 4 comments

tsc --watch is pretty memory heavy in general: a project with just 6 files and 5-6 dependencies in package.json ends up taking ~300 MB. Our company's projects are pretty massive, with ~300 files in some of them! I have seen tsc workers taking around 22 GB of memory on our projects, and sometimes they get OOM-killed. I tracked the problem down to this function in ts_project_worker.js:

function isNearOomZone() {
    const stat = v8.getHeapStatistics()
    // Percentage of the v8 heap limit that is currently in use.
    const used = (100 / stat.heap_size_limit) * stat.used_heap_size
    // Near-OOM when less than NEAR_OOM_ZONE percent of the heap remains.
    return 100 - used < NEAR_OOM_ZONE
}

This check only looks at v8's used heap memory; there is no check on the memory used by the entire system. I changed the function as follows to get around the OOM:

const os = require('os')

function isNearOomZone() {
    const stat = v8.getHeapStatistics()
    const used = (100 / stat.heap_size_limit) * stat.used_heap_size
    const availableMem = os.freemem()
    const totalMem = os.totalmem()
    // Near-OOM when either the v8 heap or system-wide memory is within
    // NEAR_OOM_ZONE percent of being exhausted.
    return (
        100 - used < NEAR_OOM_ZONE ||
        (100 * availableMem) / totalMem < NEAR_OOM_ZONE
    )
}

and implemented the following GC strategy as a workaround (GC the worker whose program has the fewest source files):

function sweepLeastRecentlyUsedWorkers() {
    while (workers.size > 0 && isNearOomZone()) {
        // Find the program with the fewest source files and kill it.
        const m = Array.from(workers.entries()).map(([k, v]) => {
            return [k, v.program.getCurrentProgram().getSourceFiles().length]
        })
        m.sort((a, b) => a[1] - b[1])
        const [killKey] = m[0]
        fs.appendFileSync(
            'WorkerLog.txt',
            `Killing worker with key: ${killKey} at time ${Date.now()}\n`
        )
        workers.get(killKey).program.close()
        workers.delete(killKey)
    }
}
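For context, here's a minimal sketch of where such a sweep could be hooked in; getOrCreateWorker and its arguments are hypothetical names for illustration, not the actual ts_project_worker.js API:

// Hypothetical pool entry point: sweep before admitting new work so
// memory is reclaimed proactively instead of the OS OOM-killing us.
function getOrCreateWorker(key, createProgram) {
    sweepLeastRecentlyUsedWorkers()
    let worker = workers.get(key)
    if (!worker) {
        worker = { program: createProgram() }
        workers.set(key, worker)
    }
    return worker
}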

bhavitsharma avatar Oct 27 '22 20:10 bhavitsharma

There's been a lot of discussion about memory control for workers; Bazel lacks resource control and estimation when it comes to workers.

See: https://github.com/bazelbuild/bazel/issues/10662 and https://github.com/bazelbuild/bazel/issues/12165.

The principled fix here is letting Bazel do system-wide resource management. OTOH, I'd be willing to take a peek at why it takes 300 MB of RAM for each target; 300 MB seems excessive. Maybe I could come up with extra optimizations, like sharing ASTs across targets to reduce duplicate ASTs.
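As a rough illustration of the AST-sharing idea (a sketch, not rules_ts code): TypeScript's language service API can already share parsed SourceFiles across services via a common document registry, so per-target services built this way would store each overlapping AST once:

const ts = require('typescript')

// One registry shared by every per-target language service. SourceFile
// ASTs for the same file name and version are stored once and reused.
const sharedRegistry = ts.createDocumentRegistry()

function createTargetService(host /* a ts.LanguageServiceHost */) {
    return ts.createLanguageService(host, sharedRegistry)
}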

thesayyn avatar Oct 28 '22 15:10 thesayyn

You can do this simply by adding args to the target:

args = [
  "--generateTrace",
  "/tmp/traces"
]
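(--generateTrace is a standard tsc flag: it writes trace.json and types.json into the given directory, and the trace can be loaded into a trace viewer to see where the compiler spends its time and which types it instantiates.)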

thesayyn avatar Oct 28 '22 16:10 thesayyn

Hey, there's definitely scope for optimization around sharing watchers and ASTs. I guess it requires a deeper dive into the TS compiler's code.

I saw tsc --watch (the CLI command) consuming around 300 MB on a project with 4-5 TS files and 6-7 deps in package.json.

bhavitsharma avatar Oct 28 '22 19:10 bhavitsharma

> sharing watchers

With the virtual FS implementation present, watchers are barely a problem.

> and ast

The problem mostly arises from having to store hundreds of ASTs. Now that I think about it, you might even be suffering from memory leaks if you have multiple ts_project targets in the same BUILD file.

I am warm to the idea of having different GC algorithms, but I am not entirely sure that would work for everyone.

thesayyn avatar Oct 29 '22 16:10 thesayyn

Due to bugs like this one, we are moving away from supporting the Persistent Worker in the next major 2.0 release of rules_ts, and likely will never fix this, sorry!

alexeagle avatar Aug 08 '23 22:08 alexeagle