readFile will not read files larger than 2 GiB even if buffers can be larger

Open davazp opened this issue 1 year ago • 13 comments

Version

v22.11.0

Platform

Darwin LAMS0127 23.6.0 Darwin Kernel Version 23.6.0: Thu Sep 12 23:36:23 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6031 arm64 arm Darwin

Subsystem

No response

What steps will reproduce the bug?

const fs = require("fs/promises");

const FILE = "test.bin";

async function main() {
  const buffer1 = Buffer.alloc(3 * 1024 * 1024 * 1024);
  await fs.writeFile(FILE, buffer1);

  const buffer2 = await fs.readFile(FILE);
  // does not reach here
  console.log(buffer2.length);
}

main();

How often does it reproduce? Is there a required condition?

It is deterministic.

What is the expected behavior? Why is that the expected behavior?

readFile should allow files as large as the maximum Buffer size, according to the documentation:

ERR_FS_FILE_TOO_LARGE: An attempt has been made to read a file whose size is larger than the maximum allowed size for a Buffer.

https://nodejs.org/api/errors.html#err_fs_file_too_large

In newer Node.js versions, the maximum Buffer size has increased, but readFile is still capped at 2 GiB.

In older versions (v18), the maximum Buffer size on 64-bit platforms was 4 GiB, but files could not be that large either.
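For reference, a quick way to check the maximum Buffer size on the running Node.js version (the exact value depends on the version and platform):

const { constants } = require("buffer");
console.log(constants.MAX_LENGTH); // maximum Buffer size in bytes for this version/platform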

What do you see instead?

readFile throws the following error:

RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3221225472) is greater than 2 GiB

Additional information

No response

davazp avatar Nov 15 '24 09:11 davazp

This is a documentation issue. The 2 GiB limit is not a Buffer limit, but rather an I/O limit.

avivkeller avatar Nov 16 '24 16:11 avivkeller

@RedYetiDev @davazp can I update the documentation?

kevinuehara avatar Nov 16 '24 17:11 kevinuehara

It would probably be good to always include the current limit in the error message. We might want to look into the actual reason for the original limit. It could probably also be adjusted. I am therefore not convinced it's only about the documentation.

BridgeAR avatar Nov 17 '24 23:11 BridgeAR

The reason is explained here: https://github.com/libuv/libuv/pull/1501

targos avatar Nov 18 '24 06:11 targos

I imagined it was something like that. On Linux, the read syscall also appears to be limited to 2 GiB.

However, that is a pretty low-level constraint. Wouldn't it be better if readFile internally made multiple read calls to populate the buffer and then returned it?

Even if we do not want to allow arbitrarily large files, that would let us raise the limit to something a bit more forgiving.
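Something along those lines, as a rough sketch in user code (the readLargeFile name and chunk size are illustrative, not an existing Node.js API):

const fs = require("fs/promises");

// Sketch: read a large file into a single Buffer using multiple
// FileHandle.read() calls, each staying below the ~2 GiB per-call I/O limit.
async function readLargeFile(path) {
  const handle = await fs.open(path, "r");
  try {
    const { size } = await handle.stat();
    // Still subject to the maximum Buffer size on this Node.js version.
    const buffer = Buffer.alloc(size);
    const CHUNK = 2 ** 31 - 1; // keep each read below the 2 GiB I/O limit
    let offset = 0;
    while (offset < size) {
      const { bytesRead } = await handle.read(
        buffer,
        offset,
        Math.min(CHUNK, size - offset),
        offset
      );
      if (bytesRead === 0) break; // unexpected end of file
      offset += bytesRead;
    }
    return buffer;
  } finally {
    await handle.close();
  }
}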

davazp avatar Nov 18 '24 10:11 davazp

handling multiple calls from the Node.js side SGTM.

joyeecheung avatar Nov 19 '24 18:11 joyeecheung

To read files larger than 2 GiB in Node.js, you can use fs.createReadStream() to process the file in smaller, manageable chunks instead of loading it into memory all at once. This avoids the 2 GiB limit you hit with fs.readFile(), which reads the entire file into a single buffer.

const fs = require('fs');

const filePath = 'path/to/large/file'; // Specify the path to your large file
const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 }); // 64 KB chunks

stream.on('data', (chunk) => {
  console.log('Received chunk:', chunk);
  // Process the chunk here
});

stream.on('end', () => {
  console.log('Finished reading the file.');
});

stream.on('error', (err) => {
  console.error('Error reading the file:', err);
});
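A variant of the same approach, sketched here, consumes the stream with async iteration instead of 'data' events (the processFile name is illustrative):

const fs = require('fs');

async function processFile(filePath) {
  const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 });
  let total = 0;
  for await (const chunk of stream) {
    total += chunk.length; // process each chunk here
  }
  console.log(`Read ${total} bytes in total.`);
}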

nevilsonani avatar Nov 20 '24 05:11 nevilsonani

Thinking about this again: we should probably not encourage people to read whole files into memory. Supporting multiple reads would remove the incentive to switch to streams and could cause memory issues. I guess we could just improve the error message to suggest fs.createReadStream() instead.

BridgeAR avatar Dec 09 '24 12:12 BridgeAR

If I am using readFile, I know it returns a buffer, so I expect it to use a lot of memory for a large file. I can anticipate that, and I can switch to the streaming API if I know I can process the file in chunks.

That is, for me, the main distinction between readFile and the streaming API: do I want to keep the whole file in memory at once, or process it in chunks? Not how big the files are.

What I wouldn't know is that readFile has an arbitrary limit of 2 GiB, so my program will not work with larger files even if I have enough memory. Of course, making it more visible in the documentation/error message does help to mitigate this. And if not many people have complained yet, the limit is probably reasonable for most use cases.

davazp avatar Dec 09 '24 17:12 davazp

Hey, I’m looking forward to being a new contributor to the Node.js project. I’d be happy to work on updating the error message and contributing to this issue.

I think a good new message would be something like:

RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3221225472) exceeds the supported limit of 2 GiB for readFile(). Consider using createReadStream() for handling large files to avoid memory issues.

Let me know what you think!

BrunoHenrique00 avatar Dec 11 '24 15:12 BrunoHenrique00

Hi Sir,

I'd like to work on resolving this issue. Please assign it to me so I can start investigating and implementing the proposed solutions.

Looking forward to contributing!

Thank you.

Mayur-Murarka avatar Jan 01 '25 08:01 Mayur-Murarka

Is this issue still open? I think it has been resolved and should be closed.

khalid586 avatar Apr 24 '25 07:04 khalid586

agree - closed via https://github.com/nodejs/node/pull/59050

bmuenzenmeyer avatar Oct 25 '25 11:10 bmuenzenmeyer