readFile will not read files larger than 2 GiB even if buffers can be larger
Version
v22.11.0
Platform
Darwin LAMS0127 23.6.0 Darwin Kernel Version 23.6.0: Thu Sep 12 23:36:23 PDT 2024; root:xnu-10063.141.1.701.1~1/RELEASE_ARM64_T6031 arm64 arm Darwin
Subsystem
No response
What steps will reproduce the bug?
const fs = require("fs/promises");
const FILE = "test.bin";

async function main() {
  const buffer1 = Buffer.alloc(3 * 1024 * 1024 * 1024);
  await fs.writeFile(FILE, buffer1);
  const buffer2 = await fs.readFile(FILE);
  // does not reach here
  console.log(buffer2.length);
}

main();
How often does it reproduce? Is there a required condition?
It is deterministic.
What is the expected behavior? Why is that the expected behavior?
readFile should allow files as large as the maximum Buffer size, according to the documentation:
ERR_FS_FILE_TOO_LARGE: An attempt has been made to read a file whose size is larger than the maximum allowed size for a Buffer.
https://nodejs.org/api/errors.html#err_fs_file_too_large
In newer Node.js versions the maximum Buffer size has increased, but the maximum file size is still capped at 2 GiB.
In older versions (v18), the maximum Buffer size on 64-bit platforms was 4 GiB, but files could not be that large either.
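For reference, a minimal sketch to check the maximum Buffer size on the current build; it only uses the documented node:buffer constants, and the 2 GiB figure is simply the cap reported above:

const { constants } = require("buffer");

// Maximum number of bytes a single Buffer may hold on this build.
console.log(`Buffer MAX_LENGTH: ${constants.MAX_LENGTH} bytes`);
// The readFile cap reported above, for comparison.
console.log(`readFile cap: ${2 ** 31} bytes (2 GiB)`);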
What do you see instead?
readFile will throw the error
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3221225472) is greater than 2 GiB
Additional information
No response
This is a documentation issue. The 2 GiB limit is not a Buffer limit, but rather an I/O limit.
@RedYetiDev @davazp can I update the documentation?
It would probably be good to always include the current limit in the error message. We might want to look into the actual reason for the original limit. It could probably also be adjusted. I am therefore not convinced it's only about the documentation.
The reason is explained here: https://github.com/libuv/libuv/pull/1501
I imagined it was something like that. On Linux, the read syscall also seems to be limited to 2 GiB.
However, that is pretty low level. Wouldn't it be better if readFile internally made multiple read calls to populate the buffer and then returned it?
Even if we do not want to allow arbitrarily large files, multiple reads would let us raise the limit to something a bit more forgiving.
Handling multiple calls from the Node.js side SGTM.
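A rough userland sketch of that idea (not the actual core implementation; the readLargeFile helper name is just illustrative, and it assumes the build's maximum Buffer size is large enough for the file):

const fs = require("fs/promises");

async function readLargeFile(path) {
  const handle = await fs.open(path, "r");
  try {
    const { size } = await handle.stat();
    const buffer = Buffer.allocUnsafe(size);
    const CHUNK = 1024 * 1024 * 1024; // 1 GiB per read, well below the 2 GiB per-call limit
    let offset = 0;
    while (offset < size) {
      const { bytesRead } = await handle.read(
        buffer,
        offset,
        Math.min(CHUNK, size - offset),
        offset
      );
      if (bytesRead === 0) break; // unexpected EOF
      offset += bytesRead;
    }
    return buffer;
  } finally {
    await handle.close();
  }
}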
To read files larger than 2 GiB in Node.js, you can use fs.createReadStream() to handle the file in smaller, manageable chunks rather than loading the entire file into memory at once. This avoids the 2 GiB limit you hit with fs.readFile(), which loads the entire file into memory.
const fs = require('fs');

const filePath = 'path/to/large/file'; // Specify the path to your large file
const stream = fs.createReadStream(filePath, { highWaterMark: 64 * 1024 }); // 64 KB chunks

stream.on('data', (chunk) => {
  console.log('Received chunk:', chunk);
  // Process the chunk here
});

stream.on('end', () => {
  console.log('Finished reading the file.');
});

stream.on('error', (err) => {
  console.error('Error reading the file:', err);
});
Thinking about this again: we should probably not encourage people to read whole files into memory. Supporting multiple reads would keep them from switching to streams and lead to memory issues.
I guess we could just improve the error message to suggest fs.createReadStream() instead.
If I am using readFile, I know it returns a buffer, so I expect it to use a lot of memory if the file is large. I can anticipate this, and I can switch to the streaming API if I know I can process the file in chunks.
That is, for me, the main distinction between readFile and the streaming API: do I want the whole file in memory at once, or in chunks? Not how big the files are.
What I wouldn't know is that readFile has an arbitrary limit of 2 GiB, so my program will not work with larger files even if I have enough memory. Of course, making this more visible in the documentation/error message does help to mitigate it. And if not many people have complained yet, it probably means the limit is reasonable for most use cases.
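For anyone who does want the whole file in memory but hits the readFile cap, a minimal stream-based sketch (the readWholeFile helper name is just illustrative, and it assumes the concatenated result still fits within the build's maximum Buffer size):

const fs = require("fs");

// Collect all chunks from a read stream and join them into one Buffer.
// Note: this still uses as much memory as the file itself.
function readWholeFile(path) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(path)
      .on("data", (chunk) => chunks.push(chunk))
      .on("end", () => resolve(Buffer.concat(chunks)))
      .on("error", reject);
  });
}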
Hey, I’m looking forward to being a new contributor to the Node.js project. I’d be happy to work on updating the error message and contributing to this issue.
I think a good new message would be something like:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3221225472) exceeds the supported limit of 2 GiB for readFile(). Consider using createReadStream() for handling large files to avoid memory issues.
Let me know what you think!
Hi Sir,
I'd like to work on resolving this issue. Please assign it to me so I can start investigating and implementing the proposed solutions.
Looking forward to contributing!
Thank you.
Is this issue still open? I think it has been resolved and should be closed.
Agreed; closed via https://github.com/nodejs/node/pull/59050