node-tree-sitter
node-tree-sitter copied to clipboard
Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size
In the README, it suggests you can pass a string to parse:
const sourcePath = '...';
const sourceCode = readFileSync(sourcePath, 'utf-8');
const tree = parser.parse(source);
However if you pass a string longer than the default buffer size, parsing will fail with Error: Invalid Argument
const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize - 1, ' ');
const tree = parser.parse(source); // --> OK
const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source); // --> ERROR
[email protected]/node_modules/tree-sitter/index.js:361
? parse.call(
^
Error: Invalid argument
at Parser.parse ([email protected]/node_modules/tree-sitter/index.js:361:13)
Possible Solutions
Some ideas to improve the behavior:
- Have
Parser.prototype.parse's callback for strings be limited to buffer size:const DEFAULT_BUFFER_SIZE = 32 * 1024; const inputString = input; const maxSliceLength = (bufferSize || DEFAULT_BUFFER_SIZE) - 1; input = (offset, _position) => inputString.slice(index, index + maxSliceLength); - In
CallbackInput::Read, throw an error with a clearer error message if the returned value from the callback has length >= buffer size
parser.parse lets you pass in a buffer size.
const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source, undefined, { bufferSize: defaultBufferSize + 1 }); // --> Not an error anymore
Might be worth putting a note somewhere about this in the readme though.
Edit: thanks @connor4312
We are encountering this bug as well (https://github.com/dodona-edu/dolos/issues/1544), it took a while to find the cause because of the error message. This seems to be a new problem (with the new NAPI?)
Maybe it is a good idea to improve the error message here by adding a manual check if the buffer size is enough? Or event prevent this from happening by increasing the buffer size if needed.
Hit this as well -- seems like the bufferSize needs to be input.length + 1. (Yes, length in UTF-16 code units, not bytes.) Is there any downside to always setting that to be the buffer size?
The buffer size should not affect the behavior - it should just control how much copying takes place. This is a really bad bug, it must have been introduced with the NAPI conversion.
Trying to debug