node-tree-sitter Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size

Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size

Open kestred opened this issue 1 year ago • 4 comments

In the README, it suggests you can pass a string to parse:

const sourcePath = '...';
const sourceCode =  readFileSync(sourcePath, 'utf-8');
const tree = parser.parse(source);

However if you pass a string longer than the default buffer size, parsing will fail with Error: Invalid Argument

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize - 1, ' ');
const tree = parser.parse(source);  // --> OK

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source);  // --> ERROR

[email protected]/node_modules/tree-sitter/index.js:361
    ? parse.call(
            ^

Error: Invalid argument
    at Parser.parse ([email protected]/node_modules/tree-sitter/index.js:361:13)

Possible Solutions

Some ideas to improve the behavior:

Have Parser.prototype.parse's callback for strings be limited to buffer size:

 const DEFAULT_BUFFER_SIZE = 32 * 1024;

 const inputString = input;
 const maxSliceLength = (bufferSize || DEFAULT_BUFFER_SIZE) - 1;
 input = (offset, _position) => inputString.slice(index, index + maxSliceLength);

In CallbackInput::Read, throw an error with a clearer error message if the returned value from the callback has length >= buffer size

May 11 '24 20:05 kestred

parser.parse lets you pass in a buffer size.

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source, undefined, { bufferSize: defaultBufferSize + 1 });  // --> Not an error anymore

Might be worth putting a note somewhere about this in the readme though.

Edit: thanks @connor4312

May 15 '24 03:05 Rannytheory

We are encountering this bug as well (https://github.com/dodona-edu/dolos/issues/1544), it took a while to find the cause because of the error message. This seems to be a new problem (with the new NAPI?)

Maybe it is a good idea to improve the error message here by adding a manual check if the buffer size is enough? Or event prevent this from happening by increasing the buffer size if needed.

Jun 03 '24 11:06 rien

Hit this as well -- seems like the bufferSize needs to be input.length + 1. (Yes, length in UTF-16 code units, not bytes.) Is there any downside to always setting that to be the buffer size?

Jun 20 '24 23:06 connor4312

The buffer size should not affect the behavior - it should just control how much copying takes place. This is a really bad bug, it must have been introduced with the NAPI conversion.

Jun 21 '24 03:06 maxbrunsfeld

Trying to debug

Jul 31 '24 09:07 segevfiner

node-tree-sitter node-tree-sitter copied to clipboard

Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size

Possible Solutions

node-tree-sitter
node-tree-sitter copied to clipboard