node-tree-sitter icon indicating copy to clipboard operation
node-tree-sitter copied to clipboard

Bug: `Parser.parse(string)` fails if the string length is `>=` than the buffer size

Open kestred opened this issue 1 year ago • 4 comments

In the README, it suggests you can pass a string to parse:

const sourcePath = '...';
const sourceCode =  readFileSync(sourcePath, 'utf-8');
const tree = parser.parse(source);

However if you pass a string longer than the default buffer size, parsing will fail with Error: Invalid Argument

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize - 1, ' ');
const tree = parser.parse(source);  // --> OK

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source);  // --> ERROR
[email protected]/node_modules/tree-sitter/index.js:361
    ? parse.call(
            ^

Error: Invalid argument
    at Parser.parse ([email protected]/node_modules/tree-sitter/index.js:361:13)

Possible Solutions

Some ideas to improve the behavior:

  • Have Parser.prototype.parse's callback for strings be limited to buffer size:
     const DEFAULT_BUFFER_SIZE = 32 * 1024;
    
     const inputString = input;
     const maxSliceLength = (bufferSize || DEFAULT_BUFFER_SIZE) - 1;
     input = (offset, _position) => inputString.slice(index, index + maxSliceLength);
    
  • In CallbackInput::Read, throw an error with a clearer error message if the returned value from the callback has length >= buffer size

kestred avatar May 11 '24 20:05 kestred

parser.parse lets you pass in a buffer size.

const defaultBufferSize = 32 * 1024;
const sourceCode = ''.padStart(defaultBufferSize, ' ');
const tree = parser.parse(source, undefined, { bufferSize: defaultBufferSize + 1 });  // --> Not an error anymore

Might be worth putting a note somewhere about this in the readme though.

Edit: thanks @connor4312

Rannytheory avatar May 15 '24 03:05 Rannytheory

We are encountering this bug as well (https://github.com/dodona-edu/dolos/issues/1544), it took a while to find the cause because of the error message. This seems to be a new problem (with the new NAPI?)

Maybe it is a good idea to improve the error message here by adding a manual check if the buffer size is enough? Or event prevent this from happening by increasing the buffer size if needed.

rien avatar Jun 03 '24 11:06 rien

Hit this as well -- seems like the bufferSize needs to be input.length + 1. (Yes, length in UTF-16 code units, not bytes.) Is there any downside to always setting that to be the buffer size?

connor4312 avatar Jun 20 '24 23:06 connor4312

The buffer size should not affect the behavior - it should just control how much copying takes place. This is a really bad bug, it must have been introduced with the NAPI conversion.

maxbrunsfeld avatar Jun 21 '24 03:06 maxbrunsfeld

Trying to debug

segevfiner avatar Jul 31 '24 09:07 segevfiner