circt icon indicating copy to clipboard operation
circt copied to clipboard

[FIRRTL][FIRLexer] add bufferID parameter to FIRLexer

Open emosy opened this issue 3 years ago • 5 comments

getLineAndColumn expects an optional bufferID. getMainFileID always returns 1, so in cases where the bufferID is not 1, this can cause a crash. Remove the unnecessary call to getMainFileID and leave out the bufferID argument (getLineAndColumn will find the buffer). mainFileID and bufferID are not the same.

emosy avatar Jun 10 '22 16:06 emosy

Are you seeing this issue in the wild somehow, or is there a test we can add? It looks like the curBuffer is always the main bufffer. Could you elaborate on how these things diverge? I hope that FIRLexer::translateLocation is only ever called on a SMLoc from the file its lexing.

If you need this for some custom code, I think there could be a performance impact to infer the correct bufferID. Maybe it would make more sense to have the lexer take the buffer ID as a constructor parameter along with the source manager?

youngar avatar Jun 14 '22 00:06 youngar

Are you seeing this issue in the wild somehow, or is there a test we can add?

Not sure how to reproduce this on command with firtool, but I had copied firtool and its FIRLexer and FIRParser pretty closely for a project. I was running into a crash every so often (not every run but often enough) when I would call translateLocation on a valid SMLoc but it would crash on this assertion at line 105 of SourceMgr.cpp: assert(Ptr >= BufStart && Ptr <= Buffer->getBufferEnd()); When I added manual checks that my SMLoc was valid before calling getLineAndColumn, I could tell that my SMLoc pointer was within the buffer I used. I concluded that somehow, the buffer of the main file was not the first buffer. I never figured out how, but by making the change I'm suggesting it fixed the issue (since now the SourceMgr finds the correct buffer for me or will fail a different assert if it can't find it).

Currently, the FIRLexer assumes that any location it needs to translate is always in the first buffer (which I believe is the "main file" for SourceMgr). I think you make a good point in suggesting to store the buffer ID as a member in the lexer class.

Maybe it would make more sense to have the lexer take the buffer ID as a constructor parameter along with the source manager? I think this would make the lexer design more extensible, so this is a good idea. When adding a new file to a SourceMgr via AddNewSourceBuffer, the return value is the buffer ID that you can then pass to the lexer when constructing it. I like that this approach makes it explicit when constructing a lexer exactly which buffer to use. That way another lexer could be constructed on a different buffer within the same source manager (by some parser or other code).

I'll change my commit to reflect that with a bufferID parameter. This should also remove unnecessary calls to getMainFileID.

emosy avatar Jun 15 '22 16:06 emosy

I also added a getBufferID() method so that FIRParser could still easily make a new lexer for parseModuleBody.

importFIRFile still assumes that the "main file ID" is the correct buffer to use.

emosy avatar Jun 15 '22 17:06 emosy

Do I need to wait for a codeowner to merge or ping someone? Lattner is still listed as the only codeowner here: https://github.com/llvm/circt/blob/6c33d7cec3eb1e31ff07d0387d514c1fc9a10e30/codeowners/firrtl.json#L5

I'm not familiar with the procedure here, so I apologize for my ignorance.

emosy avatar Jun 23 '22 16:06 emosy

@seldridge requesting review

emosy avatar Jun 28 '22 15:06 emosy

Status? 2 questions:

  1. Can this be reached from the code base (hence needs a test)?
  2. Is there any objection even if not? It doesn't look like it should impact performance.

darthscsi avatar Apr 28 '23 18:04 darthscsi