hdlConvertor [WIP] Fix issue 166

Hello

Current draft push request is an attempt to fix : https://github.com/Nic30/hdlConvertor/issues/166

First it remove new line added in preprocessing result. One part of the fix consist to make the preprocessing grammar to consume the training new line. The Ieee1800-2017 chapter 22.4 says

Only white space or a comment may appear on the same line as the `include compiler directive.

So we can remove new line on the line containing directive.
If included file does not end by new line, next line will appear on the last line from the included file. See tests/sv_pp/expected/include_many_dir/from_subdirectory.txt

Second part consist to enrich token build by antlr4 with original file location using file_line_map data. But the result is not yet correct: current data structure does not allow to retrieve all information we need for every lines.

The CodePosition object has been update to reflect the fileName data. hdlConvertorAst would also need to be updated.

Jul 14 '24 14:07 Thomasb81

Tests are failing because hdlConvertorAst requiere an update of Codeposition object.

Jul 14 '24 14:07 Thomasb81

@Thomasb81 Can you summarize the state of this issue, optionally can you give me info about what is need to be done?

Jun 10 '25 14:06 Nic30

The CodePosition object has been update to reflect the fileName data. hdlConvertorAst would also need to be updated.

Refer to this class that need to be updated to add a fileName attribute. https://github.com/Nic30/hdlConvertorAst/blob/19e4c60dab7614cc0fbd56f5e74e770395e7df25/hdlConvertorAst/hdlAst/utils.py#L3 to do not have issue at runtime ...

Having this python object definition in a different repository seems to create dependency circle ... it is not helping to do consistent change... Actually to do such a change we need to first update hdlConvertorAst ...

I review this PR few day ago but have difficulties to remember

I think new line in preprocessed code that does not come from source code is spotted in src/verilogPreproc/out_buffer.cpp and possibly with the preprocessor lexer change ... expected in testsuite have been updated against this change.

The codePosition object is currently fill with data issue from preprocessor buffer. So we lost the origin of token which are later use in case of error or directly report to user through codePosition object.

The initial idea was to attach the origin filename to each token after preprocessing step so it serves all use-case without addition effort. Original Antlr token object : antlr4::commonToken attach only start/end line but keep the filename global. Which is an issue for verilogSV parser since your source code can be a composition of several files. Think about a macro define that contain a syntax error defined in file A being include in file B : you want to carry in the token that the faulty code is coming from file A at line where the macro is defined.

The idea would be to fix token data before they get used by the verilog/SV parser.

To do so the idea is to use our customized token hdlConvertorToken enrich with filename. So each token create by verilog/SV lexer will be used by verilog/SV parser will have reliable filename and start/end line.

The verilog/SV Lexer is customize with our hdlConvertorTokenFactory responsible to create our customized token using the fileLineMap as a first step, to only address include line change, in the lexer.

	        /*
		 * Customize Lexer with our hdlConvertorToken factory
		 * This allow us to enrich antlr4::CommonToken with filename on
		 * each token, then spread this kind of detail everywhere token are
		 * use.
		 * Dummy structure for vhdl, since no processing step exist.
		 * */
		verilog_pp::FileLineMap dummy_file_line_map;
		hdlConvertorTokenFactory factory;
		factory.setFileLineMap(&dummy_file_line_map);
		lexer.get()->setTokenFactory(factory.DEFAULT.get());

That's where I stop, a little bit lost ... filename is probably correctly propagate, but I guess line are still wrong. Probably I miss understand FileLineMap data or did not figure out how to use them...

That's why the PR is shared as is in draft in case someone want to have a look...

Jun 10 '25 22:06 Thomasb81

Also here https://github.com/Nic30/hdlConvertor/issues/134#issuecomment-681099582 you mention to use smart_prt. to reference fileName in codePosition. Current code make a copy.

Jun 10 '25 23:06 Thomasb81

...Initial though was I can re-use https://github.com/Nic30/hdlConvertor/blob/e505d4be4d530d15d1eab52101c0b0d37cf2802e/include/hdlConvertor/verilogPreproc/file_line_map.h#L19 create along out_buffer is created. But we can't...

Better to have proper and dedicated data structure, that is able to store the origin ( file/str to parse, included file or macro expansion) of each chunk of code put in the output buffer. So hdlConvertorTokenFactory::create can properly correct file, line and pos token attribute.

Jun 15 '25 15:06 Thomasb81