winflexbison icon indicating copy to clipboard operation
winflexbison copied to clipboard

Runtime error when running multiple flex processes

Open egorpugin opened this issue 3 years ago • 6 comments

Probably there is some race in code when generating _tmpnam https://github.com/lexxmark/winflexbison/blame/master/flex/src/main.c#L1051 https://github.com/lexxmark/winflexbison/blob/master/flex/src/main.c#L1051

No issues on previous release.

Two programs generate same filename, but they are not created the file yet. Now we have a race when two processes get the same name. Code probably should append pid to filename to prevent this. Actually when using pid we do not need _1 _2 _3 ... numbers magic.

[4553/7090] generate: "D:/dev/primitives/.sw/t/434382/53fafe/bdp/fb/path.ll/path.ll.cpp"
[4555/7090] generate: "D:/dev/primitives/.sw/t/434382/53fafe/bdp/fb/settings.ll/settings.ll.cpp"
[4556/7090] generate: "D:/dev/primitives/.sw/t/434382/8a01b7/bdp/fb/range.ll/range.ll.cpp"

When executing: generate: "D:/dev/primitives/.sw/t/434382/8a01b7/bdp/fb/range.ll/range.ll.cpp"
error deleting file C:\Users\egorp\AppData\Local\Temp\~flex_out_main_2

egorpugin avatar Jan 31 '22 17:01 egorpugin

I agree that this is bad. To have that code working correctly: _tmpnam should only be used directly before the file is created - because it "checks if that file exists", doing it "up front" is the most problematic part (I still consider the use of tmpnam better than creating a new folder for each invocation, which was done before).

In theory we could ship our own implementation which:

  • uses the specified directory if it is set, otherwise TMP, otherwise TEMP
  • additional uses an increasing number on each access

If this is done then the actual file io to check for the file possibly could be left out (otherwise stat the result).

... but as noted, I'd suggest to move the use of tmpnam where it should be and claim victory.

Thanks @egorpugin for the bug submission and the check, can you go even a step further and provide a PR?

GitMensch avatar Jan 31 '22 18:01 GitMensch

What about some ~flex_out_main_PID name?

egorpugin avatar Jan 31 '22 18:01 egorpugin

That would be the "own tmpnam" variant - but as noted: it should be the best option to just use tmpnam where it is intended to be.

GitMensch avatar Jan 31 '22 21:01 GitMensch

You'll still get a race there. The race is between tmpnam and fopen. Two processes can get the same name again.

name = tmpnam();
// <- race!
fopen(name)

egorpugin avatar Jan 31 '22 22:01 egorpugin

You'll still get a race there. The race is between tmpnam and fopen. Two processes can get the same name again.

That is unlikely if both calls are directly after each other (the problem you've experienced is because they are called "far away"; but depending on the tmpnam implementation possible (the one used with your environment, which I guess is a release distribution, does not use pid, others may do so). To ensure that this does not happen it is necessary to (additional to placing both functions next to each other) using a different (in this case likely self-written) implementation of tmpnam that does use the PID + increasing number and does a stat check "does this file exist - if yes, increment the number and recheck), but that's possibly overkill.

GitMensch avatar Jan 31 '22 22:01 GitMensch

I encountered this problem when trying to use this win_bison to build Wireshark in GitHub Actions. Wireshark has a CMake macro that expands to include ADD_CUSTOM_COMMAND to invoke win_bison. There are about 16 *.l files in wireshark (3.6 branch) and evidently the build system decides to build more than one concurrently.

https://github.com/objectcomputing/OpenDDS/runs/5160933295?check_suite_focus=true#step:10:2146

mitza-oci avatar Feb 11 '22 20:02 mitza-oci

I had to give up winflexbison because of this issue ...

farcas-snps avatar Jan 06 '23 16:01 farcas-snps

@farcas-snps Is it possble to use previous version for you?

egorpugin avatar Jan 06 '23 20:01 egorpugin

I had to give up winflexbison because of this issue ...

So you know use the windows binaries from MSYS2?

GitMensch avatar Jan 06 '23 21:01 GitMensch

I had to give up winflexbison because of this issue ...

So you know use the windows binaries from MSYS2?

Yes. I wanted to generate reentrant scanners and for that I needed a newer version of Flex on Windows. But in the end, I will use a mutex on my side.

farcas-snps avatar Jan 09 '23 07:01 farcas-snps

I was hitting this issue recently when I transition my project to Ninja build, I've added some debug information and saw their _tempnam-then-fopen time to be very close, this failure is more apparent on flex_temp_out_main compared to the rest - temp_file_names, meanwhile bison's m4 files rarely fails due to the close proximity of _tempname-then-fopen and when we specify a target location (cmake's WORKING_DIRECTORY) for the bison execution, the temp files are generated in their respective target locations hence clashing is less likely...

Seems like this issue is only occurring on Windows despite having almost similar flow with Linux, I'm guessing its down to their file system implementations, Windows' delete reflects immediately in the file allocation, where as Linux files are inode-based where a file will only be deleted properly when link/use count is down to zero so a double delete doesn't really matter.

jonnysoe avatar Feb 10 '23 07:02 jonnysoe

Thank you for sharing the note, so it is actually really old code (a" local" modification" which isn't upstream) that is the root for the issue. The following old commit provides the fix the issue f496d70a0baa36a05f021d2bc66351dd4905778c - if you can set FLEX_TEMP_DIR, then still with 960f82d0e37c2aabed2a7a94bc06d112906826b8 it is used.

Questions:

  • Can you setup this in your build scripts?
  • Can someone check how this is done upstream?

GitMensch avatar Feb 10 '23 07:02 GitMensch

@GitMensch , It seems that _getpid changes were removed (accidentally?) on the last sync from upstream - 960f82d I've tried setting FLEX_TEMP_DIR, it appears that Windows' _tempnam totally ignores user-specified directory and the temporary file is always at %LocalAppData%\Temp, so the issue still persist and is also the reason why all the flex instances are creating and writing to the same directory and ultimately same file occasionally due to race condition. I'm using Windows 10, this _tempnam might be an issue on newer Windows instead of all versions of Windows...

I have a quick read on upstream (not sure if there are intermediate upstreams apart from this), it seems to write directly to user's current working directory, eg. lex.yy.cc, here is the code snippet:

	if (!env.use_stdout) {
		FILE   *prev_stdout;

		if (!env.did_outfilename) {
			snprintf (outfile_path, sizeof(outfile_path), outfile_template,
				  ctrl.prefix, suffix());

			env.outfilename = outfile_path;
		}

		prev_stdout = freopen (env.outfilename, "w+", stdout);

		if (prev_stdout == NULL)
			lerr (_("could not create %s"), env.outfilename);

		outfile_created = 1;
	}

Definitions:

static const char outfile_template[] = "lex.%s.%s";
ctrl.prefix = "yy";
const char *suffix (void)
{
	const char   *suffix;

	if (is_default_backend()) {
		if (ctrl.C_plus_plus)
			suffix = "cc";
		else
			suffix = "c";
	} else {
		suffix = skel_property("M4_PROPERTY_SOURCE_SUFFIX");
	}
	
	return suffix;
}

Despite using Windows filesystem, the msys2 version is working fine, probably because every instance is isolated in its own working directory

jonnysoe avatar Feb 13 '23 09:02 jonnysoe

@GitMensch , My mistake on calling it a _tempnam bug, apparently MS documented the behavior, this function will ignore the directory if TMP environment variable was set, and it is always set on all my machine to be LocalAppData\Temp, and in all my other non-dev PCs (I can only confirm for Windows 10 and Windows 11) This means FLEX_TEMP_DIR never work, I think we should retire this variable too

jonnysoe avatar Feb 13 '23 11:02 jonnysoe

@GitMensch , I've made a fix, lemme know if you're good with a PR https://github.com/jonnysoe/winflexbison/commit/dc0d204e527bb33de225c43d5a0bc7447c2d0c6b

jonnysoe avatar Feb 28 '23 11:02 jonnysoe

@jonnysoe the code looks good to me in general and I'd like to see a PR with this. As noted in the comments to this commit we should drop the check for tmpnam == NULL. Furthermore we should not duplicate the code but place it in the common directory along with appropriate header.

Feel free to send in 3 PRs:

1 - fix of this issue (and whitespace in the files touched) 2 - adding support for vscode 3 - the rest

GitMensch avatar Feb 28 '23 12:02 GitMensch

Alright thanks @GitMensch , I'll update and prepare them

jonnysoe avatar Mar 01 '23 07:03 jonnysoe

First part is in now, have fun with the other two :-)

GitMensch avatar Mar 05 '23 20:03 GitMensch

Do I understand correctly that this isn't fixed yet for Windows? Can't the name of the temporary file derived from the full filepath of the output file?

dg0yt avatar May 21 '23 12:05 dg0yt