Implementing support for preprocessing large files #9
This PR implements support for generating a per-process temporary file for the purposes of preprocessing; this allows cnip to be run on large files as follows:
./cnip --cc-pp /tmp/moo/b.c
Currently, master will do this:
cnip: preprocessor invocation failed
if the file to be preprocessed is large -- this will typically happen if the file is larger than getconf ARG_MAX in bytes (well, minus a few for the arguments cnip passes in).
Currently, this code is not portable to Windows as getpid is POSIX (but there is https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getpid?view=msvc-160)
Signed-off-by: Andrew V. Jones [email protected]
The patch looks good to me @andrewvaughanj , but the build failed; once it's fixed I can merge the PR.
Oh, that's sad ... so even though pyschec is targetting C++17, and filesystem is in C++17, 18.04's GCC doesn't have enough support for it:
- https://askubuntu.com/questions/1256440/how-to-get-libstdc-with-c17-filesystem-headers-on-ubuntu-18-bionic
😞
We could just do what some other open-source projects do and say "use $TMPDIR if set, else hard-code /tmp" -- would you be happy to this?
FYI, I didn't overlook:
tmpfile-- but this returns aFILE*and we need the filename (I could read from/proc/<pid>, but that's 🤮)tmpnam-- marked as unsafe and gives a build-time warningmkstemp-- gives both theFILE*and the path, but needs the initial template including directory, so doesn't actually help.
If you're happy with $TMPDIR || "/tmp", I'll make the change.
TMPDIR + mkstemp implemented
~I DON'T ACTUALLY THINK THIS WORKS -- executeCore DOES NOT RETURN ALL OF STDOUT (but that's not a bug in this PR, but there is a bug in executeCore)~
Would help if I close the temporary file before calling the preprocessor 🤦 -- now fixed!
Hi @andrewvaughanj , I see you've put some time on working around Ubuntu's filesystem fault, but I'd prefer that we, instead of dealing with raw pointers/legacy API (it's error prone, e.g., there's a leak of tmp_template in the PR upon the early exit), keep your original approach and just add the necessary "patching" to make it work in Ubuntu.
From what I quickly checked, <experimental/filesystem> with additional link flags would do the trick — and a preprocessing directive in the code, to choose the right include header.
So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?
Personally, I consider mkstemp nicer, but then it does need to deal with a char* API -- for example, what happens if you have a multi-threaded cnip that wants to preprocess multiple files simultaneously?
Happy to do whichever you prefer :)
So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?
Kind of … :slightly_smiling_face: I mean the approach with <filesystem> / <experimental/filesystem>, but that doesn't need to use pid: you could (should) use the a new file with the same base name and only replace its extension; in particular, with .i (that's the default extension for preprocessed files), similar to what cnip's Python driver does.
Do you prefer pid over using the temporary directory + .i? Your choice :)
If you ask me, I'd say the temporary directory + .i file. (But with the
Will revisit this this week + do the stuff about looking for C++17 on older Ubuntus!