psychec icon indicating copy to clipboard operation
psychec copied to clipboard

Implementing support for preprocessing large files #9

Open aytey opened this issue 5 years ago • 9 comments

This PR implements support for generating a per-process temporary file for the purposes of preprocessing; this allows cnip to be run on large files as follows:

./cnip --cc-pp /tmp/moo/b.c

Currently, master will do this:

cnip: preprocessor invocation failed

if the file to be preprocessed is large -- this will typically happen if the file is larger than getconf ARG_MAX in bytes (well, minus a few for the arguments cnip passes in).

Currently, this code is not portable to Windows as getpid is POSIX (but there is https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getpid?view=msvc-160)

Signed-off-by: Andrew V. Jones [email protected]

aytey avatar Mar 21 '21 21:03 aytey

The patch looks good to me @andrewvaughanj , but the build failed; once it's fixed I can merge the PR.

ltcmelo avatar Mar 21 '21 23:03 ltcmelo

Oh, that's sad ... so even though pyschec is targetting C++17, and filesystem is in C++17, 18.04's GCC doesn't have enough support for it:

  • https://askubuntu.com/questions/1256440/how-to-get-libstdc-with-c17-filesystem-headers-on-ubuntu-18-bionic

😞

We could just do what some other open-source projects do and say "use $TMPDIR if set, else hard-code /tmp" -- would you be happy to this?

FYI, I didn't overlook:

  • tmpfile -- but this returns a FILE* and we need the filename (I could read from /proc/<pid>, but that's 🤮)
  • tmpnam -- marked as unsafe and gives a build-time warning
  • mkstemp -- gives both the FILE* and the path, but needs the initial template including directory, so doesn't actually help.

If you're happy with $TMPDIR || "/tmp", I'll make the change.

aytey avatar Mar 22 '21 07:03 aytey

TMPDIR + mkstemp implemented

~I DON'T ACTUALLY THINK THIS WORKS -- executeCore DOES NOT RETURN ALL OF STDOUT (but that's not a bug in this PR, but there is a bug in executeCore)~

Would help if I close the temporary file before calling the preprocessor 🤦 -- now fixed!

aytey avatar Mar 22 '21 08:03 aytey

Hi @andrewvaughanj , I see you've put some time on working around Ubuntu's filesystem fault, but I'd prefer that we, instead of dealing with raw pointers/legacy API (it's error prone, e.g., there's a leak of tmp_template in the PR upon the early exit), keep your original approach and just add the necessary "patching" to make it work in Ubuntu.

From what I quickly checked, <experimental/filesystem> with additional link flags would do the trick — and a preprocessing directive in the code, to choose the right include header.

ltcmelo avatar Mar 22 '21 11:03 ltcmelo

So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?

Personally, I consider mkstemp nicer, but then it does need to deal with a char* API -- for example, what happens if you have a multi-threaded cnip that wants to preprocess multiple files simultaneously?

Happy to do whichever you prefer :)

aytey avatar Mar 22 '21 13:03 aytey

So, I'm fine to "work around 18.04" but does this mean you actually prefer the pid-based approach?

Kind of … :slightly_smiling_face: I mean the approach with <filesystem> / <experimental/filesystem>, but that doesn't need to use pid: you could (should) use the a new file with the same base name and only replace its extension; in particular, with .i (that's the default extension for preprocessed files), similar to what cnip's Python driver does.

ltcmelo avatar Mar 22 '21 14:03 ltcmelo

Do you prefer pid over using the temporary directory + .i? Your choice :)

aytey avatar Mar 22 '21 22:03 aytey

If you ask me, I'd say the temporary directory + .i file. (But with the API.) 🙂

ltcmelo avatar Mar 22 '21 23:03 ltcmelo

Will revisit this this week + do the stuff about looking for C++17 on older Ubuntus!

aytey avatar Mar 28 '21 22:03 aytey