clang-include-graph

JSON printer

Open KyleFromKitware opened this issue 1 year ago • 5 comments

clang-include-graph should have a JSON printer. This would enable external tooling to get the include graph of a translation unit in a machine-readable format.

My use case for this is that we would like to add a link-what-you-include tool to CMake, and we need to know what files were included by a translation unit. We'd rather not link CMake itself against libclang, and this tool seems like a good candidate to modify to get the data we need. I'm willing to do this work myself.

I'm thinking for the JSON library we can use either libjsoncpp or Boost.JSON.

KyleFromKitware · Nov 14 '23 20:11

AFAIK, libclang already has JSON tooling associated with it for things like compile_commands.json. I believe there is also a JSON output format for warnings (SARIF?). I see llvm/include/llvm/Support/JSON.h which is probably the best choice here.

mathstuf · Nov 14 '23 20:11

clang-include-graph uses the libclang C API. I don't think it uses the C++ API, so we might not be able to use the JSON API from LLVM.

KyleFromKitware · Nov 14 '23 20:11

@KyleFromKitware Hi, yes, JSON output seems like a really good idea in general for this tool. Do you already have a specific JSON structure in mind?

Also, do you need JSON output for each of the graph types generated (i.e. tree, topological sort, cycles) or just for a specific one (I'm guessing topological sort, probably)?

I think in general I'd prefer to have JSON format as an independent option, so that you could print out JSON for all types of graphs for which it makes sense (except graphical formats such as PlantUML or MermaidJS), i.e.:

clang-include-graph -r . -l --topological-sort --format=json

{
  "src/dir1/dir2/translation_unit1.cpp": ["include/header1.h", "src/header2.h"],
  "src/dir1/dir2/translation_unit2.cpp": ["include/header1.h", "src/header3.h"]
}

and then similar outputs for other types of graphs, so that it could be more easily integrated for other use cases?
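To illustrate how external tooling might consume the flat shape above, here is a minimal Python sketch. It assumes the proposed `--format=json` output is a single object mapping each translation unit to its list of headers; `includers_by_header` is a hypothetical helper, not part of clang-include-graph.

```python
import json
from collections import defaultdict

# Hypothetical output in the flat shape sketched above (valid JSON, no trailing comma).
output = """
{
  "src/dir1/dir2/translation_unit1.cpp": ["include/header1.h", "src/header2.h"],
  "src/dir1/dir2/translation_unit2.cpp": ["include/header1.h", "src/header3.h"]
}
"""

def includers_by_header(graph):
    """Invert the mapping: header -> translation units that include it."""
    result = defaultdict(list)
    for tu, headers in graph.items():
        for header in headers:
            result[header].append(tu)
    return dict(result)

reverse = includers_by_header(json.loads(output))
for header, tus in sorted(reverse.items()):
    print(header, "<-", ", ".join(tus))
```

An inverted index like this answers "which translation units need rebuilding if this header changes", which is one of the use cases a generic format would serve.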

As for JSON serialization, I usually use nlohmann/json: it's a single-header library, so it doesn't add any dependencies...

bkryza · Nov 14 '23 20:11

I'd rather have the JSON output be a raw "this included this, which in turn included this". External tooling can decide what to do with that information (calculate cycles, topological sort, etc.)
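As a rough sketch of what "external tooling decides" could look like, the Python below derives a topological order and detects cycles from raw include edges, using the standard-library `graphlib` module (Python 3.9+). The `includes` mapping and the helper names are assumptions made for illustration, not a proposed format.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical raw data: each file mapped to the files it directly includes.
includes = {
    "main.c": ["stdio.h", "stdlib.h"],
    "stdio.h": ["stdlib.h"],
    "stdlib.h": [],
}

def topo_order(graph):
    """Return files ordered so that every included file precedes its includer."""
    # TopologicalSorter expects node -> predecessors; included files play that role.
    return list(TopologicalSorter(graph).static_order())

def find_cycle(graph):
    """Return one include cycle as a list of files, or None if the graph is acyclic."""
    try:
        topo_order(graph)
    except CycleError as err:
        return err.args[1]  # first and last entries are the same node
    return None

order = topo_order(includes)
cycle = find_cycle({"a.h": ["b.h"], "b.h": ["a.h"]})
print(order)
print(cycle)
```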

Keep in mind that for large translation units that include a lot of files (many of which get repeated), we don't want to repeat the filenames, because that would make the file much larger than it needs to be. The approach CMake has taken with the File API is to create an array of strings, and then reference files by an index into the string array.
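The string-table idea can be sketched in a few lines of Python. This is only an illustration: the flat `edges` key and the `build_indexed_edges` helper are assumptions, and the layout is deliberately simpler than any structure discussed in this thread.

```python
import json

def build_indexed_edges(edges):
    """Store each path once and express include edges as pairs of indices."""
    filenames = []  # string table: every distinct path appears exactly once
    index_of = {}   # path -> position in the string table

    def intern(path):
        if path not in index_of:
            index_of[path] = len(filenames)
            filenames.append(path)
        return index_of[path]

    indexed = [[intern(src), intern(dst)] for src, dst in edges]
    return {"filenames": filenames, "edges": indexed}

# Hypothetical include edges: (including file, included file).
edges = [
    ("/home/user/project/main.c", "/usr/include/stdio.h"),
    ("/home/user/project/main.c", "/usr/include/stdlib.h"),
    ("/usr/include/stdio.h", "/usr/include/stdlib.h"),
]
doc = build_indexed_edges(edges)
print(json.dumps(doc, indent=2))
```

Each path is written once no matter how many edges mention it, so the output grows with the number of edges rather than with the total length of repeated paths.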

So, all told, for the following file:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  return 0;
}

the output might look something like:

{
  "filenames": [
    "/home/user/project/main.c",
    "/usr/include/stdio.h",
    "/usr/include/stdlib.h"
  ],
  "includedAs": [
    "<stdio.h>",
    "<stdlib.h>"
  ],
  "files": [
    {
      "fileIndex": 0,
      "includeIndices": [
        1,
        2
      ]
    },
    {
      "fileIndex": 1,
      "includedAsIndex": 0,
      "includeIndices": [
        3
      ]
    },
    {
      "fileIndex": 2,
      "includedAsIndex": 1,
      "includeIndices": []
    },
    {
      "fileIndex": 2,
      "includedAsIndex": 1,
      "includeIndices": []
    }
  ]
}

assuming that <stdio.h> includes <stdlib.h>.
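One reading of the example above, and the only one under which index 3 is valid, is that `includeIndices` point into the `files` array (each entry being one occurrence of an `#include`), while `fileIndex` and `includedAsIndex` point into the two string tables. A minimal Python sketch of a consumer under that assumption follows; `walk` is a hypothetical helper, not part of any existing tool.

```python
# The example output above, written as a Python literal.
doc = {
    "filenames": [
        "/home/user/project/main.c",
        "/usr/include/stdio.h",
        "/usr/include/stdlib.h",
    ],
    "includedAs": ["<stdio.h>", "<stdlib.h>"],
    "files": [
        {"fileIndex": 0, "includeIndices": [1, 2]},
        {"fileIndex": 1, "includedAsIndex": 0, "includeIndices": [3]},
        {"fileIndex": 2, "includedAsIndex": 1, "includeIndices": []},
        {"fileIndex": 2, "includedAsIndex": 1, "includeIndices": []},
    ],
}

def walk(doc, node_index=0, depth=0):
    """Yield (depth, path, spelling) for each node of the include tree."""
    node = doc["files"][node_index]
    path = doc["filenames"][node["fileIndex"]]
    spelled = node.get("includedAsIndex")  # absent for the translation unit itself
    spelling = doc["includedAs"][spelled] if spelled is not None else None
    yield depth, path, spelling
    for child in node["includeIndices"]:  # assumed to index into "files"
        yield from walk(doc, child, depth + 1)

tree = list(walk(doc))
for depth, path, spelling in tree:
    suffix = f" (as {spelling})" if spelling else ""
    print("  " * depth + path + suffix)
```

Under this reading, the two identical entries for `stdlib.h` are separate occurrences: one included by `main.c` and one by `stdio.h`.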

KyleFromKitware · Nov 14 '23 20:11

@KyleFromKitware I've browsed through CMake's codemodel structure and I think I understand your use case better now.

In general the structure is OK. However, for a generic JSON output renderer, I would like to have at least the following features:

  • Support for multiple translation units in one JSON file (in particular if the user does not specify a specific translation unit the JSON file will contain all translation units from compile_commands.json, so the structure must accommodate that)
  • It should accept an option to print paths relative to a specified directory (usually the project root) so that the output JSON doesn't have to contain the user's home directory paths
  • In your example includedAs is off by one with respect to filenames
  • In addition to includedAs, a sourceLocation showing where each #include directive can be found
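A rough Python sketch of how the features in this list could fit together is shown below. Everything in it is an assumption made for illustration: the `translationUnits` and `includes` keys, the shape of `sourceLocation`, and the `relativize` helper are hypothetical, not an agreed format.

```python
import posixpath

def relativize(path, root):
    """Return path relative to root, or unchanged if it lies outside root."""
    rel = posixpath.relpath(path, root)
    outside = rel == ".." or rel.startswith("../")
    return path if outside else rel

root = "/home/user/project"
absolute = [
    "/home/user/project/src/a.c",
    "/home/user/project/src/b.c",
    "/usr/include/stdio.h",
]

# One shared string table, several translation units referencing it by index.
doc = {
    "filenames": [relativize(p, root) for p in absolute],
    "translationUnits": [
        {"fileIndex": 0, "includes": [{"fileIndex": 2, "sourceLocation": {"line": 1}}]},
        {"fileIndex": 1, "includes": [{"fileIndex": 2, "sourceLocation": {"line": 1}}]},
    ],
}
print(doc["filenames"])
```

Paths outside the chosen root, such as system headers, are left absolute here; whether they should be is a design decision the sketch does not settle.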

Having said that, I wouldn't mind having a custom --cmake-file-api, --cmake-codemodel-v2 or --indexed-json generator (these are just examples, of course; feel free to propose your own names), where you could use whatever structure is most appropriate and effective for your use case...

bkryza · Nov 14 '23 21:11