
Feedback on improving documentation

Open mdeicas opened this issue 1 year ago • 6 comments

Opening this issue to discuss and collect feedback on what areas of Guac need improved documentation. Feel free to comment with any shortcomings you've encountered in the documentation.

Some possible areas for new / better documentation are:

  • Better status indicators
    • A high level status page of what development work is currently being done in Guac
    • Better documentation on the status of the supported backends, collectors, and document types.
    • A list of known bugs and limitations
  • How to use the APIs
  • A more user-friendly summary of what features are supported or what new features have been added in each Guac release.
  • Better documentation on Guac’s overall architecture, for example a more detailed version of https://docs.guac.sh/guac-components/.
    • This could also explain how to run Guac via individual binaries instead of with the docker compose or helm charts.
    • Better explanation on the logical role of each binary and what it does
  • Update the readme in https://github.com/guacsec/guac/tree/main/pkg/assembler/backends

mdeicas avatar Nov 27 '23 17:11 mdeicas

related issue https://github.com/guacsec/guac/issues/1368

pxp928 avatar Nov 27 '23 22:11 pxp928

One topic I would like to see better documentation around is the difference between the concept of a backend versus an assembler.

ridhoq avatar Nov 28 '23 23:11 ridhoq

Fair point. Basically, these sit at opposite ends of the pipeline. We use assemblers to extract information from supply chain documents (SLSA, etc.) and create data structs, in a common format, that are passed into GraphQL.

On the other side, the database part of the pipeline needs to implement writing to the database for each GraphQL mutation, and reading from it for each user query. Each database we support is implemented as a backend, though a single backend may support multiple databases.

The ingestion pipeline looks something like:

flowchart LR
  SLSA_doc1 --> SLSA_assembler;
  SLSA_doc2 --> SLSA_assembler;
  SBOM_doc1 --> SBOM_assembler;
  SBOM_doc2 --> SBOM_assembler;
  SBOM_doc3 --> SBOM_assembler;
  SLSA_assembler --> gqlm[[GraphQL server]];
  SBOM_assembler --> gqlm;
  gqlm --> backend1([in memory backend]);
  gqlm --> backend2([Ent backend]);
  gqlm --> backend3([Neo4J backend]);
  gqlm --> backend4([ArangoDB backend]);
  backend2 --> postgres[(Postgres)];
  backend2 --> sqlite[(SQLite)];
  backend3 --> neo4j[(Neo4J)];
  backend4 --> arango[(ArangoDB)];

User queries hit the GraphQL server and get their results from there. Everything to the right of the server in the diagram is a backend.
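
To make the split concrete, here is a minimal sketch of the same pipeline as a toy model. It is not GUAC code: `Package`, `slsa_assembler`, `sbom_assembler`, `Backend`, `InMemoryBackend`, and `Server` are hypothetical names, and the document shapes are invented for illustration. The only point is that assemblers turn different document types into one common format, while backends all implement the same storage interface behind the server.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Package:
    """Hypothetical common format that every assembler emits."""
    name: str
    version: str


def slsa_assembler(doc: dict) -> list[Package]:
    # Toy SLSA-like document: each subject becomes a Package.
    return [Package(s["name"], s["version"]) for s in doc["subject"]]


def sbom_assembler(doc: dict) -> list[Package]:
    # Toy SBOM-like document: each component becomes a Package.
    return [Package(c["name"], c["version"]) for c in doc["components"]]


class Backend(Protocol):
    """What the server needs from any storage layer (hypothetical interface)."""

    def ingest_package(self, pkg: Package) -> None: ...

    def packages(self, name: str) -> list[Package]: ...


@dataclass
class InMemoryBackend:
    """Simplest possible backend: keeps everything in a Python set."""
    _store: set[Package] = field(default_factory=set)

    def ingest_package(self, pkg: Package) -> None:
        self._store.add(pkg)

    def packages(self, name: str) -> list[Package]:
        matches = (p for p in self._store if p.name == name)
        return sorted(matches, key=lambda p: p.version)


class Server:
    """Stands in for the GraphQL server: one mutation and one query."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def mutate_ingest(self, pkgs: list[Package]) -> None:
        # Ingestion side: assembler output is written through the backend.
        for pkg in pkgs:
            self.backend.ingest_package(pkg)

    def query_packages(self, name: str) -> list[Package]:
        # Query side: user queries are answered by the same backend.
        return self.backend.packages(name)
```

Swapping `InMemoryBackend` for another class with the same two methods would not change the assemblers or the server, which is the property the diagram is meant to show.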

mihaimaruseac avatar Nov 28 '23 23:11 mihaimaruseac

This is helpful! FWIW, I did end up coming to this understanding, but only after reading the code. It would be great to include this diagram in the docs. It probably merits some discussion on the GUAC components page as well.

ridhoq avatar Nov 29 '23 00:11 ridhoq

Another request: It would be great to have a more in-depth document on how the topological queries work and some examples of inputs to the queries and sample outputs. Specifically these two queries:

neighbors(node: ID!, usingOnly: [Edge!]!): [Node!]!
path(subject: ID!, target: ID!, maxPathLength: Int!, usingOnly: [Edge!]!): [Node!]!

From my understanding, many GUAC query use cases involve neighbors and path queries, so covering them in more depth would benefit the project. The only document I could find on this topic was the topological definitions section of the GraphQL doc, but please feel free to point me towards another doc that I might have missed. Thanks!
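
As an illustration of the kind of input/output example I mean, here is a toy sketch of the two signatures over a tiny in-memory graph. This is not GUAC's implementation or its documented semantics: the node IDs and edge names are made up, edges are treated as undirected, an empty `using_only` is assumed to allow every edge type, and `max_path_length` is assumed to count edges.

```python
from collections import deque

# Toy graph as (node, edge_type, node) triples. All IDs and edge names are hypothetical.
EDGES = [
    ("pkg:a", "IS_DEPENDENCY", "pkg:b"),
    ("pkg:b", "IS_DEPENDENCY", "pkg:c"),
    ("pkg:a", "HAS_SBOM", "sbom:a"),
    ("pkg:c", "CERTIFY_VULN", "vuln:1"),
]


def neighbors(node, using_only):
    """Return nodes one hop from `node`, following only the allowed edge types.

    Assumption: an empty `using_only` list means every edge type is allowed.
    """
    found = []
    for src, kind, dst in EDGES:
        if using_only and kind not in using_only:
            continue
        if src == node and dst not in found:
            found.append(dst)
        elif dst == node and src not in found:
            found.append(src)
    return found


def path(subject, target, max_path_length, using_only):
    """Return a shortest path as a list of nodes, or [] if none is short enough.

    Breadth-first search; `max_path_length` is assumed to count edges.
    """
    queue = deque([[subject]])
    seen = {subject}
    while queue:
        trail = queue.popleft()
        if trail[-1] == target:
            return trail
        if len(trail) - 1 >= max_path_length:
            continue  # already at the length limit, do not extend further
        for nxt in neighbors(trail[-1], using_only):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(trail + [nxt])
    return []
```

With this toy graph, `neighbors("pkg:a", ["IS_DEPENDENCY"])` returns `["pkg:b"]`, `path("pkg:a", "vuln:1", 3, [])` returns `["pkg:a", "pkg:b", "pkg:c", "vuln:1"]`, and the same path query with `max_path_length` of 2 returns `[]`. Docs with examples at this level, but against real GUAC data, would answer most of my questions.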

ridhoq avatar Dec 29 '23 21:12 ridhoq

We also need to add documentation for filtering via the GraphQL directive: https://github.com/guacsec/guac/issues/1615

pxp928 avatar Jan 07 '24 21:01 pxp928