fusesoc

Generators that depend on files

Open imphil opened this issue 4 years ago • 7 comments

Generators can be useful for transforming configuration files or templates into code. To ensure that fusesoc can fully track dependencies (and rebuild only if sources change) [1], the generator call needs a dependency on such template or configuration files. These files could be passed to the generator as parameters, so we somehow need to account for that.

See https://github.com/lowRISC/opentitan/pull/1796 for an example where this functionality would be required for "full correctness".

[1] Note that this feature isn't there yet, but it's a goal we want to work towards.

imphil, Mar 26 '20

Not 100% sure we're talking about the same thing, so correct me if not, but I've been considering how to implement caching of generator output. A typical example in my development flow is the Xilinx simulation primitives for Questasim. They take a long time to build, so I want a generator to only generate them if they don't already exist. I also want to generate new ones if either Questasim or Vivado is upgraded. This means I want to store them in a cache indexed by tool versions (and maybe other things).

My initial idea would be to

a) define a location for persistent generated data, and
b) define an API so that generators can indicate whether they need to regenerate. I'm not actually sure this is even necessary, or whether generators can just do this ad hoc.
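A minimal sketch of (a) and (b) done ad hoc inside the generator, assuming the generator already knows the tool versions (how it finds them is left out). The cache location is derived from a hash of those versions, so upgrading Questasim or Vivado lands in a fresh directory. The function names and the `.complete` marker file are hypothetical, not part of FuseSoC.

```python
import hashlib
import json
from pathlib import Path


def cache_dir_for(base: Path, tool_versions: dict[str, str]) -> Path:
    """Return a cache directory indexed by the given tool versions (hypothetical layout)."""
    # Serialize deterministically so the same versions always map to the same directory.
    key_src = json.dumps(tool_versions, sort_keys=True).encode()
    return base / hashlib.sha256(key_src).hexdigest()[:16]


def needs_regeneration(cache_dir: Path) -> bool:
    """Regenerate only if no completed output exists for this tool combination."""
    return not (cache_dir / ".complete").exists()


def mark_complete(cache_dir: Path) -> None:
    """Call after all outputs have been written, so interrupted runs are not reused."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / ".complete").touch()
```

Because the key only covers tool versions, anything else that affects the output (e.g. the primitive library selection) would have to be added to the dictionary as well.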

olofk, May 04 '20

We're probably not talking about the exactly same thing.

Roughly speaking, there are two places where caching can happen.

  • Fully cache the generator output and avoid calling the generator at all from fusesoc. That requires two things:

    • The assumption that generator runs are idempotent: calling a generator with the same input will result in the same output.
    • The assumption that FuseSoC knows about all inputs to the generator. Generators are often configured through parameters, and fusesoc knows about those. What fusesoc doesn't know at the moment are "internal dependencies" within a generator, such as a parameter that is actually a file name, or a dependency on a tool (version). [This issue is about one specific case: a generator parameter that is actually a file name.]
  • Caching within the generator itself. FuseSoC calls the generator, but the generator re-uses cached data. The implementation of that is totally up to the generator, including where to store cache files, what dependencies to rely on, etc.

@olofk for your use case, the caching within the generator seems to be the way forward for two reasons: you have complicated cache invalidation rules (just determining the Vivado version isn't trivial), and you want to share a cache between fusesoc projects/builds (which again requires very careful cache indexing and invalidation rules).

imphil, May 04 '20

I agree that handling cache invalidation in the generator works best for my case.

I'm still not sure however that I understand your original issue. Can you elaborate a bit more?

olofk, May 05 '20

Just to make sure there's no confusion: this issue is not about https://github.com/olofk/fusesoc/pull/391, it's independent from it.

My issue is about caching in a way that avoids calling the generator at all.

Look at this core file snippet:

generate:
  ral:
    generator: ralgen
    parameters:
      name: aes
      ip_hjson: ../../data/aes.hjson

Here, a generator is passed a parameter ip_hjson. To fusesoc, this parameter is a string. To the generator, however, this parameter is a file name and what matters are the contents of this file, not the file name itself. The aes.hjson file is not tracked by fusesoc in any way: it's not used for (hypothetical) caching decisions, it's not copied to the build directory, etc.

So if we want caching that avoids calling the generator, we need to check whether the contents of aes.hjson changed. In other words: the generator needs to depend on the contents of the aes.hjson file, not on the string parameter ../../data/aes.hjson.
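A minimal sketch of a cache key that depends on file contents rather than on the path string. It assumes the tool is somehow told which parameters are file names (here a hypothetical `file_params` argument) and that relative paths are resolved against the core file's directory. `generator_cache_key` is an illustrative name, not a FuseSoC function.

```python
import hashlib
import json
from pathlib import Path


def generator_cache_key(name: str, params: dict[str, str],
                        file_params: set[str], core_dir: Path) -> str:
    """Hash the generator name, plain parameters, and contents of file parameters."""
    h = hashlib.sha256()
    # Plain parameters contribute their string values.
    plain = {k: v for k, v in params.items() if k not in file_params}
    h.update(json.dumps({"generator": name, "params": plain}, sort_keys=True).encode())
    # File parameters contribute the contents of the file they point to, not the path.
    for key in sorted(file_params):
        path = (core_dir / params[key]).resolve()
        h.update(key.encode())
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()
```

Editing aes.hjson changes the key even though the `ip_hjson` string stays the same, which is the behavior described above.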

A solution could look something like this:

filesets:
  fileset_ip_hjson:
    files:
      - aes.hjson: {file_type: hjson}

generate:
  ral:
    generator: ralgen
    parameters:
      name: aes
    fileset_parameters:
      ip_hjson: fileset_ip_hjson

(I need to think much more about this syntax; it's just a rough first idea. Something along the lines of depend will probably work better in the end.)
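One way a tool could interpret the proposed `fileset_parameters` syntax, sketched under two assumptions: each such parameter maps to a fileset containing exactly one file, and the generator receives that file's path resolved against the core directory. The parsed core is represented as a plain dict mirroring the YAML above; this is not how FuseSoC actually parses core files, and `resolve_generator_params` is hypothetical.

```python
from pathlib import Path


def resolve_generator_params(core: dict, gen_name: str,
                             core_dir: Path) -> tuple[dict, list[Path]]:
    """Return (parameters for the generator, files the generator depends on)."""
    gen = core["generate"][gen_name]
    params = dict(gen.get("parameters", {}))
    deps: list[Path] = []
    for param, fileset_name in gen.get("fileset_parameters", {}).items():
        files = core["filesets"][fileset_name]["files"]
        if len(files) != 1:
            raise ValueError(f"{fileset_name}: expected exactly one file for {param!r}")
        entry = files[0]
        # Entries are either "name" or {"name": {attributes}}, as in the YAML above.
        file_name = entry if isinstance(entry, str) else next(iter(entry))
        path = (core_dir / file_name).resolve()
        params[param] = str(path)  # the generator still gets a path string
        deps.append(path)          # the tool additionally records it as a dependency
    return params, deps
```

The returned dependency list is what a content-aware cache key or a copy into the build directory would operate on.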

imphil, May 07 '20

Wow, I'm slow. But I think I finally understand. And if I did understand, I still think it will be much easier to handle cache invalidation in the generator itself. Otherwise FuseSoC must keep track of the input files somewhere, and there must be a way for the generator to tell FuseSoC whether it is idempotent, or we will have to run it anyway.
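The bookkeeping mentioned here could resemble a Make-style dependency file: after a run, the generator records the files it read together with their digests, and the tool reruns it only if one of them changed. A minimal sketch; the manifest name `inputs.json` and both functions are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(out_dir: Path, inputs: list[Path]) -> None:
    """Called by the generator after a successful run to report what it read."""
    manifest = {str(p.resolve()): _digest(p) for p in inputs}
    (out_dir / "inputs.json").write_text(json.dumps(manifest, sort_keys=True))


def is_up_to_date(out_dir: Path) -> bool:
    """Called by the tool before deciding whether to run the generator again."""
    manifest_path = out_dir / "inputs.json"
    if not manifest_path.exists():
        return False  # never ran, or the generator does not report its inputs
    manifest = json.loads(manifest_path.read_text())
    for name, digest in manifest.items():
        path = Path(name)
        if not path.exists() or _digest(path) != digest:
            return False
    return True
```

This only covers file inputs; changed parameters or tool versions would still need to be part of the check.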

olofk, Jun 16 '20

As a first step, I believe we need to give generators more control over their data. IIRC, FuseSoC currently clears out the data dir before calling the generator. We should let the generator itself handle this. I think that would also help with the problems @benreynwar has reported, where several differently parameterized generators called from one core overwrite each other's data.
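If the generator owns its data directory, one way to keep differently parameterized calls from clobbering each other is to give each parameter set its own subdirectory and clear only that one when regenerating. A minimal sketch; `regenerate` and the hashed directory layout are hypothetical and say nothing about how FuseSoC arranges its work tree.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def regenerate(base: Path, params: dict, write_outputs: Callable[[Path], object]) -> Path:
    """Regenerate outputs for one parameter set, leaving other sets untouched."""
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:12]
    out = base / key
    base.mkdir(parents=True, exist_ok=True)
    # Write into a scratch directory next to the final location first.
    staging = Path(tempfile.mkdtemp(dir=base, prefix=".staging-"))
    write_outputs(staging)
    # Then remove only this parameter set's previous output and move the new one in.
    if out.exists():
        shutil.rmtree(out)
    staging.rename(out)
    return out
```

Staging first means a failed `write_outputs` call leaves the previous output for that parameter set in place.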

olofk, Feb 26 '21

Would it make sense to add a property to a generator that simply says "don't cache the output"? That way, if we know a generator uses the contents of a file that fusesoc is not tracking, the cache will never be used.
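A minimal sketch of this opt-out, assuming a hypothetical boolean `cacheable` attribute on the generator definition that defaults to true. When it is false, the tool neither reads nor writes the cache and always reruns the generator. The `Generator` class and `invoke` function are illustrative only, not FuseSoC's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Generator:
    name: str
    run: Callable[[dict], str]
    cacheable: bool = True  # hypothetical opt-out flag


def invoke(gen: Generator, params: dict, cache: dict) -> str:
    """Return a cached result when allowed, otherwise run the generator."""
    key = (gen.name, tuple(sorted(params.items())))
    if gen.cacheable and key in cache:
        return cache[key]
    result = gen.run(params)
    if gen.cacheable:
        cache[key] = result
    return result
```

The flag trades build time for correctness: an uncacheable generator is always rerun, so untracked file inputs can never produce stale output.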

benreynwar, May 19 '21

Surprise! This is now implemented!

olofk, Feb 23 '23