ginkgo icon indicating copy to clipboard operation
ginkgo copied to clipboard

File config on small set

Open yhmtsai opened this issue 1 year ago • 22 comments

This PR picks some components from different base type of Ginkgo to show the necessary components in file config. It's not header-only like #800 because we need to instantiate all combinations of each template parameter and base type in compile time when the file includes the header. Thus, it leads the slow compilation issue. Moreover, Ginkgo only provides users to extend the base type like LinOp or Criterion, not the template type directly.

This pr provides two libraries file_config_custom and file_config. file_config contains the selection implementation in the library to avoid multiple definition issue when it is included from different places, but file_config_custom implements it in the header to make users easily extend the selection. There's no difference in practice if the file_config is only used in one file or the final application. Note. Using file_custom in two distinct objects in the same library may lead the multiple definition issue. Users should first pack the custom type and the selection function into another library with file_config_custom like file_config.

current usage:

[
    {
        "base": "Csr", // it can also be Csr<float, int> directly
        "ValueType": "double",
        "IndexType": "int", // these two contains default like Csr<>
        "exec": "given", // it can also be optional or construct different executor
        "dim": [
            3,
            4
        ], // optional, it can also use 3 for square matrix
        "num_nonzeros": 123, // optional
        "strategy": "load_balance", // optional
        "read": "file", // optional
        "name": "A" // required in ResourceManager but no effect from function call
    },
    {
        "base": "Isai",
        "factory": {
            "skip_sorting": true, // all factory's param are optional
            "excess_solver_factory": {
                // some linop factory construction or refer to existed data
            },
            "exec": "given" // can be construced in config, or given from outside.
        },
        "generate": "A" // or construct the matrix here or from outside
        "add_loggers": {} // optional
    }
]

TODO and discussion:

  • [ ] where should we put the file config? should it be in ginkgo repo?
  • [ ] type name suggestion?
  • [ ] folder structure

yhmtsai avatar Jun 12 '23 07:06 yhmtsai

Perhaps we should define how the file config should look like? That would mean, which keywords are used to create a CSR matrix, etc. I think it would help the discussion to see what kind of file configs we are talking about.

MarcelKoch avatar Jun 12 '23 13:06 MarcelKoch

I think as a starting point, we should be thinking about what's the minimal set of features we need to have a viable integration into our applications, as long as the approach is extensible enough. For configuring a solver, the only types we need to know about are LinOpFactories and maybe stopping criterion factories, no matrix types or executors. For the file config schema, I would try to make as much as possible implied by context. An example would be

{
    "type": "solver::Cg", // maybe we can omit the solver::, since there is only one Cg type
    "value_type": "double",
    "stop": {
        "iterations": 1000, // there are only 5 types of stopping criteria we can instantiate,
        "time": 100,        // we can add keywords for all of them
        "residual_reduction": 1e-6
    },
    "preconditioner": {
        "type": "preconditioner::Jacobi", // maybe we can omit the preconditioner::, as it is implied by context
        "value_type": "double",           // value_type: double could be inherited
        "index_type": "int32",
        "max_block_size": 8
    }
}

upsj avatar Jun 12 '23 13:06 upsj

for me, the matrix type and executor are necessary. Like CG using Csr or Cg using Coo gives different performance due to SpMV and we put the executor mapping in all our examples. for stopping criterion, the shortcut version will be there in the end. I will also say the shortcut version should be also in ginkgo itself. Also, in some cases, the preconditioner does not use the same matrix as the solver. The initial design idea is to keep the same usage as ginkgo API such that users does not need to learn two rules for setting and give the same flexibility.

the default value type is like our template default not inherited from the previous part, but I am thinking about adding alias section to change the value_type easily

yhmtsai avatar Jun 12 '23 14:06 yhmtsai

If a file object acts like a Factory::parameters type and require a call to .on(exec) in the end, there would be no need to involve executors at all, see #1336. As soon as we involve a preconditioner, we need to convert the matrix to Csr again, anyways. A potential way to deal with this would be allowing users to specify which matrix type should be used for the SpMVs in an iterative solver, and implement the corresponding functionality in SolverBase, or provide a LinOpFactory that converts the input LinOp to another type. We already discussed the same pattern for handling precision conversions in factory hierarchies.

upsj avatar Jun 12 '23 14:06 upsj

no, I mainly mean the selection of the executor. The executor can be given from outside or constructed in the config. Preconditioner requiring matrix in Csr is the limitation from the preconditioner, not the limitation to avoid using other format in other places. The preconditioner changes the format should not affect the format in the solver. If the factory supports without .on(exec) from the same parent class, it can be used in file_config for sure. Currently, it can already be hidden from the config because it also allows inheritance from the above executor. Ginkgo supports the setting method -> file_config should also support it unless it can not be handled by json file. (For example, the full flexibility of user-defined lambda function in multigrid may not be possible)

yhmtsai avatar Jun 12 '23 14:06 yhmtsai

@MarcelKoch do you mean the json-schema you mentioned in another pull request? or some readme/documentation for different class

yhmtsai avatar Jun 12 '23 14:06 yhmtsai

@yhmtsai I meant what @upsj posted. Some examples of how the .json could look like.

MarcelKoch avatar Jun 12 '23 14:06 MarcelKoch

Personally, I would go even further than @upsj and would omit the value_type and index_type completely from the config. I think this is not something that application users want to define. For example, for hyteg I had a fixed index_type and a templated value_type for my wrapper class that used the file config. These types should be used together with the file config.

MarcelKoch avatar Jun 12 '23 15:06 MarcelKoch

omitting value type and index type is also supported and optional as I said. you need a way to fix the valuetype for the config, which is close to the alias section. putting them in template makes user need to recompile when they want to try different precision. there's no hurt with supporting the selection of type. we also have mixed precision support.

yhmtsai avatar Jun 12 '23 15:06 yhmtsai

@MarcelKoch I updated the usage in the description.

yhmtsai avatar Jun 12 '23 16:06 yhmtsai

I wrote a small prototype for what I would imagine a minimal approach would look like in the minimal_file_config_prototype branch, moving the complexity of ValueType etc. into template parameters like @MarcelKoch suggested. The interesting part is in config_parser.cpp, everything is based on #1336.

upsj avatar Jun 13 '23 09:06 upsj

I'm not sure if I really want to open this up, but should we consider formats other than json? IMO json is not ideal for human written files. For example, it does not allow for comments, and you need lots of ", {, and such. I'm not saying we definitely need to switch, there are quite a few advantages of json, most importantly that it's widely used. But maybe we could discuss it.

MarcelKoch avatar Jun 13 '23 12:06 MarcelKoch

I believe that was the idea behind #1295, I think architecturally we should consider the representation of the parsed data independent of the object construction implementation.

upsj avatar Jun 13 '23 12:06 upsj

Oh and BTW, nlohmann-json can deal with comments as well

upsj avatar Jun 13 '23 13:06 upsj

@upsj I'm concerned with the step before that (unless I misunderstood you). This is again the question of how a config file should look like, and if json is the best tool to represent that.

One feature that json is missing, would be references within the document. For example in yaml you could have

.criteria: &criteria
  iterations: 100

cg:
  type: solver::Cg
  stop: 
    *criteria

gmres
  type: solver::Gmres
  stop: 
    *criteria

which I personally think is neat. But again, we should discuss if we want this or not.

MarcelKoch avatar Jun 13 '23 13:06 MarcelKoch

@MarcelKoch as long as we can transform it into a tree with resolved references, any configuration language should be suitable. JSON is just a cleaner language from the parser perspective.

upsj avatar Jun 13 '23 13:06 upsj

For me, the user perspective is way more important than the parser perspective for the config files, that's why I bring this whole issue up.

MarcelKoch avatar Jun 13 '23 13:06 MarcelKoch

Agreed, to me the more contentious part of this design lies in the mapping from parsed config to generated object though. If we have a JSON object as the top-level entry in the table, we can get references the same way by doing

{
    "cg": {
        "type": "solver::Cg",
        "stop": "criteria",
        "preconditioner": "jacobi-precond"
    },
    "gmres": {
        "type": "solver::Gmres",
        "stop": "criteria",
        "preconditioner": "jacobi-precond"
    }
    "jacobi-precond": {
        "type": "preconditioner::Jacobi",
        "max_block_size": 8
    },
    "criteria": {
        "iterations": 1000,
        "time": 100,
        "residual_reduction": 1e-6
    }
}

Basically any place we expect an object we also accept a string that references a top-level object from the parsed input.

upsj avatar Jun 13 '23 13:06 upsj

I think in the JSON case, we would need to carry around some state. So when parsing the cg solver for example:

auto cg = parse(config["cg"]);

then parse needs to have the whole config available. That should not be too difficult, but it would move some work from the external parser to our side.

MarcelKoch avatar Jun 13 '23 13:06 MarcelKoch

To me the question is: Do we need a user-accessible format-agnostic representation of the structure? If not, we can have a parse_json and parse_yaml function that only require an istream/string and the name of the object config we want to instantiate, and only share what code can be reused between them.

upsj avatar Jun 13 '23 13:06 upsj

In yaml reference usage, will config["cg"] still get information from criterion? or, will it show a non-found reference? Another way is to preprocess the json first (hard copy the reference into corresponding block) such that the user can use config["cg"] directly.

currently, implementation also requires the whole json to achieve reference-like usage.

manager.put(full_config); // lazy construction
// manager.build(full_config); build everything
manager.search("cg"); // build the item when calling it

yhmtsai avatar Jun 14 '23 08:06 yhmtsai

Just to make sure it's here, here is how we use the current ressource manager in OpenCARP which infinitely simplifies the interfacing to Ginkgo.

https://git.opencarp.org/openCARP/openCARP/-/blob/master/numerics/ginkgo/SF_ginkgo_solver.cc

It's unclear for preconditioners like BDDC, and also unclear how it would change for distributed.

These are the current files for the openCARP config:

  • https://git.opencarp.org/openCARP/openCARP/-/blob/master/physics/parab_solver.json
  • https://git.opencarp.org/openCARP/openCARP/-/blob/master/physics/ellip_solver.json

tcojean avatar Jun 14 '23 12:06 tcojean