ginkgo
File config on small set
This PR picks some components from different base types of Ginkgo to show the necessary components of a file config. It is not header-only like #800, because we would need to instantiate all combinations of each template parameter and base type at compile time whenever a file includes the header, which leads to slow compilation. Moreover, Ginkgo only lets users extend the base types like LinOp or Criterion, not the template types directly.
This PR provides two libraries: file_config_custom and file_config.
file_config contains the selection implementation inside the library to avoid multiple-definition issues when it is included from different places, while file_config_custom implements it in the header so that users can easily extend the selection.
In practice, there is no difference if the file config is only used in one file or in the final application.
Note: using file_config_custom in two distinct objects in the same library may lead to multiple-definition issues. Users should first pack the custom type and the selection function into another library with file_config_custom, the same way file_config does.
Current usage:
[
    {
        "base": "Csr", // it can also be Csr<float, int> directly
        "ValueType": "double",
        "IndexType": "int", // these two have defaults, like Csr<>
        "exec": "given", // it can also be optional, or construct a different executor
        "dim": [3, 4], // optional; it can also be a single 3 for a square matrix
        "num_nonzeros": 123, // optional
        "strategy": "load_balance", // optional
        "read": "file", // optional
        "name": "A" // required in ResourceManager, but has no effect on the function call
    },
    {
        "base": "Isai",
        "factory": {
            "skip_sorting": true, // all factory parameters are optional
            "excess_solver_factory": {
                // some LinOpFactory construction, or a reference to existing data
            },
            "exec": "given" // can be constructed in the config, or given from outside
        },
        "generate": "A", // or construct the matrix here or from outside
        "add_loggers": {} // optional
    }
]
TODO and discussion:
- [ ] where should we put the file config? should it be in ginkgo repo?
- [ ] type name suggestion?
- [ ] folder structure
Perhaps we should define how the file config should look? That would mean specifying which keywords are used to create a CSR matrix, etc. I think it would help the discussion to see what kind of file configs we are talking about.
I think as a starting point, we should be thinking about what's the minimal set of features we need to have a viable integration into our applications, as long as the approach is extensible enough. For configuring a solver, the only types we need to know about are LinOpFactories and maybe stopping criterion factories, no matrix types or executors. For the file config schema, I would try to make as much as possible implied by context. An example would be
{
    "type": "solver::Cg", // maybe we can omit the solver::, since there is only one Cg type
    "value_type": "double",
    "stop": {
        "iterations": 1000, // there are only 5 types of stopping criteria we can instantiate,
        "time": 100, // we can add keywords for all of them
        "residual_reduction": 1e-6
    },
    "preconditioner": {
        "type": "preconditioner::Jacobi", // maybe we can omit the preconditioner::, as it is implied by context
        "value_type": "double", // value_type: double could be inherited
        "index_type": "int32",
        "max_block_size": 8
    }
}
For me, the matrix type and executor are necessary. For example, CG using Csr versus CG using Coo gives different performance due to the SpMV, and we put the executor mapping in all our examples. For stopping criteria, the shortcut version will be there in the end; I would also say the shortcut version should be in Ginkgo itself. Also, in some cases the preconditioner does not use the same matrix as the solver. The initial design idea is to keep the same usage as the Ginkgo API, so that users do not need to learn two sets of rules and get the same flexibility.
The default value type follows our template defaults; it is not inherited from the previous part. But I am thinking about adding an alias section to change the value_type easily.
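Purely as an illustration of that idea, such an alias section might look like the following (the "alias" keyword is hypothetical and not part of the current config):

```json
{
    "alias": {
        "ValueType": "float" // any omitted ValueType below would resolve to float
    },
    "base": "Csr" // would then be instantiated as Csr<float, int> instead of the template default
}
```

This would let a user switch the precision of a whole config in one place without touching every entry.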
If a file object acts like a Factory::parameters type and requires a call to .on(exec) at the end, there would be no need to involve executors at all, see #1336. As soon as we involve a preconditioner, we need to convert the matrix to Csr again anyways. A potential way to deal with this would be to allow users to specify which matrix type should be used for the SpMVs in an iterative solver and implement the corresponding functionality in SolverBase, or to provide a LinOpFactory that converts the input LinOp to another type. We already discussed the same pattern for handling precision conversions in factory hierarchies.
No, I mainly mean the selection of the executor. The executor can be given from outside or constructed in the config.
Requiring the matrix in Csr is a limitation of the preconditioner, not a restriction against using other formats elsewhere. The preconditioner changing the format should not affect the format used in the solver.
If the factory works without .on(exec) through the same parent class, it can be used in file_config for sure. Currently, the executor can already be hidden from the config, because the config also allows inheriting it from the enclosing entry.
Whatever setting Ginkgo supports, file_config should also support, unless it cannot be handled by a JSON file (for example, the full flexibility of a user-defined lambda function in multigrid may not be possible).
@MarcelKoch do you mean the JSON schema you mentioned in another pull request, or some readme/documentation for the different classes?
@yhmtsai I meant what @upsj posted. Some examples of how the .json could look like.
Personally, I would go even further than @upsj and would omit the value_type and index_type completely from the config. I think this is not something that application users want to define. For example, for HyTeG I had a fixed index_type and a templated value_type for my wrapper class that used the file config. These types should be used together with the file config.
Omitting the value type and index type is also supported and optional, as I said. You need a way to fix the value type for the config, which is close to the alias section. Putting them in template parameters means users need to recompile when they want to try a different precision. There is no harm in supporting the selection of the type; we also have mixed-precision support.
@MarcelKoch I updated the usage in the description.
I wrote a small prototype for what I would imagine a minimal approach would look like in the minimal_file_config_prototype branch, moving the complexity of ValueType etc. into template parameters like @MarcelKoch suggested. The interesting part is in config_parser.cpp, everything is based on #1336.
I'm not sure if I really want to open this up, but should we consider formats other than JSON? IMO JSON is not ideal for human-written files. For example, it does not allow comments, and you need lots of ", {, and such. I'm not saying we definitely need to switch; there are quite a few advantages to JSON, most importantly that it's widely used. But maybe we could discuss it.
I believe that was the idea behind #1295, I think architecturally we should consider the representation of the parsed data independent of the object construction implementation.
Oh and BTW, nlohmann-json can deal with comments as well
@upsj I'm concerned with the step before that (unless I misunderstood you). This is again the question of how a config file should look, and whether JSON is the best tool to represent that.
One feature that JSON is missing would be references within the document. For example, in YAML you could have
.criteria: &criteria
    iterations: 100
cg:
    type: solver::Cg
    stop: *criteria
gmres:
    type: solver::Gmres
    stop: *criteria
which I personally think is neat. But again, we should discuss if we want this or not.
@MarcelKoch as long as we can transform it into a tree with resolved references, any configuration language should be suitable. JSON is just a cleaner language from the parser perspective.
For me, the user perspective is way more important than the parser perspective for the config files, that's why I bring this whole issue up.
Agreed, though to me the more contentious part of this design lies in the mapping from parsed config to generated object. If we have a JSON object as the top-level entry in the table, we can get references the same way by doing
{
    "cg": {
        "type": "solver::Cg",
        "stop": "criteria",
        "preconditioner": "jacobi-precond"
    },
    "gmres": {
        "type": "solver::Gmres",
        "stop": "criteria",
        "preconditioner": "jacobi-precond"
    },
    "jacobi-precond": {
        "type": "preconditioner::Jacobi",
        "max_block_size": 8
    },
    "criteria": {
        "iterations": 1000,
        "time": 100,
        "residual_reduction": 1e-6
    }
}
Basically, any place we expect an object, we also accept a string that references a top-level object from the parsed input.
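A minimal Python sketch of this resolution rule, assuming that any string value matching a top-level key counts as a reference (the resolve helper is illustrative, not Ginkgo code):

```python
import json

def resolve(node, root):
    # Hypothetical rule: a string that matches a top-level key is a
    # reference and is replaced by the (recursively resolved) entry.
    # Note: cyclic references would recurse forever; a real parser
    # would need to detect them.
    if isinstance(node, str) and node in root:
        return resolve(root[node], root)
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    return node

config = json.loads("""
{
    "cg": {"type": "solver::Cg", "stop": "criteria"},
    "criteria": {"iterations": 1000}
}
""")

cg = resolve(config["cg"], config)
print(cg["stop"])  # {'iterations': 1000}
```

This is also what a preprocessing pass that inlines references into each block would produce.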
I think in the JSON case, we would need to carry around some state. So when parsing the cg solver for example:
auto cg = parse(config["cg"]);
then parse needs to have the whole config available. That should not be too difficult, but it would move some work from the external parser to our side.
To me the question is: Do we need a user-accessible format-agnostic representation of the structure? If not, we can have a parse_json and parse_yaml function that only require an istream/string and the name of the object config we want to instantiate, and only share what code can be reused between them.
With the YAML reference usage, will config["cg"] still get the information from the criterion, or will it show an unresolved reference? Another way is to preprocess the JSON first (copying the referenced block into the corresponding place) so that the user can use config["cg"] directly.
Currently, the implementation also requires the whole JSON to achieve reference-like usage:
manager.put(full_config); // lazy construction
// manager.build(full_config); build everything
manager.search("cg"); // build the item when calling it
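The lazy behaviour sketched by these calls could look roughly like the following in Python (Manager, put, and search are stand-in names for illustration, not the actual ResourceManager API):

```python
class Manager:
    """Hypothetical sketch of lazy construction: store the whole
    config up front, build each item only on first access."""

    def __init__(self):
        self._config = {}
        self._built = {}

    def put(self, full_config):
        # store the whole config; nothing is constructed yet
        self._config.update(full_config)

    def search(self, name):
        # build the item (and its referenced dependencies) on first use,
        # then cache it so later lookups reuse the same object
        if name not in self._built:
            self._built[name] = self._build(self._config[name])
        return self._built[name]

    def _build(self, node):
        # stand-in for real object construction: here we only resolve
        # string references against the stored top-level config
        if isinstance(node, str) and node in self._config:
            return self.search(node)
        if isinstance(node, dict):
            return {key: self._build(value) for key, value in node.items()}
        return node

manager = Manager()
manager.put({"criteria": {"iterations": 100},
             "cg": {"type": "solver::Cg", "stop": "criteria"}})
print(manager.search("cg")["stop"])  # {'iterations': 100}
```

Caching the built objects also means two entries referencing "criteria" share one constructed instance, which matches the reference semantics discussed above.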
Just to make sure it's here: this is how we use the current resource manager in openCARP, which enormously simplifies the interfacing with Ginkgo.
https://git.opencarp.org/openCARP/openCARP/-/blob/master/numerics/ginkgo/SF_ginkgo_solver.cc
It's unclear how this works for preconditioners like BDDC, and also unclear how it would change for the distributed case.
These are the current files for the openCARP config:
- https://git.opencarp.org/openCARP/openCARP/-/blob/master/physics/parab_solver.json
- https://git.opencarp.org/openCARP/openCARP/-/blob/master/physics/ellip_solver.json