config
config copied to clipboard
Address long standing ordering/sorting issues.
Ordering/Sorting concerns for the library have existed coming up on a decade now.
Issues:
- #35
- #235
- #253
- #364
- #365
- #377
- #624
Pull Requests:
- #138
- #229
- #266
- #557
There have been many discussions weighing the benefits and drawbacks of these features, but no real outcome on a proper solution.
A few points of discussion:
Render Sorting
Some changes over the years have resulted in some kind of weird behaviour, especially #228.
The ability for deterministic rendering of the config object was added in #138, but the part where you could optionally decide how you want this to be done was gutted in b38e708 for no actual reason??
With how widely used and with such great extensibility potential for this library, it feels wrong to have such an impactful feature hardcoded with zero ability to modify it (not even with reflection) due to it's inline usage.
Adding back the ability to toggle sorting + the ability to individually choose what sorting to use per render would offer much higher levels of extensibility, while effecting no other functionality.
ex. Sorting keys based on custom defined order.
order = ["first", "second", "last"]
conf = config {
first=0
second=1
last=2
}
// default output
conf.render =>
first=0
last=2
second=1
// possible output
conf.render(sort: Comparator.comparingInt(order::indexOf)) =>
first=0
second=1
last=2
Ordering
Everywhere within the library, ordering of keys is completely arbitrary due to the nature of the map implementation used HashMap
.
This creates non deterministic behaviour in reading configs.
Some points brought up in the past seem to not make sense, ex:
Meaningful order doesn't play well with the way the library works in general. Note that even if Config was ordered, your call to entrySet there would trash the order (because Set isn't ordered either).
The Sets returned by sorted/ordered maps are all custom and preserve iteration order based on the maps contracts. That is, after all the entire point of those map implementations...
I feel like 8 years later would be a perfect time to actually attempt to come up with a solution to the presented issues with merging. I am more than happy to help prototype potential solutions.
Please address this.
https://github.com/lightbend/config/issues/365 seems to be the best current discussion and the issue is still open. I don't have plans to work on this myself (this library has little or no active maintenance, other than any emergency fixes that arise).
For rendering, there's just in general not a goal for the library to have fine-grained render control, or to preserve formatting. The theoretical way to do that would be via the ConfigDocument syntax-preserving representation which would need a lot of work. I do not remember why that one render options commit was merged and then reverted; I can't seem to find any discussion. I would probably assume there was a reason, though, like it might have broken something else. Potentially could figure it out through more git archaeology.
For sorting, the relevant problem is probably this https://github.com/lightbend/config/issues/35#issuecomment-24777359
i.e. withFallback needs to remain associative and we need to define what happens to the sort when two values are merged.
On #365 I pointed to the relevant unit test, and it would need extending to prove that with sort-sensitivity and duplicate keys the associativity is preserved. If we can define how sorting is done across withFallback
to preserve the associativity then it might be possible to preserve order. The library will break, though, if withFallback
is not associative, so the test suite needs to prove that it is, including the sort order if sort order is significant.
Both of these points are noted on #365 though, I'd probably suggest we keep the discussion there.
For rendering, there's just in general not a goal for the library to have fine-grained render control, or to preserve formatting. The theoretical way to do that would be via the ConfigDocument syntax-preserving representation which would need a lot of work.
Even if it's a non goal, the ability to optionally define the comparator used would not cause issues, and would be very useful for users of the library. Especially considering the actual alphanumeric sorting doesn't really make sense. Nothing else has this kind of behaviour and it's a major pain.
I do not remember why that one render options commit was merged and then reverted; I can't seem to find any discussion. I would probably assume there was a reason, though, like it might have broken something else. Potentially could figure it out through more git archaeology.
This was your response to the author of the PR:
If you're amenable I'd like to make this the hardcoded behavior and not a render option, because I don't see why someone would want random rather than sorted. Care to update the patch so it just adds the sorting and not an option?
So I don't particularly think there was any real reason for the revert other than you not seeing why someone would want the option.
For sorting, the relevant problem is probably this #35 (comment) i.e. withFallback needs to remain associative and we need to define what happens to the sort when two values are merged. On #365 I pointed to the relevant unit test, and it would need extending to prove that with sort-sensitivity and duplicate keys the associativity is preserved. If we can define how sorting is done across withFallback to preserve the associativity then it might be possible to preserve order. The library will break, though, if withFallback is not associative, so the test suite needs to prove that it is, including the sort order if sort order is significant.
And like I said, 8 years is quite a long time to come up with something to make this feature work. Intuitively, you expect to get out what you put in, so the lack of determinism causes a major pain point when working with the library.
I've forked the library and began playing with it to see what I can come up with, but I think this is long awaited and deserves attention, rather than just being thrown aside.
I don't disagree that it's worth looking at. To be clear, nothing is getting attention right now, so this isn't thrown aside in particular, it's just part of the general status of the library.
@havocp Would it be possible for you to supply an example of this system breaking/not fully functioning properly.
After all the changes I've made, I cannot get it to produce something unexpected and it seems like things just work. All tests pass aside from the single test: https://github.com/lightbend/config/blob/main/config/src/test/scala/com/typesafe/config/impl/ConfigTest.scala#L1335
And that's only failing due to the hardcoded string expecting the order of the values to be sorted alphanumerically, whereas due to my changes, they are based on insertion order. The contents are in fact the same values:
expected:<[w="Y.c.d",x=${a},y="X.b",z="Y.c]"> but was:<[x=${a},y="X.b",z="Y.c",w="Y.c.d]">
Forgive me, but I'm a bit confused about what's supposed to be breaking?
Let me back up to the big picture for a minute.
Background on the library design
The intuitive issue is what does it mean for an app to rely on the order some keys appear in, if the config is actually assembled from multiple files, environment variables, system properties, and even some Config
hardcoded in apps? And those keys can be in a different order in each of those places? What is the canonical order of some keys if keys can appear in N places, and some of those N places might even be unordered maps too?
The current answer to this problem is that order is not significant. It is random and apps can't rely on it. i.e. the Config
is an unordered map because we don't really know what meaningful order to put it in.
Now, if we extended the ConfigDocument API, that API doesn't involve merging stacks of multiple config sources or anything like that, it's just parsing a file. And that's when it makes 1000% sense to preserve formatting and order and so on.
Config
does not represent a parsed file, it represents an abstract "configuration" which may be merged from many configuration sources. This abstract configuration is not ordered or formatted right now. And in fact it doesn't even preserve all the data from the files it's based on... duplicate paths are merged or discarded. So trying to round-trip any details of a particular file through Config
won't work well, Config
is throwing too much away.
Why not add all that stuff (formatting, ordering, and keeping-duplicates-around) to Config? The primary reason for Config's existence is the withFallback
operation, and formatting/ordering/duplicates don't have a clear, meaningful interpretation once you call withFallback
even one time. So the contract of Config
is tough to preserve if Config
tries to be a document instead of an abstract configuration.
One way we could make formatting/ordering/duplicates available through a Config
is to associate the source ConfigDocument
with the config, either in the ConfigOrigin
annotations or in a dedicated sourceDocuments
field.
However, note that if an app wanted to consider formatting/ordering/duplicates it would need to look at a list of sourceDocuments
and figure out how to deal with the fact that they may have the same key in multiple documents or multiple times in each document, formatted and ordered differently each time.
For any state added to Config
we need to work out how that state gets handled by withFallback
.
debug data
Config
does keep around ConfigOrigin
(and maybe some comments, I forget); this is for debugging purposes, but it isn't semantic. An app using this data for its configuration would be a bad idea, and it isn't considered in the implementation of equals
or otherwise used semantically. It's just a sort of debug trail.
why withFallback is associative
It makes it deterministic to merge config sources, rather than surprisingly dependent on how they are grouped.
When would this come up in practice? I can't predict every way, but some examples that might change the sequence of withFallback
calls:
- when using sbt-assembly or similar to make a "fat jar", typically multiple config files are concatenated into one, which means we'd have no calls to withFallback() in the fat jar case and multiple calls in the non-fat case, and the answers could now be different. So fat-jar'ing the app might change the behavior.
- in application code when I'm merging multiple sources of config, I might happen to change the order I call
withFallback
, or add code to merge in a new source of config. IfwithFallback
's behavior is not associative, seemingly harmless changes to my app code might happen to change the result that I get. - The library uses
withFallback
for internal things, for example I believe the logic in the library for handling substitutions${foo.bar}
uses it. We don't want that logic to change its behavior in surprising ways depending on how the resolve algorithm plays out.
what breaks if withFallback is not associative with respect to the order of keys
Right now in theory nothing breaks because there are no guarantees about key order. So nobody should be relying on it. In math terms two Config with different orders can still be considered the same value.
But if we claim there are guarantees about key order, anyone who relies on that claim will get surprises because the order will shift around depending on how we associate configs in order to merge them. Unless the Config is just from a single file with no internal merges within that file, i.e. unless the Config is pretty much just one JSON file.
What breaks (in subtle, hard to predict ways) is anyone relying on the new ordering feature to configure their app differently depending on order.
how to test that withFallback is associative if ordering becomes semantically meaningful
How might we test a deterministic computation of key order on Config
? There is a function associativeMerge
in the test suite already tests the associative property. So the test ends up similar to this existing one: https://github.com/lightbend/config/blob/f92a4ee2f672c8254d87b27831d0903d3d0d6cc2/config/src/test/scala/com/typesafe/config/impl/ConfigTest.scala#L145-L159
However: note that associativeMerge
uses equals
to test that any way we group the merges will give the same result, and equals
does not currently consider order part of equality. If order is semantically significant, however, then at least this associativity test should start to consider order part of equality.
Coding the unit test blind, if we don't actually fix equals (maybe we should though), this test could be the one I suggested in this old comment:
def mergeWithReordering() {
val obj1 = parseObject("""{ "a" : 1 }""")
val obj2 = parseObject("""{ "b" : 2 }""")
val obj3 = parseObject("""{ "a" : 3 }""")
val obj4 = parseObject("""{ "c" : 4 }""")
val obj5 = parseObject("""{ "a" : 5 }""")
associativeMerge(Seq(obj1, obj2, obj3, obj4, obj5)) { merged =>
// assert in here that `merged` has the expected keys in expected order, always the same order
}
}
This is a pretty simple test, and there are probably some more complicated cases that should be considered as well. Cases using nested keys and using ${}
syntax for example.
bottom line
if we make Config
preserve order, I'd like to see 1) what does that mean conceptually if we're thinking of Config
primarily as the thing we call withFallback
on to get an abstract configuration - how does someone force a certain order when they are configuring an app via multiple files and env variables? 2) what's the implementation of withFallback
that keeps it associative, that reflects this conceptual meaning.
Also, I'd love to see ConfigDocument
extended to cover some of these single-file use-cases.
All that past rationale aside, I should also have asked - what immediate specific thing are you doing where preserving order would be useful? What problem are you trying to solve by preserving order?
The intuitive issue is what does it mean for an app to rely on the order some keys appear in, if the config is actually assembled from multiple files, environment variables, system properties, and even some Config hardcoded in apps? And those keys can be in a different order in each of those places? What is the canonical order of some keys if keys can appear in N places, and some of those N places might even be unordered maps too?
Simply insertion order. That is the entire conceivable contract. Keys should be in the order they are encountered in. Whether they come from ordered sources or unordered sources.
For the case where applications use ConfigFactory.load()
as their entry point, it's not particularly a hard to grasp concept that
ConfigFactory.load().withFallback(myCustomConfig)
Will result of the values of myCustomConfig
being inserted AFTER the values of ConfigFactory.load()
. It just makes sense.
The order of loading values is already well defined, in the readme:
- system properties
- application.conf (all resources on classpath with this name)
- application.json (all resources on classpath with this name)
- application.properties (all resources on classpath with this name)
- reference.conf (all resources on classpath with this name)
It is once again, intuitive for one to expect due to a defined order of loading, to end up with a defined order of values.
The current answer to this problem is that order is not significant. It is random and apps can't rely on it. i.e. the Config is an unordered map because we don't really know what meaningful order to put it in.
This sort of highlights a big oversight here, it's only random because of a choice of implementation.
Yes, Config
is a map, but semantically "configs" are not simply unordered bags of values. They have meaning, and they have structure.
It doesn't necessarily need to be "put" in a certain order, it should just be. A "meaningful" order would be for it to be IN ORDER.
This touches on #228, there wouldn't be a need for a bandaid bug fix for sorting if things were just ordered by insertion. It would just work how you would expect them to, intuitively.
Why not add all that stuff (formatting, ordering, and keeping-duplicates-around) to Config? The primary reason for Config's existence is the withFallback operation, and formatting/ordering/duplicates don't have a clear, meaningful interpretation once you call withFallback even one time. So the contract of Config is tough to preserve if Config tries to be a document instead of an abstract configuration.
I don't see how
"abc".withFallback("bcde") => "abcde"
"bcde".withFallback("abc") => "bcdea"
Isn't clear and meaningful, withFallback being a merge operation, literally just a more complex concatenation operation makes the most sense when formatting and ordering is preserved.
Everything considered, the move to linked implementations does not affect the associativity of the with
operations.
People who don't use them will notice nothing, people who do will notice that what they give the system, is what they get from the system. Which is something I would consider very important for configuration in general.
If someone is strictly relying on the order of values within the config, it's not a leap to assume they are closely watching their order of operations already.
Also, I'd love to see ConfigDocument extended to cover some of these single-file use-cases.
I would also love to see extension added in this regard, and also for it to actually be properly documented. Outside of the javadocs, there is zero mention of the existence of this API.
EDIT: After rereading, I noticed I'm using a very strong tone here. I'm not meaning to come across as angry, I'm just trying to get my point across about how obvious and clear this is to me.
All that past rationale aside, I should also have asked - what immediate specific thing are you doing where preserving order would be useful? What problem are you trying to solve by preserving order?
https://github.com/Kotlin/kotlinx.serialization/tree/master/formats#hocon
While not exposed by the API of kotlinx.serialization, they use this library as the backend for decoding. So naturally one would also want to use it for encoding. Currently implemented through proxying the output of the JSON serializer.
I hope you can see how this can be extremely frustrating to come across, especially for cases where the actual type is much more complex and involved. Especially after doing the research into the issue and seeing years and years of backlogging.
On the topic of JSON field ordering: https://stackoverflow.com/a/16870531
Cheers, √
@viktorklang There's the spec. And then, there's the fact that in practice JavaScript implementations maintain order and lots of code out there rely on this behavior. All popular JSON parsers ultimately chose to lose this battle and comply with people's expectations. Library developers can't win against "it works on node and in my browser but not with your library". For example, Jackson uses a LinkedHashMaps for objects by default.
A list of JVM JSON implementations that preserve order:
- kotlinx.serialization
- Gson
- json-io
- Spring Boot
just to name a few... @viktorklang
On the topic of JSON field ordering: https://stackoverflow.com/a/16870531 -- Cheers, √
But also do note, this is not about JSON, JSON is not presented as a Human-Optimized Config Object Notation. And for humans, it's intuitive for things to be in order.
I commented on the JSON as it was used as an example.
Personally, having used HOCON since day 1, I have never had a problem with unordered fields as they are indexed/accessed by name, so as I’ve never had the problem I should probably stay out of the conversation.
Cheers, √
I commented on the JSON as it was used as an example. Personally, having used HOCON since day 1, I have never had a problem with unordered fields as they are indexed/accessed by name, so as I’ve never had the problem I should probably stay out of the conversation. -- Cheers, √
JSON itself was not meant to be used as a comparison. I'm just using that to show the API I'm using, kotlinx.serialization.
If the answer is as simple as insertion order, can we make my proposed associativity test case pass with that answer?
The order of loading values is already well defined
So the order is well-defined, but the associativity is not (the associativity in most apps is basically random and people wouldn't think about it). I actually could not even tell you how this library does the associativity right now. If we load files X, Y, Z, does it merge XY and then Z below, or YZ then X above? I don't remember exactly what it does. But it also doesn't currently matter.
The merge operations involved are not just between the 5 files, but potentially between many sub-parts of those files (due to duplicates and substitutions).
@havocp
If the answer is as simple as insertion order, can we make my proposed associativity test case pass with that answer?
Is this what you're looking for? Please excuse any glaring language issues, I don't actually know Scala...
If we load files X, Y, Z, does it merge XY and then Z below, or YZ then X above? I don't remember exactly what it does.
Well if we load X, then Y, then Z, I would imagine it would merge Y into X, and then Z into that. Maybe I'm missing something, but I don't think I understand what you're asking? Are these operations not atomic and completely separate from each other?
To my knowledge, the final object is not simply just the result of throwing all key value pairs into a bag, and searching for them. It stores the values in the map as it encounters them, and updates those values according to the merge operation.
Due to the behaviour of the linked types, this would result in the values keeping their order and simply being replaced.
The merge operations involved are not just between the 5 files, but potentially between many sub-parts of those files (due to duplicates and substitutions).
And are those operations not handled only in their respective scope? The only thing that could throw this out of whack is if the entries in the map were removed and re-added without obeying iteration order.
The only place that could have happened before, after a quick glance, is in the replaceChild
method.
But since that iterates over the entrySet of the map, being a linked implementation, iteration order would still be preserved there.
Again, please feel free to correct any of my assumptions, because that's mostly what they are.
Are these operations not atomic and completely separate from each other?
They are ... but only if withFallback
is associative.
The test you wrote is the right one but I don't understand why it passes - maybe because I made "a" appear symmetrically in the proposed test case? I would try some variations and then if you have it working let's try to understand why.
The test you wrote is the right one but I don't understand why it passes - maybe because I made "a" appear symmetrically in the proposed test case?
The issue here is that the keys would obviously be in a different order if we merge in a different order. Simply due to them being encountered differently.
The method of testing would need to take that into account, and could not simply be a foreach over all generated results.
If you make the test merge a more complicated list of keys does it still pass?
I don't know, something like a,b,a,c,q,c,q,d,c,b,q
If this actually deterministically associative there's no big issue. We also have to think about nested objects and substitutions ${}
in what we test and think through.
If you make the test merge a more complicated list of keys does it still pass? I don't know, something like a,b,a,c,q,c,q,d,c,b,q If this actually deterministically associative there's no big issue. We also have to think about nested objects and substitutions
${}
in what we test and think through.
In terms of what it means for the configs to be merged, the Map is still a Map, so it will still always be associative in that regard. The only difference being made is that the entries will be ordered depending on which was encountered first.
given:
a, b, a, c, q, c, q, d, c, b, q
you will get:
a, b, c, q, d
given the opposite:
q, b, c, d, q, c, q, c, a, b, a
you will get:
q, b, c, d, a
This is the only difference that happens, nothing else should be affected in the slightest. Because a LinkedHashMap is still a HashMap, no other behaviour changes besides the keys being in insertion order.
If you could help me out with designing a test to prove this, that would be nice. I don't know Scala, so I'm sort of useless in that regard.
@havocp
Is this what you're looking for?
I can attempt creating one that uses substitutions, but this is pretty much already over my head.
Yep I think so. So what is the "rule" making that deterministic, is it that we always keep the relative order of keys in the first object, when we merge in a fallback?
say we have
val config1 = parseConfig("""{ a:1, b:2, c:3 }""")
val config2 = parseConfig("""{c:4,d:5,b:6,e:7,a:8,f:9}""")
val config3 = config1.withFallback(config2)
is the resulting order of config3 going to be a,b,c,d,e,f by that rule, do I understand that correctly?
The result seems to be that no newly-appearing key in a fallback can ever be earlier in the order than any key which was already mentioned in an "on top" object. An "on top" object can reorder all keys, as long as it mentions all of them (anything not mentioned stays after anything that is mentioned).
I'm wondering how to prove that this algorithm is always associative vs. happens to work in the cases we tried. :thinking: especially if we get the ${}
feature involved. I think I would need to get out some scratch paper and try to work examples.
Yep I think so. So what is the "rule" making that deterministic, is it that we always keep the relative order of keys in the first object, when we merge in a fallback?
Correct. And this would happen without having to do anything at all due to the nature of linkedhashmap: Note that insertion order is not affected if a key is re-inserted into the map.
is the resulting order of config3 going to be a,b,c,d,e,f by that rule, do I understand that correctly?
Correct.
The result seems to be that no newly-appearing key in a fallback can ever be earlier in the order than any key which was already mentioned in an "on top" object. An "on top" object can reorder all keys, as long as it mentions all of them (anything not mentioned stays after anything that is mentioned).
Exactly. Which is perfectly deterministic based on the rule of (insertion) encountering order.
I'm wondering how to prove that this algorithm is always associative vs. happens to work in the cases we tried. 🤔 especially if we get the ${} feature involved. I think I would need to get out some scratch paper and try to work examples.
Is this not proven by the test mergeObjectSubstitutionObjectSubstitution
?
Maybe you're looking for something even deeper?
yeah I meant like an informal math-style proof. just thinking through whether the stated rule if correctly implemented is associative.
It won't hurt to also test some weird cases of course.
yeah I meant like an informal math-style proof. just thinking through whether the stated rule if correctly implemented is associative.
It won't hurt to also test some weird cases of course.
Ok, well if you're willing to either write that up, or point me in the direction that would be great. Personally, I've met my technical limit. 😬
All in all though, I am more than certain this will just work.
I learned about this library through the kotlinx.serialization project. After encountering this sorting problem, I was very frustrated when I learned that there was a lack of active maintenance.