Finalize decisions
- [x] use cases of Elektra's API
- [x] #4717 @kodebach
- [x] #4744 @kodebach
- [x] #4678
- [x] #4689
- [x] change tracking @atmaxinger
- [x] 0_drafts/notifications.md @atmaxinger
- [x] 0_drafts/operation_sequences.md @atmaxinger
- [x] #4641
- [x] #4716 @kodebach
- [x] #4745 @kodebach
- [ ] 1_in_discussion/spec_expressiveness.md 5_implemented/default_values.md @tmakar
- [ ] #4715 @kodebach
- [ ] 0_drafts/commit_function.md @flo91 (no decision PR yet)
- [ ] 0_drafts/man_pages and maybe improvements of 5_implemented/error_message_format.md @hannes99 (no decision PR yet)
- [ ] ksCut vs. ksFindHierarchy?
- [ ] symlinks+context lookup
- [ ] "binary" metadata
This issue is to foster collaboration between @mpranj, @kodebach and @atmaxinger for restructuring core data structures. Together we can tackle a quantum leap not possible if everyone works alone.
The currently proposed idea, with some extensions from me, is that
- first @atmaxinger implements key/keyset changes to allow efficient change tracking, and then
- @kodebach does the needed refactorings of API
- @mpranj fixes the mmap cache/storage, with improvements as discussed in store_name.md.
We want to avoid that anyone needs to redo time-consuming fixes (like fixing the cache or doing API renames), so we must clarify the overall picture of essential changes in decisions first.
I am looking forward to seeing this become reality! :heart:
The first question, before we can make any plans is:
@mpranj do you have the time to work on this right now? If not, there is no point in trying to involve you.
I think I can make time.
The fixes to mmapstorage regarding just keynames here are probably a smaller undertaking than the new backend stuff.
Is there a bigger task here that I am overlooking?
The changes from store_name.md are not much, no. But depending on the outcome of #4641 and all the other decisions spawned from change tracking, there could be a bunch more changes needed. I wouldn't underestimate the time requirement, especially since it is still entirely unclear what we even want to do.
Is there a bigger task here that I am overlooking?
./doc/decisions/2_in_progress/store_name.md would be a bigger task if we want even lower memory use and do not store the whole key name in every key. But it's best if you start a decision PR and we discuss it directly there.
@markus2330 how do we continue with the decisions? I think we ideally decide the COW decision, and then we can finalize internal cache and change tracking decision.
I'm just not sure in which folder to put the COW decision. Is it "In Progress" or in "Decided"?
I think you can open all of them in parallel.
- internal cache is, without the details of the specific COW mechanism, more or less a formal act to decide. So if you want something finished (hopefully) fast, go for it.
- COW decision: I don't know yet what your exact ideas of the split-up are, maybe we need to discuss the problem first. So maybe start by tagging it as draft, we might merge it into "In Progress" if everything goes smoothly. If it is a pure split-up for documentation purposes without any surprises, you can also directly go for "Decided".
- change tracking decision probably needs to come later, as it depends on everything else: there is also Transformations, where we probably need a bit more insight before we can decide Change Tracking. It is probably also a good idea that you first implement/improve COW/3-way merge before implementing change tracking. If you have all the pieces ready, the implementation of change tracking is a small thing anyway.
I disagree with the order. IMO it is clear COW needs to be worked on first. Everything else could use COW, depending on what COW looks like. Sure, we could decide the other things first, say "we will use COW" and in the process add a constraint to COW that "decision X must be able to use COW", but that seems backwards.
I also think we should create as few parallel decision PRs as possible. Parallel decision PRs only split attention and reduce focus. (CLARIFICATION: I mean parallel decision PRs lead by the same author.)
I agree with starting the COW PR in "In Progress" or "In Discussion" (I still don't understand the difference). If we come to a decision within the PR, I'd move the file to "Decided" in the last commit of the PR. If the PR gets too big and we want to merge first, then the next PR would follow the same process. Also, since I think the problem of the decision has changed (see https://github.com/ElektraInitiative/libelektra/pull/4641#pullrequestreview-1178121572), I don't think starting in "Decided" makes sense. IMO we are quite a bit away from a decision. Unless of course we just make an abstract decision to do the "full-blown COW" and work out the implementation details afterwards in some other form.
If you have all the pieces ready the implementation of change tracking is a small thing anyway.
Sounds a bit like "I could build Twitter in a weekend" ;) I wouldn't underestimate the effort, not least because we almost certainly will find new edge cases when we actually start testing the change tracking stuff.
If it is a pure split-up for documentation purposes
More or less, yes.
I also think we should create as few parallel decision PRs as possible.
This is a difficult topic with pros and cons. Working on more decisions in parallel might delay any particular decision, but it might lead to more globally optimized solutions. If we only look at one decision after the other, the risk is very high that we end up with a local optimum, constrained by earlier decisions.
I agree with starting the COW PR in "In Progress" or "In Discussion" (still don't understand the difference)
"In Progress" means that also the solution space is clear. I clarified in 640830e290edc80937666e5a3d21f8565ca002bd
will find new edge cases when we actually start testing the change tracking stuff.
It is a relatively easy algorithm, at least compared to the assignment of keys to backends. I already implemented a similar algorithm twice (kdbSet and logchange). If we forget about edge cases, we will have to live with them; it is not possible to fix that afterwards. E.g. the edge case of changing a value and then changing it back is simply unfixably broken with our current kdbSet algorithm. This is why careful decisions are of utmost importance.
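The "change a value and then change it back" edge case can be sketched with toy structs (hypothetical names, not Elektra's real API or its actual change-tracking design): a naive dirty flag reports a change even when the value is set back to the original, while comparing against the value loaded at kdbGet time does not.

```c
#include <assert.h>
#include <string.h>

/* Toy key: "original" is the value as loaded, "current" the value now.
 * The "dirty" flag models a naive change-tracking approach that marks
 * the key on every write. */
struct toy_key
{
	const char * original;
	const char * current;
	int dirty;
};

static void toy_set (struct toy_key * k, const char * value)
{
	k->current = value;
	k->dirty = 1; /* naive: every write marks the key as changed */
}

/* naive detection: any write counts as a change */
static int changed_naive (const struct toy_key * k)
{
	return k->dirty;
}

/* comparison-based detection: only a real difference counts */
static int changed_compare (const struct toy_key * k)
{
	return strcmp (k->original, k->current) != 0;
}
```

With this sketch, setting `"a" -> "b" -> "a"` makes `changed_naive` report a (spurious) change while `changed_compare` correctly reports none, which is exactly why the detection strategy has to be decided carefully up front.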
Another task important to get the decisions done is to write use cases for Elektra's APIs. I don't know anyone more eminently suitable than @kodebach. I added the task to the top list.
Please also add the label "decisions" to everything related to decisions. [decisions] in the title is also nice (as this is the subject in E-Mail notifications).
Another task important to get the decisions done is to write use cases for Elektra's APIs. I don't know anyone more eminently suitable than @kodebach. I added the task to the top list.
Please be a bit more specific. What exactly do you want here?
Something similar to doc/usecases/record_elektra but for applications and plugins using Elektra's API. More or less a summary of requirements/usecases that LCDproc or some plugins (you wrote) had.
I hope this gets wider contributions, as this has intersections with basically anyone working with Elektra. In particular I hope that @lawli3t will also contribute. In retrospect, this should actually have been done before his RQ2.
Something similar to doc/usecases/record_elektra but for applications and plugins using Elektra's API.
I think you yourself would be much better suited for this. You definitely know of more actual real world uses of Elektra than I do. I'd just be making things up for how I think Elektra should be used, not what actual requirements are.
I can write a bit about LCDproc, but I doubt that would match most use cases. LCDproc already had an existing config format. Things are much different if you design your app's config with Elektra in mind from the start.
If you write 1-2 use cases about LCDproc, it is already much better for our common understanding. Tools, Plugins, Bindings etc. can also be added later. I created #4686.
I think for #4678, we first need to decide about read-only keys. It doesn't make sense to make them COW if we actually want them read-only anyway.
Who wants to do this decision? It is hopefully a short one.
Well as I'm blocked by it I guess I can also create this decision ...
Thank you!
There were long discussions in #2202 if we need to unlock key names after a Key was removed from keysets. I don't think you need to read it. Once we established read-only semantics, we also have an answer to this problem.
There seems to be a circular dependency in the decisions. The "read-only key name" needs COW (as otherwise keyDup would be too expensive to always use for a name change) and COW needs the "read-only key name" to finalize its structs.
Well COW doesn't reaaaally need this. Updating the COW implementation afterwards should be pretty easy.
But updating the buffer structs #4683 is difficult?
What exactly are you referring to? How can we update something that is not even implemented?
It doesn't make sense to make them COW
COW is still useful for saving memory when a copy is needed and change tracking definitely needs a copy.
The "read-only key name" needs COW (as otherwise `keyDup` would be too expensive to always use for a name change) and COW needs the "read-only key name" to finalize its structs.
But updating the buffer structs https://github.com/ElektraInitiative/libelektra/pull/4683 is difficult?
No idea what you're trying to say here... IMO "read only keys", "COW" and "buffer structs" are all independent of each other.
Also please be more specific what "read only keys" would even be about. There is the secondary counter to automatically lock/unlock key names, but that is a totally separate issue not related to anything else.
Otherwise, IIRC the only recent mention of "read-only keys" was as a solution for "internal cache". By returning all keys from kdbGet with KEY_FLAG_RO_VALUE set, the simple solution I proposed would be a complete solution. This would be an alternative to using COW in "internal cache", but not a separate decision.
COW is still useful for saving memory when a copy is needed and change tracking definitely needs a copy.
Exactly. Even if we have read-only names, the implementation of copy-on-write wouldn't change. We'd still store the name separately from the key with a reference counter, because it saves memory. Otherwise, we'd have to deep-copy the name every time we copy a key.
So in conclusion, whether key names are read-only or not has 0 impact on the implementation of COW.
I think for https://github.com/ElektraInitiative/libelektra/pull/4678, we first need to decide about read-only keys.
One addition: @markus2330 Please stop suggesting more and more new decisions. At some point there has to be an end. You put "proposal timeline Elektra 1.0" on the agenda for the next meeting, and based on the current progress, my opinion is: Elektra 1.0 will be released in 2030 at the earliest. There is no chance of a 1.0 any time soon, unless we quickly bring all the decisions to an end and start with implementations. At least not, if you want to avoid having 2.0 a year after 1.0.
No need for a response, please. This is just my opinion and I had to say it, I don't want to start another pointless discussion.
So in conclusion, whether key names are read-only or not has 0 impact on the implementation of COW.
Ok, thank you for starting the decision anyway.
In general #4678 is already more detailed than required. There is always a gray area between essential semantics and details. Great job @atmaxinger!
After we started clarification of the name, IMHO more about semantics of (shared) meta-data would be beneficial. I suggest that we create a decision about that, too (even if it might not influence COW implementation). There are several implicit requirements flying around in various issues not clearly summarized and documented yet.
At some point there has to be an end.
Yes, after essential semantics are clarified. The topics are there, whether we want them or not. The only question is if we tackle them consciously and explicitly --- or if everyone implements according to his/her own ideas, with a huge risk of inconsistency and subtle bugs nearly impossible to fix (without reimplementations/refactoring).
I am not as pessimistic as you are. I looked over all decisions and I think nearly everything is clear and finished. Unclear topics are a very small minority now.
The only bigger problem, next to spec and change tracking, I see is the "second error channel". A new error concept would be very challenging. So probably, if you (@kodebach) are determined to do this, we should start with it, as it has (next to read-only name) the biggest uncertainty and largest potential of different viewpoints. I am also okay with dropping the idea and having a special solution only for a second error channel for key names. (Which actually is the only main topic where a second error channel is really needed.)
Use cases are also very urgent. There are already discussions about whether there are use cases of e.g. later changing key names, without us actually having documented use cases. These are dangerous dead ends where one can easily waste lots of time on things that would have been obvious had we correctly started from the beginning (writing use cases).
I updated the priorities in the top post.
Volunteers for use cases and meta data are very much welcomed.
Brilliant, I ask you to suggest fewer decisions so we can make some progress, and your response is suggesting another decision. I'm not even going to really respond to the rest. All I'll say is: You seem to have very concrete ideas about what should be documented ("implicit requirements flying around", "use cases"), maybe you should start some documentation yourself.
Some short responses for the rest:
COW is not on the list in the top post. Even if you consider it done (although the PR is still open), it should be on the list, but checked.
"second error channel"
Lowest priority. Not really needed. IMHO we already have too granular errors in many cases, e.g. ksGetSize or keyIsBelow. Often the errors only get in the way and nobody actually cares, because they won't happen anyway.
Use cases are also very urgent
I fail to see how high-level use cases would help with all the API and internal structure questions we have right now. Therefore I don't understand why it would be so urgent.
It is my responsibility that students I supervise do something useful in the context of scientific work, which requires as precondition:
- use cases/requirements
- architectural decisions
- research questions
Otherwise it is not a process that can lead to a finished master thesis. So I try to avoid that you implement/evaluate something without doing the preconditions first, whatever students might say or even whine. And obviously it is not my responsibility to do the work of the students I supervise, even though I help as far as I can.
COW is not on the list in the top post.
:+1: Added the link above.
Lowest priority. Not really needed.
I agree, but then we should update the decision how an API user can distinguish locked/read only vs. wrong names, which is a real problem. There are many easy solutions to it but it is an important API decision, as this case cannot be covered by the current error concept. Maybe we look at doc/decisions/0b_rejected/separate_key_name.md again? (Which would clearly separate these two error situations. It might also help to get clear semantics for read only keys.)
I fail to see how high-level
I didn't say they must be high-level. They can e.g. be about Keys being embedded in user-specific data structures (without a KeySet), or, if you find such a use case, a situation where you need to change the key name of a Key because you need to avoid a keyDup (I currently fail to see this use case, though).
use cases would help with all the API and internal structure questions we have right now. Therefore I don't understand why it would be so urgent.
Doing use cases first simply is the right order. First you determine what you want to do (use cases), then how you want to do it (decisions) and then you do it. (And repeat this in iterations, hopefully not so often if the steps were done well.) How will you design an API if it is unclear what it is used for?
As we are working together, we need coordination and a common clear picture of use cases.
It is my responsibility that students I supervise do something useful in the context of scientific work, which requires as precondition: [...]
What you did not mention here is that, if a task is supposed to be progress toward a thesis, the task needs to be related to the thesis topic. I would have no issue if you suggested something that is related to somebody's thesis topic. However, "metadata semantics" doesn't fit any thesis topic I currently know of.
My main gripe, however, is that you always suggest new tasks as prerequisites for current tasks. We're basically doing a "one step forward, two steps back" dance. If your intention is to lead people to a finished thesis, then there must also be progress.
obviously it is not my responsibility to do the work of the students I supervise
Sure, but Elektra is also a FLOSS project and not just a research/teaching project. If tasks don't fit with any thesis topic or lecture objective, I see no reason why you couldn't contribute too.
we should update the decision how an API user can distinguish locked/read only vs. wrong names, which is a real problem
I'll open a PR, but I've never really considered this a problem. In many cases the caller knows that a key name cannot be locked, e.g. if they just created the key with keyNew or keyDup, or successfully called another function that requires a writable name. In the other cases, you can just call keyIsLocked when you get an error to check what the problem was.
In the same vein, I don't see a need to return an error code for NULL values when there is a good default. Here `ksGetSize` is an example. In most cases you know you don't have a NULL pointer, or 0 on NULL is acceptable, e.g.

```c
for (elektraCursor it = 0; it < ksGetSize (ks); it++) // works even if ks == NULL and ksGetSize returns 0
```

In the rare case where you might have a NULL and need to know, you can just do `if (ks == NULL)` yourself.
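The "good default instead of error code" pattern can be demonstrated standalone with a toy size getter (hypothetical types, not Elektra's real KeySet): returning 0 for NULL makes loops degenerate harmlessly without a separate NULL check.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a keyset, only carrying a size. */
struct toy_ks
{
	size_t size;
};

/* NULL-tolerant getter: 0 is a safe default for "no keyset". */
static size_t toy_size (const struct toy_ks * ks)
{
	return ks == NULL ? 0 : ks->size;
}
```

A loop bounded by `toy_size (NULL)` simply executes zero iterations, which is usually exactly the behavior the caller wants.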
I didn't say they must be high-level
IMHO a use-case is high-level by definition. Even if it is a use-case for an API like keyDup, one use-case IMO would just be: I as a user have a Key (from a KeySet) and want to duplicate it, because I want to insert it into a different KeySet with an updated value (without affecting the original Key).
You also said "Something similar to doc/usecases/record_elektra" and those files are pretty high-level.
If you had something different in mind, please provide some examples, since AFAICT nobody really knows what you expect. We only know that you seemingly have something very specific in mind.
As we are working together, we need coordination and a common clear picture of use cases.
Currently, I don't consider us (you and I) as working together. IMO you are delegating work to me (and other people) and/or supervising the work. But I do get the point...
Doing use cases first simply is the right order. [...] How will you design an API if it is unclear what it is used for?
In principle I agree. For example, for kdb record it was absolutely correct to first define the use cases and then go from there. kdb record is completely new, so it makes sense.
But for things that are not new, it makes less sense. For example kdbGet/kdbSet (re. operational sequences) are already in use. A decision can just use the existing documentation for them as a base. We don't need individual use cases, because the documentation already describes what is possible, i.e. more than specific use cases.
And then there are things like metadata or keyname semantics (i.e. read-only), where I don't see how a use case could help, because I don't understand what a use case would even be in those cases.
@mpranj I took the store_name.md decision from you, because I'm working on a slightly broader decision for key names. It is also heavily related to the public API and the necessary migration and therefore will fit very well with my thesis.
I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart: