libelektra icon indicating copy to clipboard operation
libelektra copied to clipboard

hide preloaded keys in ksLookup (optimization)

Open markus2330 opened this issue 7 years ago • 14 comments

Idea: allow skipping keys in ksLookup using metadata skip.

this way we could avoid returning unwanted keys to the user (preloaded keys to allow mmap/opmphm optimizations)

markus2330 avatar Nov 09 '18 15:11 markus2330

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

stale[bot] avatar Apr 11 '21 18:04 stale[bot]

I closed this issue now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

stale[bot] avatar Apr 25 '21 19:04 stale[bot]

Probably better to use metadata skip for this. This we can also introduce post-1.0.

markus2330 avatar May 01 '21 06:05 markus2330

I don't know why a skip metakey would be useful here. Whether or not returning binary keys from ksLookup makes sense or not, is determined by the code surrounding ksLookup, so an option to ksLookup would be obvious solution.

IMO this is solved by a simply adding a KDB_O_BINARY so only ksLookup (ks, lookup, KDB_O_BINARY) can return binary keys. Basically, we would add

if (!(options & KDB_O_BINARY) && keyIsBinary (ret)) return NULL;

somewhere towards the end of ksLookup.

kodebach avatar Jul 01 '21 20:07 kodebach

I don't know why a skip metakey would be useful here.

So that you can pass a pre-loaded KeySet to kdbGet which are not meant to be found by later ksLookup (all of which have the skip flag). The reason to do something like this is that the opmphm will not need to be rebuild if these keys are added later (e.g. by gopts).

markus2330 avatar Jul 03 '21 16:07 markus2330

I see what the idea was now. Adding skip metadata would be a solution for both OPMPHM pre-initialization and avoiding unwanted binary keys. However, to me this seems a bit like abusing a solution and like a bit of a hack.

IMO it would be better to control whether binary keys are returned via a KDB_O_BINARY option and use a different solution for OPMPHM, as this would be more clear to new users. I also think that adding dummy keys just to pre-build the hashmap is not a good solution. AFAIK the hashmap only contains keynames (char *) not full keys (Key *). So IMO the better solution would be adding a ksBuildHashmap (KeySet * ks, const char ** extraNames) function that builds the hashmap with keys currently in ks plus all the keynames in extraNames. We would also need an extra check in ksAppendKey to ensure the hashmap is only invalidated when needed, but that shouldn't be a problem.

In short, I'm not a fan of overly generic solution just for the sake of it. Sometimes generic solutions are a good thing (e.g. backend plugin), but often they are not. A generic solution is also often a lot harder to maintain in the long term, because you don't really know how people are using it. If you build a solution for exactly one problem, you only need to maintain it for that specific use case.

kodebach avatar Jul 04 '21 20:07 kodebach

There seems to be a confusion: This use case and issue is only about skipping preloaded keys (added to keep OPMPHM intact), I changed the title.

I do not see a use case to avoid looking up binary keys, they should already be rejected by a validation plugin (if they are unwanted).

markus2330 avatar Jul 05 '21 15:07 markus2330

I don't think a validation plugin could cover 100% of the use cases, of a KDB_O_BINARY option (e.g. shared mountpoints where one client needs binary, but the other doesn't support it). But application code can always check whether the returned key is binary and ignore it.

Separately from that, I think my ksBuildHashmap idea wouldn't work. However, I do still think a metakey skip is the wrong approach for an optimization like this. One reason, is that it might be abused as stated already. Another is that metakey is quite heavy-weight compared to e.g. a simple flag like KEY_SYNC, especially if the key has no other metadata and therefore wouldn't have a meta keyset.

kodebach avatar Jul 05 '21 17:07 kodebach

Maybe "skip" is not a good name. The general idea is to hide keys, so probably we should call it:

  1. hidden
  2. invisible
  3. lookup/hide
  4. lookup/disable

I don't like the idea to ignore based on any other metadata, it will make the lookup process too confusing. Only hidden keys should be hidden.

markus2330 avatar Jul 16 '21 07:07 markus2330

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

stale[bot] avatar Jul 30 '22 23:07 stale[bot]

@mpranj do you see this part of your work? These hidden keys could be used as pre-loaded "proc:/" keys so we do not need to discard the OPMPHM if proc adds a few keys.

markus2330 avatar Aug 01 '22 10:08 markus2330

These hidden keys could be used as pre-loaded "proc:/" keys so we do not need to discard the OPMPHM if proc adds a few keys.

You'd have to know the keys that could be added in advance. From a separation of concerns POV, only gopts could add those keys.

Also as stated above, I think the solution here is simply to add a ksBuildHashmap to preload the hashmap. Hidden keys within the KeySet are a gigantic hack IMHO. They just cause weird issues. For example, if they are only hidden in a ksLookup, a simple iteration via ksAtCursor would still show them. Also ksSize would return weird results etc.....

kodebach avatar Aug 01 '22 14:08 kodebach

You'd have to know the keys that could be added in advance. From a separation of concerns POV, only gopts could add those keys.

Yes, in this proposal gopts would always add all keys, hiding the keys that are currently not used.

Also as stated above, I think the solution here is simply to add a ksBuildHashmap to preload the hashmap.

The question is how performant the different solutions are. I assumed building up the hash map is expensive. If this is not true, the proposal does not make sense. Obviously, first we need to do a benchmark.

For example, if they are only hidden in a ksLookup, a simple iteration via ksAtCursor would still show them. Also ksSize would return weird results etc.....

It is not wrong, these keys are simply hidden in the same way as (non-hidden) proc: keys might hide user:, system: etc. keys.

Btw. the hiding feature makes the current namespace feature obsolete. So it does not add complexity for the user.

markus2330 avatar Aug 02 '22 11:08 markus2330

Yes, in this proposal gopts would always add all keys, hiding the keys that are currently not used.

That's not fully possible (because of arrays), but could be done to some extent. In any case it would require changes in libelektra-opts not in gopts. Also depending on the config (e.g. something like kdb with many subcommands) we would really waste memory. gopts/libelektra-opts would create loads and loads of unnecessary keys.

Even if this worked, I'm not sure how it would help. In almost all cases, gopts will create the same keys every time it is called within one process. It is rare that processes change their environment variables at runtime and even rarer that they change their command-line options. If gopts is invoked via a contract (the recommended way), you'll also have to do a new kdbOpen and change the contract to change what gopts returns.

The question is how performant the different solutions are. I assumed building up the hash map is expensive.

IIRC from the original benchmarks, building the hash map is kinda expensive yes. But your "hidden keys" solution would only delay the cost until the first lookup. Which in most cases will happen right after kdbGet (because otherwise why call kdbGet).

In any case, all of this is speculation and premature optimization (as you like to say) without any benchmarks. Adding a lot more keys, because maybe they are added later certainly makes building the hashmap more expensive (not to mention the obvious memory use). To do this properly, we would need to know how likely it is that a key is added later.

It is not wrong, these keys are simply hidden in the same way as (non-hidden) proc: keys might hide user:, system: etc. keys.

Sorry, but no that is totally different. proc: keys don't hide user: keys. If I do ksLookupByName (ks, "user:/foo", 0) I still find the key, even if a proc:/foo exists too.

If you want your meta:/hide to only apply to cascading lookups then that could break even more things. Yes, applications are only supposed to use cascading lookups. But sometimes (e.g. in tools like kdb) you need access to a key of a specific namespace. If meta:/hide only hides in cascading lookups, you'd have to check for meta:/hide every time you do e.g. ksLookupByName (ks, "user:/foo", 0).

Btw. the hiding feature makes the current namespace feature obsolete. So it does not add complexity for the user.

Not sure how this would make anything easier?

  • If meta:/namespace/# is removed, the user would have to add separate meta:/hide keys, which I'd say is more complex.
  • If meta:/namespace/# in the spec is still used by users and is just translated by spec into meta:/hide keys, then yes nothing changes for the user, but meta:/namespace also isn't obsolete.

Also, meta:/namespace/# allows changing the lookup order (not a good thing IMO, but it is what it is). With meta:/hide you couldn't do that.

kodebach avatar Aug 02 '22 12:08 kodebach

Lets keep this for post-1.0 as it is a pure optimization.

markus2330 avatar Nov 18 '22 09:11 markus2330

Since it is a user-visible change, I wouldn't call it "pure optimization" (some users may have a real use case for it), however, it doesn't change any existing APIs or semantics (only adds new stuff), so post-1.0 is fine.

kodebach avatar Nov 18 '22 09:11 kodebach