libelektra icon indicating copy to clipboard operation
libelektra copied to clipboard

add const to KeySet argument of ksLookup/ksSearch

Open markus2330 opened this issue 3 years ago • 10 comments

AFAICT these are the reasons why we can't use const:

  1. We may build the hashmap
  2. KDB_O_POP could remove a key and KDB_O_CREATE could create a key
  3. We may create a key from meta:/default
  4. We update the internal cursor

AFAIK cases 2-4 will all be removed soon. IMO in cases like this one, a const variant could be useful to avoid the ksDup. The variant never builds the hashmap and just falls back to binary search, if the hashmap wasn't built yet.

Originally posted by @kodebach in https://github.com/ElektraInitiative/libelektra/pull/4471#discussion_r974138545


From the C Standard section 6.7.3:

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined

Seems pretty clear to me.

Since this is part of our public API we should really do things properly and not go against the standard without good reason. Here we don't know where the KeySet originally comes from, so casting a const KeySet * to a KeySet * is definitely not safe.

Originally posted by @kodebach in https://github.com/ElektraInitiative/libelektra/pull/4471#discussion_r974273332

markus2330 avatar Sep 19 '22 15:09 markus2330

I think building the hashmap is okay as it logically does not change the KeySet. So I am in favor of adding const to normal ksLookup/ksSearch, and not adding variants.

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined

We do not need to make such an attempt: We can internally do a cast.

Here we don't know where the KeySet originally comes from

We know it, as our functions are the only functions that create a KeySet. We do not allow the KeySet to be constructed in any other way.

If an alternative implementation really has KeySets lying in const memory regions, this implementation obviously couldn't have OPMPH rebuilds within ksLookup. I think this limitation is not very big to such alternative implementations and we can easily live with it. A much nicer user interface is more important than such theoretical limitations of alternative implementations.

@lawli3t What is your stance on this?

markus2330 avatar Sep 19 '22 15:09 markus2330

We do not need to make such an attempt: We can internally do a cast.

That's exactly what the standard is talking about. The object (KeySet) was defined as const-qualified (const KeySet *) but is now accessed through a non-const qualified type (KeySet *).

We know it, as our functions are the only functions that create a KeySet.

Even with this constraint it's not certain. You could first create a KeySet * and then use something like mprotect to make the memory read-only. One would expect that it would still be possible to call const KeySet * APIs. This example is without a doubt highly hypothetical, but if we want our public API to follow the standard it should be possible.

A much nicer user interface is more important than such theoretical limitations of alternative implementations.

If we really accept such cases, we should also change keyMeta (Key * key) to keyMeta (const Key * key), because that is actually the more annoying example of const vs non-const IMHO.

kodebach avatar Sep 19 '22 16:09 kodebach

That's exactly what the standard is talking about. The object (KeySet) was defined as const-qualified (const KeySet *) but is now accessed through a non-const qualified type (KeySet *).

Interesting interpretation. I don't see how a cast does a modification of the object (as written in the sentence you quote). There are plenty of functions in the C standard that need to do such casts. It would be very weird if only C++ (with const_cast) is able to do such casts and you are not able to implement libc with conforming C code? Anyway, at least in C++ we could write a conforming implementation and the API design is across all languages.

You could first create a KeySet * and then use something like mprotect to make the memory read-only.

No you cannot as we do not give the information which memory regions we use, where the OPMPHM lives etc. And if somebody is needing such things, he/she will be able to compile Elektra with ENABLE_OPTIMIZATIONS turned off. (As ENABLE_OPTIMIZATIONS gets useless if the OPMPHM is not allowed to be build.) But even then he/she has no guarantee that ksLookup will not write to the heap, as memory allocations might occur in protected regions.

If we really accept such cases, we should also change keyMeta (Key * key) to keyMeta (const Key * key), because that is actually the more annoying example of const vs non-const IMHO.

No, we cannot as keyMeta returns a non-const modifier which allows logical manipulations of the Key (indirectly). This is clearly against the spirit of const.

The reason why ksLookup is non-const is because of 2-4, it never was because of 1.

markus2330 avatar Sep 19 '22 16:09 markus2330

I don't see how a cast does a modification of the object

The cast doesn't the cast is what creates the "lvalue with non-const-qualified type". But adding the hashmap to the resulting KeySet * certainly is an "attempt to modify the object".

There are plenty of functions in the C standard that need to do such casts.

AFAIK those are only functions like strchr which e.g. take a const char * and return a char *, but do not actually modify the data. AFAIK these functions should actually return a const char *, but that would be an ABI breaking change. (see https://stackoverflow.com/a/55232244)

we do not give the information which memory regions we use

The application could simply make all it's current memory pages read-only and if it later does need writable memory use new/different pages. By looking at e.g. /proc/self/maps the application can at least know which memory pages Elektra used to allocate the KeySet.

Also this is still not an argument. Even if the application only makes the page where the KeySet * points to read-only, the expectation would be that a const KeySet * API doesn't write to that memory.

No, we cannot as keyMeta returns a non-const modifier which allows logical manipulations of the Key (indirectly). This is clearly against the spirit of const.

That's true. But there could be a const KeySet * keyMetaConst (const Key * key) for when you just need read-only access to the metadata. That one could even return NULL, if there is no metadata keyset. (Note: I'd put that into libelektra-operations since it is definitely not minimal API).

kodebach avatar Sep 19 '22 17:09 kodebach

The cast doesn't the cast is what creates the "lvalue with non-const-qualified type". But adding the hashmap to the resulting KeySet * certainly is an "attempt to modify the object".

Yes, but that would be in an different statement/expression and your quote would not apply anymore. But as said: it does not matter as we don't have as design goal that a standard-conforming implementation in C is possible. (In fact it is not already, as we have a function <-> non-function cast in our module loader.)

Also this is still not an argument. Even if the application only makes the page where the KeySet * points to read-only, the expectation would be that a const KeySet * API doesn't write to that memory.

This is probably differences in expectations if you look at it from the C or C++ side. IMHO const has nothing to do with read-only memory but is a type qualifier giving promises what our function will do with the data. And logically we keep the KeySet const, there is no observable difference.

But there could be a const KeySet * keyMetaConst (const Key * key)

I think this is exaggerated const correctness for C. (In C++ you actually would offer something like this, otherwise you would not be able to get meta data from a const Key without a nasty const_cast.)

markus2330 avatar Sep 19 '22 17:09 markus2330

Yes, but that would be in an different statement/expression and your quote would not apply anymore.

I'm not sure what you want to say here. The way I read the standard is:

"If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined"

"An object defined with a const-qualified type"

const Foo * foo1 = createFoo();

"an attempt is made to modify [...] through use of an lvalue with non-const-qualified type"

foo2->bar = 2;

Now if foo2 and foo1 refer to the same object (e.g. because of a foo2 = (Foo*) foo1) then this is undefined behaviour.

IMO the standard (very correctly) doesn't how you got a non-const object that was actually defined as const (maybe you just happend to guess the memory address correctly). The only thing that matters is, if you attempt modify such an object it is UB.

In fact it is not already, as we have a function <-> non-function cast in our module loader.

That is true, but IMO that's different. There simply is no standard way of having a void * equivalent for function pointers. But const vs non-const in the public API has some very real implications IMO.

My main concern was never really standard compliance. I brought it up, because that is normally something you care about. My concern is simply that having a public API that takes a const pointer, but still modifies the data. That is simply bad design in my mind.

Maybe another reason could convince you. If I share some struct Foo between two threads, it should be fine to call myFunc(const Foo * foo) in both threads without synchronisation, because myFunc should only read foo. But if myFunc ignores the const and modifies foo you'll get data races.

kodebach avatar Sep 19 '22 18:09 kodebach

The only thing that matters is, if you attempt modify such an object it is UB.

Is this also the case if the object initially wasn't const? (As is always the case in our situation.)

Or as example, you argue that char * x = ..; strchr(x, ..) = y; is UB?

If I share some struct Foo between two threads

We clearly say that you need to duplicate Elektra's data structures in order to use them in different threads. So you are proposing additional thread safety we did not have before here?

markus2330 avatar Sep 19 '22 18:09 markus2330

Is this also the case if the object initially wasn't const? (As is always the case in our situation.)

I'm not sure. "an object defined with a const-qualified type" could be read as "defined as const at some point", not necessarily "defined initially as const". But as I already said, my main concern never really was standard compliance.

It's quite hard to say what the standard actually wants to express here. Even more so since AFAIK no compiler actually exploits this in any way.

you argue that char * x = ..; strchr(x, ..).. = y; is UB?

No that's a compile error, because strchr(x, ..) is not assignable. That's why the standard explicitly states "lvalue".

But this would be fine, since x is not const:

char * x = strdup("abc");
*strchr(x, 'b') = y;

We clearly say that you need to duplicate Elektra's data structures in order to use them in different threads. So you are proposing additional thread safety we did not have before here?

I intentionally used struct Foo and not KeySet for this example. I wasn't suggesting to make Elektra more thread safe, I was explaining that const vs not-const in a public API implies certain things to the consumer of the API, chiefly that the data behind the pointer won't be modified.


If we do use const KeySet * ks in ksLookup, we should make it very explicit in the documentation that the internal data of ks could in fact be modified, but only in a way that is not visible via the public API.

However, this leads to the next questions: Which of these should we use?

Key * ksLookup (const KeySet * ks, Key * key, elektraLookupFlags options);
const Key * ksLookup (const KeySet * ks, Key * key, elektraLookupFlags options);

Key * ksLookup (const KeySet * ks, const Key * key, elektraLookupFlags options);
const Key * ksLookup (const KeySet * ks, const Key * key, elektraLookupFlags options);

I guess the "correct" way for const-ness of the return value would be to follow that of ks, since that's where the Key * comes from. But that's inconvenient for callers, so we should probably just follow the libc examples like strchr.

The Key * key argument is a different question. It is not const because it might be modified during ksLookup. But that is just an optimization to avoid allocating another key and after ksLookup returns key should be the same as before. So I'd argue it should be const just like ks.

IMHO all this discussion could easily be avoided by just having

// in libelektra-core
Key * ksLookup (KeySet * ks, Key * key, elektraLookupFlags options);

// (maybe) in libelektra-operations
// doesn't build hashmap, instead falls back to binary search if hashmap is missing
// doesn't modify key, instead uses keyDup if needed
const Key * ksLookupConst (const KeySet * ks, const Key * key, elektraLookupFlags options) {
    return ksLookup ((KeySet *) ks, (Key *) key, options | KDB_O_CONST);
}

kodebach avatar Sep 19 '22 18:09 kodebach

No that's a compile error, because strchr(x, ..) is not assignable. That's why the standard explicitly states "lvalue".

I thought it is obvious that I wanted to show that strchr must do some kind of cast internally which can be used to modify the (non-const) data.

I intentionally used struct Foo and not KeySet for this example. I wasn't suggesting to make Elektra more thread safe, I was explaining that const vs not-const in a public API implies certain things to the consumer of the API, chiefly that the data behind the pointer won't be modified.

No, I don't think that this is implied in any way. It is probably a similar confusion as that volatile has something to do with thread safety.

If we do use const KeySet * ks in ksLookup, we should make it very explicit in the documentation that the internal data of ks could in fact be modified, but only in a way that is not visible via the public API.

Yes, we should be very explicit that our implementation does not allow the uses of mprotect or concurrent access in threads on the same data structure. Other implementations might allow that.

Key * ksLookup (const KeySet * ks, const Key * key, elektraLookupFlags options);

:+1:

IMHO all this discussion could easily be avoided by just having [...] KDB_O_CONST

No, as this would obscure what we actually want to achieve. If we want ksLookup to be thread-safe, we should call it KDB_O_THREAD_SAFE or similar. But I don't think we should add such a feature in the near future.

markus2330 avatar Sep 21 '22 16:09 markus2330

const [...] implies certain things to the consumer of the API, chiefly that the data behind the pointer won't be modified.

No, I don't think that this is implied in any way.

Okay, so what does const imply to you?

Yes, we should be very explicit that our implementation does not allow the uses of mprotect or concurrent access in threads on the same data structure.

I don't think this is explicit enough. I think we must explicitly state: "This function will cast the provided const Keyset * ks to Keyset * ks and const Key * key to Key * key. Both ks and key may be modified. Any modification made is purely internal and does not affect the outcome of any public API functions called after this functions returns."

What can and cannot be done by the caller follows from this. A similar disclaimer must be added to all other public API functions that do something similar.

IMHO all this discussion could easily be avoided by just having [...] KDB_O_CONST

No, as this would obscure what we actually want to achieve. If we want ksLookup to be thread-safe, we should call it KDB_O_THREAD_SAFE or similar. But I don't think we should add such a feature in the near future.

The point wasn't that KDB_O_CONST is the solution (although a private option and an internal helper might be), but that I think having two separate functions that mostly share an implementation would be better.

kodebach avatar Sep 21 '22 22:09 kodebach

Okay, so what does const imply to you?

That is a very general question which would lead this discussion totally astray. There are many books and papers about that topic (OOP). For const Key and const KeySet I already gave a short answer much earlier: it is a guarantee to the user that we do not change something that is observable via the API. Caching, Lazyness etc. is okay (e.g. we could also calculate the name when elektraKeyName(const Key*) is invoked the first time.) But I agree guarantees (or the lack thereof) should be documented.

"This function will cast the provided const Keyset * ks to Keyset * ks and const Key * key to Key * key. Both ks and key may be modified. Any modification made is purely internal and does not affect the outcome of any public API functions called after this functions returns."

Sounds good, can you please add what we discussed here to the docu? (mprotect, thread safety, etc.)

markus2330 avatar Sep 22 '22 18:09 markus2330

That is a very general question

In my mind the question has a very simple answer. const in C means read-only, i.e. data that cannot be modified.

The internet seems to agree with me. See also the Notes in the description of C++ const_cast. There is even some mentions that the compiler can decide to store const objects in read-only memory.

However, I also agree that our case is a grey area. Since a KeySet is always heap-allocated and it is always libelektra-core that allocates a KeySet with malloc, it should™ be okay to treat all KeySets as writeable memory, as long as we clarify that we do this.

can you please add what we discussed here to the docu?

I have no idea what I should write or where... I think it would be much simpler, if you add the docs yourself since you seem to have and expectation of what should be added.

For me it would be enough to add the 3 sentences I suggested to the doxygen comment, when we change ksLookup to take const arguments. And add similar doxygen comments, when we update/create other such APIs.

kodebach avatar Sep 23 '22 11:09 kodebach

In my mind the question has a very simple answer. const in C means read-only, i.e. data that cannot be modified.

Of course it is like this. The relevant question is from which viewpoint it is read-only. The OOP answer usually is that the viewpoint is the interface from the outside, i.e., the API. In a quick look I didn't find anything in your links which disagrees.

Btw. it is not something new I propose here, Elektra uses this guidelines as long as I can remember. E.g. we also used the iterators of const members as needed ... as long as we rewind the cursor to the same position it was before it was also considered as const. But luckily we get rid of this problem by removing the internal iterator.

I have no idea what I should write or where... I think it would be much simpler, if you add the docs yourself since you seem to have and expectation of what should be added.

For you it is simpler :wink: Ok, I'll see what I can do, I added it to #4453

markus2330 avatar Sep 23 '22 13:09 markus2330

The relevant question is from which viewpoint it is read-only.

I was pretty clear on that IMO. The data is read-only, and my links basically agree there. For a const int you can't change the integer, for a const struct Foo you can't change the values of the fields in struct Foo.

The OOP answer usually is that the viewpoint is the interface from the outside, i.e., the API. In a quick look I didn't find anything in your links which disagrees.

C is not an OOP language, so I don't see that as a valid answer to the question: "What does const imply in a C type?" IMO that's also why the (C-specific) links don't explicitly disagree. Although I'd say they don't agree either.

No need for a reply, just wanted to clarify...

kodebach avatar Sep 23 '22 14:09 kodebach

I was pretty clear on that IMO. The data is read-only, and my links basically agree there. For a const int you can't change the integer, for a const struct Foo you can't change the values of the fields in struct Foo.

Irrelevant here as we are discussing const usage for parameters.

C is not an OOP language, so I don't see that as a valid answer to the question

https://doc.libelektra.org/api/latest/html/ clearly says on the front page that Key, KeySet and KDB are considered to be classes. This is already the case since the first commit in Elektra's repo, on Sat May 29 11:46:26 2004 (KDB was called Registry back then.)

No need for a reply, just wanted to clarify...

ditto

markus2330 avatar Sep 24 '22 10:09 markus2330