Revisit Cachex API naming and return structures
Elixir has a naming convention:
Functions that return a boolean are named with a trailing question mark.
But I noticed that several Cachex functions, with a trailing ?, don't return booleans. Instead they return booleans in an ok-tuple. Ie. {:ok, false}
I haven't checked every module, but in the main module I'm talking about Cachex.empty?/2 and Cachex.exists?/3.
In tests with non-existing cache, this results in false positives/negatives with assertions like these:
assert Cachex.exists?(:my_cache, :some_key)
refute Cachex.empty?(:my_cache)
I would suggest either returning plain booleans or removing the question-marks in the function name. Or even implement both flavours. I'd also be happy to submit a PR going for either solution, but first would like to hear others opinion on this.
Looking forward to your reactions!
Hi @michielfbr!
I originally wrote a huge reply to this but scrapped it to be more respectful of your time 😅. It's still fairly long, but I appreciate your thoughts on any of this! I broke it up into responding to your comments + my own, to try make it easier to follow.
Response
I am well aware of the problem; it was a hot topic on Twitter recently. Unfortunately, Twitter is quite simply the worst medium possible for actually having a conversation, so I'm glad you brought it to GitHub! FWIW it was polled there (not by me, so unbiased!) and the result was that it should not be changed.
Having said that, I am totally willing to find common ground here - but there are concerns to address. What I want is a solution that takes the users into account. It's very easy to point at convention and say "your library sucks, fix this", but context does matter.
I noticed that several Cachex functions, with a trailing ?, don't return boolean
The core reason I chose ? is that they do return booleans. The documented API contract of Cachex is that every function in the main module returns { :ok, return_value }. Special casing only these functions impacts consistency, and would be a case of convention > usability. The fact that nobody cared about this to file an issue until now supports this, IMO.
In tests with non-existing cache, this results in false positives/negatives with assertions like these
That being said, you are correct that it can be misleading in the case of assert or things like if, etc. IMO this is a result of people caring about conventions for the wrong reasons. Conventions exist to show intent, they are not a replacement for things like documentation and types. Reminder that "convention" is defined as "a way in which something is usually done". If the Elixir team wanted it to be always, they should have the compiler enforce it.
I would suggest either returning plain booleans or removing the question-marks in the function name. Or even implement both flavours
Yeah, these are the candidates I looked at so far. I originally wrote a full explanation of this, but it was largely pointless so I can summarize it:
- Changing the type introduces inconsistency across the API
- Removing the
?changes the meaning; we need new names- Naming matters far more than convention, I don't think anyone disagrees with this
- Something like
Cachex.empty/2reads as "empty the cache" not "is the cache empty" - A new name needs to fit the existing API design, e.g. no
Cachex.does_cache_have_entries/2
- Doing both is probably off the table, unfortunately
- There are people who care about convention, and those who don't
- Nobody is advocating for the current API, they just don't care about convention
- So doing both is redundant, while also introducing inconsistency across the API
Instead, rather than arguing about booleans specifically, I think we should be talking about revisiting the API convention in general. Maybe people didn't want to say "change all of it", but that's more compelling than just special casing booleans.
Proposal
I wrote the first version of the library almost a decade ago; Elixir was very new and not widely used. Nobody cared about convention, because the language was still growing. At the same time the approach I chose was also not based on the current Elixir audience, so maybe it's wrong in 2024! I defined the API because there is a case where a cache does not exist. So something like this:
Cachex.get(:missing_cache, "key")
I did not believe that crashing was the best approach here (I still don't, but I could maybe be convinced). So simply because of this, every call had to support { :error, reason } at a minimum for this case. This is where the ? boolean functions broke, all these years ago :)
So perhaps in 2024 this is simply the time to go through the API and do the following:
- Replace the missing cache error with a crash
- Remove the error cases for all infallible functions (i.e. boolean functions)
- Remove the
{:ok, return_value}wrapping for functions where appropriate (some calls may still make sense to have this)
For the last point, I don't really like not having :ok tuples in some cases. Being able to eagerly handle your found value is much better (IMO). Consider these two examples:
# removing the {:ok, _} syntax on fetch
case Cachex.fetch(:cache, "key", &String.length/1) do
{:error, reason} ->
# handle error during fetch
{:commit, value} ->
# handle newly generated values
value ->
# handle values already found
end
# keeping the {:ok, _} syntax on fetch
case Cachex.fetch(:cache, "key", &String.length/1) do
{:ok, value} ->
# handle values already found
{:commit, value} ->
# handle newly generated values
{:error, reason} ->
# handle error during fetch
end
Even in this case though, :ok is vague and kinda sucks. There's probably a better option once it's free of trying to be consistent with the rest of the calls. Side effect of these changes is being able to scrap the nonsense of all the ! generation. It was ugly even 10 years ago!
The problem with this is that the churn is quite high; it will affect literally every function in the Cachex module. For those who don't care about convention (and there's a lot of those people), this is just noise for no benefit. How to balance that is a difficult question. Do I immediately follow with a v5.x after a v4.x just last week? Most likely not.
I'm curious on your thoughts here, so please do let me know!
As an aside, the conventions you linked would also have me rename Cachex.fetch/4, but I have no interest in doing so. It would also dictate that we label everything with ! because they all have the ability to crash.
Part of the slippery slope of conventions is that you should probably care about them all, or you might as well care about none.
Hi @whitfin !
Thank you for sharing your thoughts so extensively! I'm glad that I am not the first noticing this, thanks for mentioning. I totally get the objections against simply changing the return of only a couple functions, ignoring the defined API contract, just to match conventions.
I'll move straight to responding to your proposal, trying to keep everything you mentioned in mind. But not before saying that "your library sucks, fix this" surely isn't the message I want to convey. On the contrary.
The 3 changes you list, sound like a good approach to me.
Replace the missing cache error with a crash
Perhaps. Not entirely sure on non-! functions. Maybe return a self-defined exception-struct?
In case of crashing/raising, there might be use for something like Cachex.running?(:my_cache). For cases with un-supervised cache, where one would like to handle the case of a not-running cache.
Remove the error cases for all infallible functions (i.e. boolean functions)
Yes :-)
Remove the {:ok, return_value} wrapping for functions where appropriate (some calls may still make sense to have this)
Yes, but definitely not everywhere. As you say: "only where appropriate"
In general I would say write-to-cache functions would always keep the tuple. Somewhat like Ecto.Repo.
Cachex.fetch() has the potential for writing, so should keep the tuple.
And, although I feel some resistance on bang functions (!), I do think they could be usable next to certain functions, but not all of them.
I realize that any of these changes have major impact on many existing users. Thus I totally get that, if applied, changes like these should be well overthought. And won't happen overnight in a minor patch.
I started using Cachex and noticed these issues as well.
I'm not a huge fan of redundant return values such as { :ok, true }. I understand it's to be consistent with cases where an error is returned, but I don't see that need. Returning :ok | {:error, :no_cache} is perfectly acceptable.
@SergioInToronto those are simply remnants of back in the day where standards were less solidly shaped, and because people are used to it that way (and it's harmless) it has never been changed.
This whole debate is very a much a case of how much do we care about a couple of small naming issues. Enough to force every user of the library to have to audit and migrate their code? For me, I still don't believe that's entirely practical.
My stance on this thread is largely going to be that this can be addressed when some other factors force major (breaking) changes. I don't see myself pushing breaking changes and migration documentation only to change names of a couple of things. Beyond anything else, it's a lot of effort for something that isn't stopping anyone being productive.
This isn't final though, I will be spending some time on this and seeing how much of a lift it is. In general I do dislike ! functions, and some of the API changes here would also eliminate the need for those -- which is quite a nice side effect. There also is some value to having these types of changes as the only things to review during a major upgrade, instead of potentially missing them amongst other major changes.
Elixir has come a long, long way since I first built much of this API (which was Elixir 1.2) and standards have changed underneath. I do acknowledge that we need a new lick of paint at some point :)
Also, for anyone coming to this in the meantime, everyone has been talking about booleans and ? functions. I'm curious if there are other things, or if that's literally all any of this is about!
This issue could be called "Modernize API" since most conventions didn't exist when Cachex was originally written, as you explained nicely. Maybe add "and improve UX" or "and polish rough edges" for additional changes, for example raising an error when cache is missing, or tweaking the return of fetch/4 as you described.
I'm curious if there are other things
For me, raising an error on missing cache so that return values can be simplified is highly desirable. Following Elixir function conventions is next most important to me. High level, I'd like using Cachex to be as frictionless as possible.
Cachex works really well - I was happy once I got it all set up. Devs who are new to Elixir (like myself) might stumble over the convention-breaking patterns and therefore struggle to get started. On the flip side, releasing a new major version requires existing projects to update and it's not valuable (no new features), it's just busywork in a sense. The way I think about it: should we prefer helping new users or making it easy for existing users. There's good arguments on both sides.
One argument, Elixir keeps gaining popularity. StackOverflow saw Elixir grow from 1.74% in 2021 to 2.1% in 2024. It was the 2nd most admired language trailing only Rust. That's incredible! I assume Elixir's popularity will continue growing, maybe even accelerating.
I lean towards making changes as described in this issue and releasing a new major version. Now is better than later. New users will benefit and existing users aren't forced to upgrade. Maybe we'll backport features/bugfixes to the old major version for a while. Maybe. But it adds work for you, which may not be feasible now. Or it may only be feasible now, while you're still active. Folks like @michielfbr, @HansGlimmerfors, and myself might like to help. Whether we do this or not, I support you. As I said there's good arguments on both sides. But this decision primarily comes down to you and maybe only to you.
Just following up to this with my latest thoughts here:
I am intending on (time permitting) a v5.0 which will address most of these issues based on the naming conventions.
I plan to replace the initial "no cache" failure case with an ArgumentError (which I think fits best, because the "cache you selected does not exist"):
- This simplifies the return types for many functions
- This allows us to drop
!forms of many (all?) functions - This allows
?functions to be pure booleans, rather than tuples - This allows using
:okortrueover{ :ok, true }etc.
Currently (subject to change, maybe), I am not planning on renaming any functions. The notes are mainly documenting stdlib types, and I believe e.g. renaming Cachex.fetch/4 to Cachex.get_lazy/4 crosses the line into being more churn than it's worth.
If we do opt for this, it's a smaller lift to deprecate and promote new names going forward without breaking changes. As such my gut feel is to hold off here until we have a larger community pull for this. This area is a little awkward, because the naming of these functions actually pre-date the (documented) conventions.
@michielfbr and @SergioInToronto if you have any feedback here, fire away!
- raise
ArgumentError, awesome! - v5, awesome 🎉
- I think we should rename functions + change return values at the same time. For example:
@spec clear(Cachex.t(), Keyword.t()) :: {status, integer}becomes@spec clear!(Cachex.t(), Keyword.t()) :: integer
I plan on removing the ! versions wherever possible, based on this snippet from the Elixir conventions:
Errors that come from invalid argument types, or similar, must always raise regardless if the function has a bang or not. In such cases, the exception is often an ArgumentError or a detailed FunctionClauseError:
By using ArgumentError for invalid cache names, we can avoid ! on many (all?) of the signatures. Resulting in:
@spec clear(Cachex.t(), Keyword.t()) :: integer
There may be others which are fallible for other reasons, and those will still have both forms (i.e. some of those which hit disk, etc). This should make most signatures very simple and drop a lot of !, and only a few will require a ! form.
Ah that makes sense - thanks for clarifying.
In the docs there's mention of Cachex.stream!/2 and Cachex.get!/2 but I don't see those functions. I assume this is a doc bug? I can clean this up while also writing the v5 docs.
If we'll move forward with v5, next step might be to define and review all function @specs
Cachex.get returning {:ok, nil} for missing values should be also fixed, as it makes it impossible to store valid nil values.
@a3kov how would you suggest changing that without causing the same issue with some other value?
For example, you can return :not_found - not {:ok, :not_found}, but literally :not_found
If the code can return errors, it could return {:error, :not_found}
Right -- but that then just means it's not possible to store ':not_found'.
There will always be some clash, but 'nil' is pretty traditionally used to represent "no value". The 'nil' itself comes from Erlang's implementation, not my own personal choice, so this is unlikely to change.
but that then just means it's not possible to store ':not_found'.
It is possible, because the value in my suggestion is separated from the wrapper:
:not_found -> missing record
{:ok, :not_found} -> somebody cached :not_found value
Same with :error wrapper:
{:error, :not_found} is different from {:ok, :not_found}
'nil' is pretty traditionally used to represent "no value"
Right, and "no value" is a valid value to cache. The meaning comes from the API consumer, not the caching API itself. If ets can handle it, then it shouldn't be disallowed.
Ah, @a3kov, I see what you're saying.
Given the scope of the changes that people want (i.e. no more return tuples), how would you represent a missing value here? It seems contradictory to the other requested changes.
I'm okay with returning e.g. :not_found and having that special cased, but that is fundamentally no different to nil, except that you're arguably less likely to be storing :not_found than nil.
Would you rather raise an exception if the key is missing? Maybe raise unless there's a :default option passed to Cachex.get/3?
Either way I do think this is something we can try to improve.
Would you rather raise an exception if the key is missing?
As far as I see there's no reason to do that. Absence of cached value is not an exceptional situation - quite the opposite.
The general rule for cases like this - a library that deals with user data should just wrap it, instead of dictating what can and can't be passed, unless there's really good reason behind why a specific value can't be used. Clearly Cachex can store nil values so the rest is just a minor detail - how you wrap it. And that is not very important, as long as you don't confuse the value and the wrapper. Also that's why in Elixir we have {:ok, value} tuples. Ask yourself: with this value and this wrapper can I pattern match the result clearly ?
As far as I see there's no reason to do that. Absence of cached value is not an exceptional situation - quite the opposite.
For what it's worth I also totally disagree with exceptions, I was more trying to make a point!
Ask yourself: with this value and this wrapper can I pattern match the result clearly ?
Right, I understand the point -- but contextually one of the main goals of this thread and related discussions is getting to a place where we can inline calls and values, e.g.
if Cachex.get(:my_cache, "my_boolean") do
...
And such wrapping (like we currently have!) stops this from being possible, which is why there would be a shift to nil. I agree that this has downsides, as you're pointing out, but wrapping will directly contradict requests from others.
With all of this being said it's worth noting that precedence for this behaviour exists in Elixir's own stdlib (which is why Cachex inherited it all those years ago). As such I don't think it should be considered something to be "fixed", rather just an area we can try to improve.
the rest is just a minor detail - how you wrap it
You're right; this is effectively just an issue with get/3 conflating nil with entry(value: nil). After glancing over everything, basically everywhere else correctly has a distinction here.
Referring back to https://github.com/whitfin/cachex/issues/382#issuecomment-2956496352, the current plan for 5.x would leave Cachex.get/3 roughly equivalent to Map.get/3. I am not yet sure on adding a new third parameter to match Map.get/3 (thus making Cachex.get/4) or whether it'll be a :default option in the current option list.
This has precedence in the stdlib, and is intuitive enough. Anyone who suffers from this choice can opt for a different default to nil if they wish to store nil as a value. Honestly the big failing here is that the default is not already configurable, IMO.
We could choose to use some other atom instead of nil, but this would break the inlining approach due to Elixir's choice of evaluation -- and I believe that most people would prefer nil be used here. The nil value traditionally (although yes, not always) represents absence of value.
I don't really wanna go back and forth on here too much as this thread is much more general than this signature, so I'll try a few things out and keep them in mind when it comes to the API shape for v5.0! I've reopened https://github.com/whitfin/cachex/issues/425 and will cross reference as needed.
Hi everyone!
Just a note to say that work has begun on the directions specified in this thread in a separate branch; once things are functional, it'll be merged into main and we'll be officially on a track to releasing a v5 (and can pull in all other 5.x issues, too!).
I can't promise when it'll be ready, but I feel that this thread has been left open for far too long so I'll be carving some weekends out to get some traction here. My hope is by end of 2025, but again that can shift at any point.
Thank you all again for your feedback!
After starting a lot of the work for the main changes in https://github.com/whitfin/cachex/pull/426, it's clear that there are already spaces to improve before a 5.x is out. For the most part, things are a lot simpler and do make a lot more sense. It's trivial to do things like this:
cache
|> Cachex.get(key)
|> handle_result()
But there are certain cases which I think have worsened. In particular there are cases where with is now a pain to use, because there is no single success case:
with {:ok, value} <- Cachex.fetch(cache, key, &function/1) do
...
end
This has resulted in much bloat:
case Cachex.fetch(cache, key, &function/1) do
{:ignore, value} ->
...
{:commit, value} ->
...
{:error, _reason} ->
...
value ->
...
end
I do not want to special case these functions to keep :ok labels; I feel like that's a mistake. To resolve the case above for fetch/4 specifically, I believe the solution is to simplify the result to optionally drop the details:
# also considering detail: false rather than simple: true
case Cachex.fetch(cache, key, &function/1, simple: true) do
{:error, value} ->
...
value ->
...
end
This is closer to being acceptable, but still not ideal. Separately to this I am considering a generic option for cache calls (and maybe on a cache itself) to label any non-error state in a tagged tuple. For example passing label: :ok would transform your result into {:ok, result}. Combining these things would give us:
with {:ok, value} <- Cachex.fetch(cache, key, &function/1, simple: true, label: ok) do
...
end
At a glance this feels a bit gross, but when you consider that you can enable it on a cache level it actually becomes quite nice for those who prefer it. It might be better to avoid label: :ok in favour of a boolean flag e.g. something like flow: true.
One of the reasons I like this is that it also provides a migration path to 5.x; you could enable this option and see no API difference compared to 4.x. As you migrate your calls, you can add label: nil to drop the binding and adopt the 5.x API.
I am unsure which of these, or if neither, will be included in 5.x -- I'm just trying a few things out and thought I'd try get some feedback!
But there are certain cases which I think have worsened. In particular there are cases where
withis now a pain to use, because there is no single success case:with {:ok, value} <- Cachex.fetch(cache, key, &function/1) do ... end
This has resulted in much bloat:
case Cachex.fetch(cache, key, &function/1) do {:ignore, value} -> ...
{:commit, value} -> ...
{:error, _reason} -> ...
value -> ... end
As an alternative to complicating the usage of Cachex.fetch/4 by adding the options to optionally return an ok-tuple, do you think it would make sense to change the return type from {:ok | :ignore | :commit | :error, any()} to {:ok, {:ignore | :commit | :found, any()}} | {:error, any()}?
Maybe that could be a compromise for those who only care about success/failure and those who do care about the status?