Sparse index is not dropped by updates that don't include index members
Firstly, thanks a bunch for this great library. I've worked with DynamoDB for a few years now, but it's only using ElectroDB that I feel like its full capabilities can be used without getting bogged down in boilerplate.
Describe the bug
The condition predicate for a sparse index is not evaluated for update calls (update(), patch(), or upsert()) that don't include all members of the index.
This means that if the condition is independent of the composite index members, then it is quite easy to write an update call that you would expect to result in the index being dropped, but does not. This is best described by the example in this playground link. In summary:
- The
byNameindex is sparse on the conditionstatus !== "deleted". - When an update call sets
status: "deleted", this condition is not evaluated, and the index is not dropped. - Bringing the
byNameindex "into scope" by providing its composite fields to the.composite()call (commented out in the playground) causes the condition to be evaluated and the index to be dropped as expected.
Expected behavior
The index should be dropped when an update call sets status: "deleted".
I think that there is probably no practical way to change this behaviour, as it would require evaluating all condition predicates on every update. I expect that many/most condition implementations assume that the index members values will be available, so evaluation of these conditions in a different context would break a lot of existing code.
Short of fixing/changing the behaviour, is there a way to improve the documentation to help developers form a better mental model about how this works? Perhaps something along the lines of:
The condition callback is only evaluated for updates that either involve all index members, or where the index members are provided using composite().
I also found this misleading in the documentation for condition, which implies that the condition is evaluated for every mutation:
A function that accepts all attributes provided every mutation method and returns a boolean value.
Additional context Related: https://github.com/tywalch/electrodb/issues/366
Bringing in some relevant discussion from https://github.com/tywalch/electrodb/issues/366 that I glossed over on first read. It looks like this limitation has already been discussed and understood, although I think that the caveat 1 mentioned here isn't well captured in the current documentation.
Aside from fixing this at the docs, I think there is also a backwards-compatible way to fully resolve this by introduction of a new conditionDependencies array (pending a better name), as described by @sam3d:
Add a list of dependent attributes to condition (like with watch on attributes) so we know when to re-compute the presence of the index. This would retain the current flexibility and fix index behaviour on mutations, with some added complexity for the entity index definition
Here's how I picture this working:
- This is an optional field, defaulting to
[]. If you want to write aconditionthat is dependent on a particular non index member being present, you can choose to include its name in this array. By doing this you are guaranteed that it will be present when yourconditionis evaluated. - When validating update operations, ElectroDB will require that either all or none of the following attributes for sparse indexes are present in the query (via either
set()orcomposite()):[...pk.composite, ...sk.composite, ...conditionDependencies] - If some, but not all of those fields are present, a runtime error is thrown.
- If none of the fields are present, the
conditionis not evaluated and the composite index is not changed by the query. - If all of the fields are present, then the
conditionwill be evaluated and the index will be updated or dropped based on the outcome.
This is really only a small extension of the current behaviour and is done so in a backwards-compatible way: The check that was introduced to fix caveat 2 raised here would still apply in the default case of conditionDependencies: [], however if a developer wants to add dependencies to their condition beyond the composite index members, they can use conditionDependencies to extend that check to include additional fields. This gives a nice guarantee that you won't be able to make the mistake in my original playground link without encountering an error.
Here's an updated playground link showing how I picture a developer would use this.
Hi @dabrowne 👋
Thank you for writing this and for suggesting a thoughtful and thorough solution.
Firstly, to your original challenge, would adding status to your index composite be a viable solution for you? Given what you've shared, it would have the same impact on your use case but work well with the current implementation.
On the topic of conditionDependencies, I like what you have written up! Candidly, the condition callback has become a bit of an issue magnet, and I think it's because it exposes the user to a lot more of the implementation of index management (typically something they don't want and why they're using the library in the first place). These challenges aren't electrodb specific, but finding the right abstraction has been.
My gut concern with conditionDependencies is that it adds more complexity to something that already has me questioning whether it's the right fit for the library. For example, I've been having another conversation in Discord about removing the requirement for all attributes and putting the onus on users to check if necessary attributes are present. Another concern is that there should be three states allowed from condition: write, drop, and skip.
I triage issues based on the degree to which users can implement the solution today, with prejudice for maintaining a small surface area to prevent user churn. I have been mulling over alternatives and improvements to this API but haven't quite had an ah-ha moment yet. Issues like yours have been super helpful, though, to understand the challenges users see. It's even more valuable to see how they'd solve them (as you have)!
Please let me know what you think about the above. Please don't take my concerns as the end of the conversation; the more you keep the conversation going, the closer we'll likely get to a solution here 💪
Hi @tywalch
Thanks for taking the time to take a look at this.
Firstly, to your original challenge, would adding status to your index composite be a viable solution for you?
Yes, I believe that would be a reasonable work-around. That's a good idea!
My gut concern with conditionDependencies is that it adds more complexity to something that already has me questioning whether it's the right fit for the library. For example, I've been having another conversation in Discord about removing the requirement for all attributes and putting the onus on users to check if necessary attributes are present.
I understand where you're coming from from the perspective of complexity and broken abstractions. Here's my unsolicited opinion about where complexity belongs or doesn't:
The developer who sets up an ElectroDB entity needs to be an expert to some extent. They should have read the documentation and have a decent understanding of how DynamoDB and indexes work. However, the value of the ElectroDB entity is that you only need to consider that complexity once. Once the entity is set up, the expert (and other non-expert developers) can then use the abstraction that's been created without needing to hold all of that complexity in their head. That's the primary value of an abstraction.
To the extent necessary, ElectroDB should also provide guard rails around places where the abstraction is imperfect. One such example is the "Missing Composite Attribute" error received when an update doesn't include complete enough information to keep a composite index in sync. IMO the conditionDependencies concept sits in the same bucket: it acknowledges that the abstraction can't be perfect and provides a way for an expert to set up guardrails that avoid mistakes down the line.
I guess it's the presence of those guardrails that is the subject of your debate. Personally I think they're a really big part of the value that ElectroDB gives developers. But I also agree that not everybody wants them, and they should be able to be switched off. For example, someone may choose to maintain their composite indexes via DynamoDB streams + Lambda handler. In this case, the "Missing Composite Attribute" check would get in their way because they are happy for their indexes to be momentarily inconsistent.
Another concern is that there should be three states allowed from condition: write, drop, and skip.
Yeah, this is another way of approaching the same problem:
If you can guarantee that all dependencies are available for the condition callback, you only need write and drop. However if there is a chance that some value needed in condition might not be present, then you need a third result skip or indeterminate. I also considered this as a solution but figured that it's less palatable as it represents a breaking API change. It also doesn't let the developer express a condition where a certain value is required and should result in an error if not present.
This is quite an interesting read if you haven't come across it before: https://aosabook.org/en/v2/sqlalchemy.html
Not suggesting that there is anything specifically that applies directly to ElectroDB, but I think the discussion in the first few paras about the challenges of building database abstractions is quite interesting.