electrodb icon indicating copy to clipboard operation
electrodb copied to clipboard

Sparse index is not dropped by updates that don't include index members

Open dabrowne opened this issue 10 months ago • 4 comments

Firstly, thanks a bunch for this great library. I've worked with DynamoDB for a few years now, but it's only using ElectroDB that I feel like its full capabilities can be used without getting bogged down in boilerplate.

Describe the bug The condition predicate for a sparse index is not evaluated for update calls (update(), patch(), or upsert()) that don't include all members of the index.

This means that if the condition is independent of the composite index members, then it is quite easy to write an update call that you would expect to result in the index being dropped, but does not. This is best described by the example in this playground link. In summary:

  • The byName index is sparse on the condition status !== "deleted".
  • When an update call sets status: "deleted", this condition is not evaluated, and the index is not dropped.
  • Bringing the byName index "into scope" by providing its composite fields to the .composite() call (commented out in the playground) causes the condition to be evaluated and the index to be dropped as expected.

Expected behavior The index should be dropped when an update call sets status: "deleted".

I think that there is probably no practical way to change this behaviour, as it would require evaluating all condition predicates on every update. I expect that many/most condition implementations assume that the index members values will be available, so evaluation of these conditions in a different context would break a lot of existing code.

Short of fixing/changing the behaviour, is there a way to improve the documentation to help developers form a better mental model about how this works? Perhaps something along the lines of:

The condition callback is only evaluated for updates that either involve all index members, or where the index members are provided using composite().

I also found this misleading in the documentation for condition, which implies that the condition is evaluated for every mutation:

A function that accepts all attributes provided every mutation method and returns a boolean value.

Additional context Related: https://github.com/tywalch/electrodb/issues/366

dabrowne avatar Feb 21 '25 08:02 dabrowne

Bringing in some relevant discussion from https://github.com/tywalch/electrodb/issues/366 that I glossed over on first read. It looks like this limitation has already been discussed and understood, although I think that the caveat 1 mentioned here isn't well captured in the current documentation.

Aside from fixing this at the docs, I think there is also a backwards-compatible way to fully resolve this by introduction of a new conditionDependencies array (pending a better name), as described by @sam3d:

Add a list of dependent attributes to condition (like with watch on attributes) so we know when to re-compute the presence of the index. This would retain the current flexibility and fix index behaviour on mutations, with some added complexity for the entity index definition

Here's how I picture this working:

  1. This is an optional field, defaulting to []. If you want to write a condition that is dependent on a particular non index member being present, you can choose to include its name in this array. By doing this you are guaranteed that it will be present when your condition is evaluated.
  2. When validating update operations, ElectroDB will require that either all or none of the following attributes for sparse indexes are present in the query (via either set() or composite()):
    [...pk.composite, ...sk.composite, ...conditionDependencies]
    
  3. If some, but not all of those fields are present, a runtime error is thrown.
  4. If none of the fields are present, the condition is not evaluated and the composite index is not changed by the query.
  5. If all of the fields are present, then the condition will be evaluated and the index will be updated or dropped based on the outcome.

This is really only a small extension of the current behaviour and is done so in a backwards-compatible way: The check that was introduced to fix caveat 2 raised here would still apply in the default case of conditionDependencies: [], however if a developer wants to add dependencies to their condition beyond the composite index members, they can use conditionDependencies to extend that check to include additional fields. This gives a nice guarantee that you won't be able to make the mistake in my original playground link without encountering an error.

Here's an updated playground link showing how I picture a developer would use this.

dabrowne avatar Feb 21 '25 23:02 dabrowne

Hi @dabrowne 👋

Thank you for writing this and for suggesting a thoughtful and thorough solution.

Firstly, to your original challenge, would adding status to your index composite be a viable solution for you? Given what you've shared, it would have the same impact on your use case but work well with the current implementation.

On the topic of conditionDependencies, I like what you have written up! Candidly, the condition callback has become a bit of an issue magnet, and I think it's because it exposes the user to a lot more of the implementation of index management (typically something they don't want and why they're using the library in the first place). These challenges aren't electrodb specific, but finding the right abstraction has been.

My gut concern with conditionDependencies is that it adds more complexity to something that already has me questioning whether it's the right fit for the library. For example, I've been having another conversation in Discord about removing the requirement for all attributes and putting the onus on users to check if necessary attributes are present. Another concern is that there should be three states allowed from condition: write, drop, and skip.

I triage issues based on the degree to which users can implement the solution today, with prejudice for maintaining a small surface area to prevent user churn. I have been mulling over alternatives and improvements to this API but haven't quite had an ah-ha moment yet. Issues like yours have been super helpful, though, to understand the challenges users see. It's even more valuable to see how they'd solve them (as you have)!

Please let me know what you think about the above. Please don't take my concerns as the end of the conversation; the more you keep the conversation going, the closer we'll likely get to a solution here 💪

tywalch avatar Feb 25 '25 16:02 tywalch

Hi @tywalch

Thanks for taking the time to take a look at this.

Firstly, to your original challenge, would adding status to your index composite be a viable solution for you?

Yes, I believe that would be a reasonable work-around. That's a good idea!

My gut concern with conditionDependencies is that it adds more complexity to something that already has me questioning whether it's the right fit for the library. For example, I've been having another conversation in Discord about removing the requirement for all attributes and putting the onus on users to check if necessary attributes are present.

I understand where you're coming from from the perspective of complexity and broken abstractions. Here's my unsolicited opinion about where complexity belongs or doesn't:

The developer who sets up an ElectroDB entity needs to be an expert to some extent. They should have read the documentation and have a decent understanding of how DynamoDB and indexes work. However, the value of the ElectroDB entity is that you only need to consider that complexity once. Once the entity is set up, the expert (and other non-expert developers) can then use the abstraction that's been created without needing to hold all of that complexity in their head. That's the primary value of an abstraction.

To the extent necessary, ElectroDB should also provide guard rails around places where the abstraction is imperfect. One such example is the "Missing Composite Attribute" error received when an update doesn't include complete enough information to keep a composite index in sync. IMO the conditionDependencies concept sits in the same bucket: it acknowledges that the abstraction can't be perfect and provides a way for an expert to set up guardrails that avoid mistakes down the line.

I guess it's the presence of those guardrails that is the subject of your debate. Personally I think they're a really big part of the value that ElectroDB gives developers. But I also agree that not everybody wants them, and they should be able to be switched off. For example, someone may choose to maintain their composite indexes via DynamoDB streams + Lambda handler. In this case, the "Missing Composite Attribute" check would get in their way because they are happy for their indexes to be momentarily inconsistent.

Another concern is that there should be three states allowed from condition: write, drop, and skip.

Yeah, this is another way of approaching the same problem: If you can guarantee that all dependencies are available for the condition callback, you only need write and drop. However if there is a chance that some value needed in condition might not be present, then you need a third result skip or indeterminate. I also considered this as a solution but figured that it's less palatable as it represents a breaking API change. It also doesn't let the developer express a condition where a certain value is required and should result in an error if not present.

dabrowne avatar Feb 26 '25 02:02 dabrowne

This is quite an interesting read if you haven't come across it before: https://aosabook.org/en/v2/sqlalchemy.html

Not suggesting that there is anything specifically that applies directly to ElectroDB, but I think the discussion in the first few paras about the challenges of building database abstractions is quite interesting.

dabrowne avatar Feb 28 '25 01:02 dabrowne