efcore icon indicating copy to clipboard operation
efcore copied to clipboard

[Cosmos DB] Support a default value for non-nullable properties

Open thomaslevesque opened this issue 5 years ago • 23 comments
trafficstars

Suppose I have an entity like this:

public class Todo
{
    public string Id { get; set; }
    public string Title { get; set; }
    public bool IsDone { get; set; }
}

If the container has a document with no value defined for IsDone, trying to load it throws an InvalidOperationException: Nullable object must have a value.

Although I understand the reason for this, it means my code can never handle documents where IsDone is null or undefined (which can happen, for instance, if the property didn't exist in the beginning but was added later). I could make the property nullable in the entity, but then I would need to handle the null case everywhere the property is used, which isn't ideal.

Also, if I query with a filter like .Where(t => !t.IsDone), documents where the IsDone property is undefined are not returned. I understand that it's because of the Cosmos DB semantics (null or undefined is neither true nor false), but it would be nice if null or undefined could be handled as if it was a default value (false in this case). See also https://github.com/Azure/azure-cosmos-dotnet-v3/issues/682.

I think this could easily be handled by supporting a default value. This value could either be specified with a [DefaultValue] attribute on the property, or with a HasDefaultValue(T value) method in the fluent model builder API. The effect would be to change the SQL from NOT(doc['IsDone']) to NOT(doc['IsDone'] ?? <defaultValue>).

I think it would be a very valuable feature to make data model evolutions easier. Currently, if I want to add a property when the container already contains documents, I have to either make it nullable (which isn't ideal if it's not supposed to be null) or update all documents to add the property.

thomaslevesque avatar May 21 '20 15:05 thomaslevesque

@AndriySvyryd Duplicate of #13131, but this was split into https://github.com/dotnet/EntityFramework.Docs/issues/1712, https://github.com/dotnet/efcore/issues/17722, and https://github.com/dotnet/efcore/issues/17746. I'm not which of these matches this particular request.

ajcvickers avatar May 22 '20 20:05 ajcvickers

@ajcvickers We can keep this one separate then as none of the split ones match it.

AndriySvyryd avatar May 22 '20 20:05 AndriySvyryd

This is very much needed, however, you can get most of the way there if you don't need to query using the property by doing something like this:

public class Todo
{
    public string Id { get; set; }
    public string Title { get; set; }
    public bool? isDone { get; set; }

    [NotMapped]
    public bool IsDone
    {
        get { return this.isDone.HasValue && this.isDone.Value; }
        set { this.isDone == value; }
    }
}

That is to say, make it nullable, but make a wrapper around it that defaults the null to false when read. You get the safety of having a default value for when the document doesn't contain the value and you can still deserialize the type from cosmos without throwing exceptions.

steveevers avatar Apr 07 '21 22:04 steveevers

@steveevers In general it works, but there is one caveat. It doesn't work if you want to apply query projection which is very important with regards to query performance (to query specific props rather than everything). In this case it doesn't even execute property getter and crashes with the same error ('Nullable object must have a value.'). Example: .Select(t => new TodoProjection { IsDone = t.IsDone }) Any ideas how to workaround this?

vyarymovych avatar Jan 20 '22 17:01 vyarymovych

Any update on this feature? It's very annoying to add new non-nullable fields.

dbalikhin avatar Jun 15 '22 21:06 dbalikhin

@dbalikhin This issue is in the Backlog milestone. This means that it is not planned for the next release (EF Core 7.0). We will re-assess the backlog following the this release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources. Make sure to vote (👍) for this issue if it is important to you.

ajcvickers avatar Jun 16 '22 16:06 ajcvickers

Thanks @ajcvickers. I don't think EF Core is even usable with CosmosDB without this feature. Unless you don't need to add new fields in the future. But that's the point of unstructured data storage. New fields defined as nullable: cannot properly use a.HasValue / a.Value in query, because of null vs undefined. New fields defined as non-nullable: cannot properly retrieve items, because this field can be undefined. No migrations to set the default value on field creation. More like CosmosDB limitations of course.

dbalikhin avatar Jun 16 '22 16:06 dbalikhin

Okay, so it doesn't like nullable.HasValue and nullable.Value, but it works with "nullable.FieldName condition X". So nullable.IsDone == true or nullable.IsDone == null. Stick with nullable for now...

dbalikhin avatar Jun 16 '22 21:06 dbalikhin

One option for providing a default value is to use a ValueConverter with convertsNulls set to true. An example for a boolean that defaults to false when undefined or null.

  public class UndefinedDefaultConverter : ValueConverter<bool, bool?>
  {
#pragma warning disable EF1001 // Internal EF Core API usage.
    public UndefinedDefaultConverter() : base(
      v => v,
      v => v ?? false,
      convertsNulls: true
    )
#pragma warning restore EF1001 // Internal EF Core API usage.
    { }
  }

Note, this doesn't help with querying since IsDefined() isn't handled by EF. I haven't found a way other than writing your own query to properly check for IS_DEFINED(), null, or the default value.

rcmosher avatar Aug 23 '22 13:08 rcmosher

@AndriySvyryd #13131 doesn't really explain what EF core does or should do to support this use case (supporting non-nullable values by allowing defaults) https://github.com/dotnet/efcore/issues/17722 and https://github.com/dotnet/efcore/issues/17746 also don't directly enable this.

Is there not a simple way to allow EFCore to let the natural c# way of initializing defaults occur without error? i.e.

public class Todo
{
    public string Id { get; set; }
    public string Title { get; set; }
    public bool IsDone { get; set; } = false; 
}

In the case where you need to query by these values i.e. _context.ToDo.Where(td => td.IsDone), any values who aren't there should be treated just as cosmos handles it, but itself would be a separate responsibility than the one saying that you can't have nulls. I was looking at adding the IS_DEFINED function into EF.Functions but still can't connect how this will actually enable the big problem of EFCore not handling gracefully new values

mnguyen36 avatar Feb 25 '23 06:02 mnguyen36

Running into this problem now. Can't believe a Microsoft product doesn't support automatic property generation that's baked into the C# lang... :/

public bool IsDone { get; set; } = false;

This would be ideal. If the Document doesn't contain the property, it will be automatically populated, and later when I save changes it will be created in the document.

Otherwise I have to go through and update every document to add a new property (cumbersome).

ghost avatar Feb 26 '23 21:02 ghost

@mnguyen36 https://github.com/dotnet/efcore/issues/17722 would only provide a workaround for some use cases. To support a more natural use we would need to design the API to configure it. We can't use the value that the property is initialized to as it's not easily accessible when we are building the model

AndriySvyryd avatar Feb 28 '23 04:02 AndriySvyryd

IMO, this issue can be handled by a migration script which runs over the existing documents and gives them their default value. In your case you should do something similar to:

UPDATE ToDo
SET IsDone = false
WHERE IsDone is null

Now you just have to put this into NoSQL and run this via a custom migration script. Maybe take a look at my Kapok.Module.MigrationEngine. But it might be confusing since it uses an abstraction layer to EF Core putting everything into data domains ...

leo-schick avatar Apr 12 '23 14:04 leo-schick

Except CosmosDB doesn't support the UPDATE statement, so you have to fetch each document and update it. Or you could write a stored procedure to do it (which is not trivial due to the callback-based async API).

Anyway, migrating the documents is fine when you have just a few documents, but when you have millions, it's not really a good option.

thomaslevesque avatar Apr 12 '23 15:04 thomaslevesque

@thomaslevesque that is true. In that case I would suggest to look into big data tools like Azure Databricks which has a Azure Cosmos DB Connector which supports read/write operations. Combined with a collection (=index) on the IsDone property and maybe a two step migration (adding the property as nullalbe, removing nullable after the processing is done) you should get to where you want to be...

Anyways, you have to make a trade off between one time migration or bad reading performance...

I don't think that someone from the Microsoft team will implement this. It looks to be more business logic than ORM logic to me ... and EF Core is the ORM ...

The fact is, that your documents just don't have the new property ... and therefore you either should add it to them or you have to deal with that fact that the property is sometimes not there (= null).

I see this as quite error prone to implement and the reason to implement this seems to me quite weak ...

For example, how would you make sure that EF Core makes sure this works with joins or pushdowns? (that is what happens when a Linq query gets send over to the database for execution) Then you have to travel through the whole Linq expression and extend the query with an OR operator or ISNULL / COALESCE scala operation (-> which decreases the query performance)

leo-schick avatar Apr 12 '23 18:04 leo-schick

Oh, I'm not saying it would be easy to do. Just that it would be useful and would make a pretty common scenario much easier.

Anyway, I don't use Cosmos DB anymore, so it doesn't really matter to me whether it's implemented or not.

thomaslevesque avatar Apr 12 '23 19:04 thomaslevesque

Anyway, I don't use Cosmos DB anymore, so it doesn't really matter to me whether it's implemented or not.

This! +1

I suspect they addressed this issue by introducing CosmosDB for PostgreSQL. It can scale and it is relational.

dbalikhin avatar Apr 12 '23 19:04 dbalikhin

Might be better to add something like this by allowing the user to set JSON serializer options and add custom serializers. Then it would cover almost all scenarios. Its almost impossible to use EF Core and Cosmos DB together in production without the ability to use a custom serializer. There's too many compromises you must make in your schema to make it viable.

NickSt avatar Aug 19 '23 21:08 NickSt

@NickSt It would be really helpful if you could outline the compromises you have had to make.

ajcvickers avatar Aug 20 '23 08:08 ajcvickers

so sad

wangengzheng avatar Jul 09 '24 06:07 wangengzheng

+1 to this. We just ran into this issue as well.

maschall avatar Aug 05 '24 15:08 maschall

+1 what the hell

garrynewman avatar Aug 21 '24 13:08 garrynewman

So many things wrong here:

  • The error text is wrong! The problem is that a non-nullable value is being assigned null, not that 'nullable needs value'. This alone cost me many hours.
  • It doesnt say what property is the problem. I'm querying millions of entries with hundreds of properties. I have no idea which is the issue. I dont even have a way to see what the failing entry is.

Its just flabbergasting that global internet infrastructure like this can be broken for one of the most basic use cases, for years, AND have a information-free + plain wrong error.

johnkslg avatar Aug 23 '24 16:08 johnkslg

Design decisions:

  • We'll try to do this for 10; when reading missing JSON properties, we'll return the .NET default value for the property type.
  • This covers only materialization of missing properties, and does not cover any query changes to affect the database-side evaluation of missing propertiews (e.g. the Cosmos behavior of filtering out undefined).
  • We'll implement this for both Cosmos and relational JSON. Other providers are encouraged to do the same (/cc @damieng)
  • For now, we will neither implement an opt out of this behavior, nor allow the value returned to be customized by the user. We recognize that there may be some cases where users may want to turn this off, e.g. to prevent data corruption where the model doesn't correspond to the JSON data (even if EF doesn't typically try to compensate for model inaccuracies). If we see user demand for such features, we'll consider implementing them.
    • If we do end up needing to provide customizability, a solution proposed by @AndriySvyryd would be to reuse HasDefaultValue() for this (i.e. promote it from relational to core). It would thus control both the write side when the entity provided is missing the value, and the read side when the database representation is missing the value.
  • This makes this a shaper-only feature, with a restricted focus; we believe this will address a great deal of schema evolution scenarios.

roji avatar Mar 03 '25 23:03 roji

reopening for potential patch

maumar avatar Mar 12 '25 20:03 maumar