Deprecate model managed data (HasData() seeding)
This feature doesn't appear to be used much, but has a considerable maintenance cost.
If anyone derives value from this, please add a comment below briefly describing how you are using it (e.g. whether it's used in testing or production, and how migrations actually include operations to update the data).
I do use it: to seed the data for default roles, to create default Sysop users, and to set up other environment data when creating a new tenant in the application (database-per-tenant setup).
(Reply sent by email, quoting Andriy Svyryd's post of Fri, May 6, 2022, 01:42 on https://github.com/dotnet/efcore/issues/27959.)
Another problem with seeding is that we regularly see people using it incorrectly, e.g. seeding large amounts of data, leading to a huge and unmanageable migration directory. The feature is a bit of a pit of failure.
The data we are seeding takes a few forms:
A. List data seeded to lookup tables to ensure consistent usage for external reporting.
B. Source-controlled data used to support back-end processes.
C. Default values to be combined with a much larger dataset.
D. Ensuring consistent handling of menu and feature visibility, combined with much larger datasets for filtering.
An example of A.
public sealed class Activity : Enumeration
{
    public static readonly List<Activity> AllItems = new();

    public static Activity
        Unknown = new(-1, "Unknown", false, "The Option to use when you don't know what to use. This should never be used intentionally.")
        , NoAction = new(1, "No Action Needed", true)
        , Pending = new(2, "Pending", false)
        , Complete = new(3, "Complete", true)
        , MeterCheckRequestComplete = new(4, "Meter Check Request Complete", true)
        , RevProInvestigationReferral = new(5, "Rev Pro Investigation", true, "A request for a Revenue Protection Investigator to check the meter")
        , MeterCheckRequestMeterShop = new(6, "Meter Check Requested", false)
        , New = new(7, "New", false)
        , Claimed = new(8, "Claimed", false, "", false)
        , Returned = new(9, "Returned", false, "", false)
        , DisconnectOrder = new(10, "Disconnect Ordered", true, "A request to disconnect the location was created", true)
        , TurnOnOrder = new(11, "Account Turn On", true, "A Turn-On order has been created for the location", true)
        ;

    // Every instance registers itself, so AllItems always mirrors the fields above.
    private Activity(int id, string label, bool isFinal, string description = "", bool isUsable = true) : base(id, label, isUsable)
    {
        IsFinal = isFinal;
        Description = description;
        AllItems.Add(this);
    }
    ...

public class ActivityConfiguration : IEntityTypeConfiguration<Activity>
{
    public void Configure(EntityTypeBuilder<Activity> builder)
    {
        builder.HasKey(x => x.Id);
        builder.Property(x => x.Id)
            .ValueGeneratedNever();
        builder.HasData(Activity.AllItems);
    }
}
Anytime a new Activity is added to the Enumeration class, it is always added to the storage table so the FK can be easily converted to the friendly name for external reporting.
An example of B
public class PackageWatcher
{
    public static List<PackageWatcher> AllItems = new()
    {
        new(1, Package.SrpPosition, Package.SrpUnit)
        , new(2, Package.SrpPosition, Package.TRollupPosition)
        , new(3, Package.SrpUnit, Package.TRollupPosition)
        , new(4, Package.SrpPerson, Package.ActiveDirectoryAccount)
        , new(5, Package.SrpPerson, Package.SrpPosition)
        , new(6, Package.CutInFlat, Package.CutInFlatExtract)
        , new(7, Package.CutInFlatEvent, Package.CutInFlat)
        , new(8, Package.SeasonalitySuspect, Package.SeasonalitySuspectExtract)
    };
}
Just like in A, builder.HasData(PackageWatcher.AllItems); is sufficient to seed the data. This source-controls the dependency tree for our ETL packages. HasData dramatically simplified version-controlling this information and, combined with our CI/CD process, means our SSIS packages can be deployed WITH all dependencies set up, without a developer having to run an independent script. Note that the application does not manage the executions, but each package reports back to PackageExecution, another table in our project. Since the ETL environment has a presence in our application, we can easily pull in execution results to ease notification handling.
An example of C
...
builder.HasData(new List<Person>
{
    { Person.None },
    { Person.Unknown },
    { Person.System }
});
...
The full Person dataset comes from our HR system, but we use HasData to seed some necessary values that are not present in the HR system.
An example of D
public class MenuLinkWorkFunction
{
    public static readonly List<MenuLinkWorkFunction> AllItems = new()
    {
        new (MenuLink.CutInFlat, WorkFunction.PowerUser)
    };
    ...
}

public class MenuLinkWorkFunctionConfiguration : IEntityTypeConfiguration<MenuLinkWorkFunction>
{
        ...
        builder.HasData(GetSeedDataAndDefaults().Select(x => new { x.MenuLinkId, x.WorkFunctionId, x.IsActive }));
    }

    // Combines the explicitly restricted links with a Default entry for every other link.
    private List<MenuLinkWorkFunction> GetSeedDataAndDefaults()
    {
        List<MenuLinkWorkFunction> defaultItems = new();
        List<MenuLink> restrictedLinks = MenuLinkWorkFunction.AllItems.Select(x => x.MenuLink).ToList();
        List<MenuLink> unrestrictedLinks = MenuLink.AllItems.Except(restrictedLinks).ToList();
        foreach (MenuLink menuLink in unrestrictedLinks)
        {
            defaultItems.Add(new MenuLinkWorkFunction(menuLink, WorkFunction.Default));
        }
        return defaultItems.Union(MenuLinkWorkFunction.AllItems).ToList();
    }
}
In our system, a Menu Link (Page or Workflow) can have restricted functionality or features that are only accessible to members of a group. Work functions are declared at design time by the developer, but if a Menu Link does not require any elevated access, we want to seed a Default value without requiring an explicit declaration in the model. This method returns the collection of restricted MenuLinks combined with all other MenuLinks, each associated with the default function. This simplifies our security management, since we can associate the security group relationship at the WorkFunction level rather than maintaining a hybrid of secured work functions and unsecured menu links.
I think the reason HasData hasn't seen much adoption is that it cannot reliably be used once an application has existing data. We've been using it for a year now, and we've reached a point at which I can't use HasData for new features, because the generated migration works either for brand-new deployments or for existing installs, but not both.
Please don't deprecate this. We use it to automatically create and update lookup tables for enums used in models, using a variation of the code outlined here: https://github.com/dotnet/efcore/issues/12248#issuecomment-395450990
Same here. Seeding enum lookup and other reference data not directly managed by the end-users. Data warehouse folks love it.
Another problem with seeding is that we regularly see people using it incorrectly, e.g. seeding large amounts of data, leading to a huge and unmanageable migration directory. The feature is a bit of a pit of failure.
Not a HasData problem. I have projects where the migrations make up a significant portion of the code without seeding more than a dozen or so records. Every minuscule db change creates a complete model snapshot. Effort should be focused on #18620 and #2174 first, and this should be revisited only if it's still a problem afterwards.
I find HasData to be ideal for small amounts of master data shared across all environments (and even automated integration tests). The beauty of it is that the seed data becomes a part of the migrations, without introducing any additional approaches one needs to know about.
For suitable use cases, I'd want nothing else.
I should add that migrations provide a very important feature: the ability to migrate down. Not only can this help with a problematic production deployment, but it also enables context switching during development. Migrated your LocalDb and need to switch branches for something urgent? Just migrate down again.
@Timovzl the question here is why the newer UseAsyncSeeding API doesn't meet all these needs. It's trivial to use it with a short fragment of code that checks the database and (re)seeds it if necessary. It's slightly more low-level than HasData(), but avoids a lot of the pitfalls and complexity that HasData() imposes.
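A minimal sketch of the kind of check-and-seed fragment described here, assuming EF Core 9 or later, where the options-builder methods are named UseSeeding and UseAsyncSeeding. The Status entity, the AppDbContext class, and the seed values are hypothetical and not taken from this thread, and provider configuration is left out.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical lookup entity used only for illustration.
public class Status
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Status> Statuses => Set<Status>();

    // Reference rows the application expects to exist (illustrative values).
    // Fresh instances are created on each call so no entity is shared between contexts.
    private static IEnumerable<Status> ExpectedStatuses() => new[]
    {
        new Status { Id = 1, Name = "New" },
        new Status { Id = 2, Name = "Pending" },
        new Status { Id = 3, Name = "Complete" },
    };

    // Keys are assigned by the application, not generated by the database.
    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<Status>().Property(s => s.Id).ValueGeneratedNever();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            // Provider setup (UseSqlServer, UseNpgsql, ...) is omitted in this sketch.
            .UseSeeding((context, _) =>
            {
                // Load the keys already present, then add only the missing rows.
                var existing = context.Set<Status>().Select(s => s.Id).ToList();
                context.Set<Status>().AddRange(
                    ExpectedStatuses().Where(s => !existing.Contains(s.Id)));
                context.SaveChanges();
            })
            .UseAsyncSeeding(async (context, _, cancellationToken) =>
            {
                // Same logic for the async code path.
                var existing = await context.Set<Status>()
                    .Select(s => s.Id)
                    .ToListAsync(cancellationToken);
                context.Set<Status>().AddRange(
                    ExpectedStatuses().Where(s => !existing.Contains(s.Id)));
                await context.SaveChangesAsync(cancellationToken);
            });
}
```

This sketch only inserts rows that are missing. Updating renamed rows or deleting retired ones would need extra logic, which is part of the trade-off against HasData discussed in this thread.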
@roji As a consumer, UseAsyncSeeding feels like a step back to me. As mentioned above, we use HasData to automatically create and update lookup tables for enums used in models. For non-production environments, we use the Migrate command to update the database programmatically. For production, where we are scaled horizontally and have significantly more data, we generate a migration script using dotnet ef migrations script. A DBA reviews the script to ensure that the SQL being run won't cause any issues*, and then it is run manually against the database, potentially split into parts that run before and after the codebase is deployed.
Currently, with HasData, new enum entries appear as INSERT statements in the script, so it's obvious what is being added. As far as I can tell from our current deployment practices and from reading the UseAsyncSeeding docs, the seeding code wouldn't even be called against our production environment. Even if it were, it would have to check for the existence of every enum record, which means unnecessary database calls (admittedly not much overhead).
*e.g. EF tries to add indexes to existing large tables without WITH (ONLINE = ON), causing long-running locks and subsequently timeouts.
EDIT: Our deployment practices align with best practices laid out here https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations/applying?tabs=dotnet-core-cli
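For readers unfamiliar with what this looks like in practice, here is a hand-written approximation of the migration EF Core scaffolds when one row is added to a HasData call. The table name, column names, values, and migration name are hypothetical; the scaffolded designer file that carries the migration attributes is omitted.

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// Hypothetical migration produced after adding one new lookup entry via HasData.
public partial class AddOnHoldStatus : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Becomes a plain INSERT statement in the generated SQL script.
        migrationBuilder.InsertData(
            table: "Statuses",
            columns: new[] { "Id", "Name" },
            values: new object[] { 4, "On Hold" });
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Migrating down removes the seeded row again.
        migrationBuilder.DeleteData(
            table: "Statuses",
            keyColumn: "Id",
            keyValue: 4);
    }
}
```

Because the seed change is an ordinary migration operation, it shows up in the output of dotnet ef migrations script alongside schema changes, which is what makes it reviewable in the workflow described above.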
@Timovzl the question here is why the newer UseAsyncSeeding API doesn't meet all these needs. It's trivial to use it with a short fragment of code that checks the database and (re)seeds it if necessary. It's slightly more low-level than HasData(), but avoids a lot of the pitfalls and complexity that HasData() imposes.
Using UseSeeding and UseAsyncSeeding is the recommended way of seeding the database with initial data when working with EF Core.
I use HasData to manage seed data on an ongoing basis across multiple deployments. One specific scenario is a lookup table of statuses. It grows slowly as the app is developed, gaining a new record every couple of months. The UseSeeding documentation says it is for seeding initial data. I assume that means it cannot be used to handle seed data as the application grows, or when the seed data changes.
We generate the migration bundle from the git repo and apply it to production through a CI/CD process, using PowerShell run locally.
E:\CICD\bundle\bundle.exe --connection $Env:ConnectionString --verbose
With this process, we have performed up to eight deployments in a single day. Our release cadence would be limited if manual review and approval of the SQL scripts were required. The generated migration script is reviewed at commit and the result is reviewed at an integration point, but the bundle is still used to perform the deployment in production.
Although each migration script is reviewed independently at commit, the bundle may be deploying multiple migrations in a single release to production. We rebase to integrate commits to ensure migrations are played in the correct order.