
Add an equivalent to .gitignore.

Open pbowyer opened this issue 3 years ago • 1 comments

Use case

There are some files (tables) that I never want to commit - files that change regularly, where the content doesn't need versioning, and where performance is important.

It would be great to have the equivalent of a .gitignore file so I can ensure I never accidentally version them.

pbowyer avatar Jul 29 '22 15:07 pbowyer

Cool feature request. I could see this being useful. Thanks for opening this one!

fulghum avatar Aug 01 '22 20:08 fulghum

I think in Dolt, this is probably best implemented as a writable system table.
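
As a rough illustration of that idea (a sketch only, not Dolt's implementation), an ignore list stored as rows of (pattern, ignored) could be evaluated as shown here. The row layout, the `IgnoreRule` and `is_ignored` names, the shell-style wildcards via `fnmatch`, and the "last matching rule wins" policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class IgnoreRule:
    """One hypothetical row of a writable ignore table."""
    pattern: str   # shell-style wildcard such as "tmp_*" (assumed syntax)
    ignored: bool  # False lets a later rule re-include a table


def is_ignored(table: str, rules: list[IgnoreRule]) -> bool:
    """Return True if the last rule matching the table marks it as ignored."""
    result = False
    for rule in rules:  # later rows override earlier ones, as in .gitignore
        if fnmatchcase(table, rule.pattern):
            result = rule.ignored
    return result


# Example rows: ignore every tmp_ table except tmp_keep.
rules = [IgnoreRule("tmp_*", True), IgnoreRule("tmp_keep", False)]
```

Because the rules are plain rows, adding or removing an ignore pattern would be an ordinary INSERT or DELETE rather than an edit to a file on disk.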

timsehn avatar Apr 14 '23 17:04 timsehn

This is a great feature request and one we've wanted for a while. In addition to increasing our compatibility with Git, this feature has some natural use cases in a SQL server context (depending on the implementation). Many application schemas include tables that are incompatible with versioning. For example, having multiple versions of permissions or authorization data could lead to security issues. However, there are some unresolved questions around the semantics of non-versioned tables.

The central design problem is defining the scope of non-versioned tables and how they interact with versioned tables. If you create a non-versioned table t on branch main, but then switch to myBranch which has another table t, we are left with a name collision to deal with. If you define a foreign key between versioned and non-versioned tables, maintaining referential integrity is somewhat intractable. Disallowing this type of foreign key is an appealing option, but this puts a burden on application developers to partition their schema into versioned and non-versioned parts, which may not be possible in their data model. Additional Dolt-specific design questions exist, such as whether these tables are stored in the commit graph and whether they're replicated.

andy-wm-arthur avatar Apr 14 '23 22:04 andy-wm-arthur

> If you create a non-versioned table t on branch main, but then switch to myBranch which has another table t, we are left with a name collision to deal with.

I just tried this with git to see how git handles this, and their solution is to have the versioned file clobber the non-versioned file, with no warning, no message, nothing. Switching back to the branch with the non-versioned file reveals that the non-versioned version of the file is simply gone, unrecoverable because it was never tracked by git.

That's pretty awful. Let's not do that.
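
One way to avoid the clobbering could be to detect the collision before the switch and refuse. The following is a minimal sketch under that assumption; `colliding_tables` and `checkout` are hypothetical names for illustration and are not taken from Dolt.

```python
def colliding_tables(non_versioned: set[str], target_branch_tables: set[str]) -> list[str]:
    """Names that exist both as local non-versioned tables and on the target branch."""
    return sorted(non_versioned & target_branch_tables)


def checkout(non_versioned: set[str], target_branch_tables: set[str]) -> None:
    """Refuse the branch switch if it would overwrite a non-versioned table."""
    clashes = colliding_tables(non_versioned, target_branch_tables)
    if clashes:
        raise RuntimeError(
            f"checkout would overwrite non-versioned tables: {clashes}"
        )
    # No collision: the real branch switch would happen here.
```

Failing loudly leaves the decision (rename, drop, or start tracking the table) with the user instead of silently discarding data that has no history.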

nicktobey avatar Apr 14 '23 23:04 nicktobey

> Additional Dolt-specific design questions exist, such as whether these tables are stored in the commit graph and whether they're replicated.

So there are two different interpretations here: replicated across branches (all of a client's branches see the same table), and replicated across clients (the table isn't versioned but is still delivered to clients that request it).

Assuming right now that we're talking about replicating across clients:

I'm trying to imagine what the user experience for replicated, non-versioned data is, and it feels like it's a massive footgun and there's probably a reason why git doesn't do it.

Imagine you pull an update, and it has a change to a non-versioned table that creates a merge conflict. You resolve the merge conflict by accepting their change without thinking too much about it... and now your copy of the data is just gone. Oops. And you can't get it back because it wasn't tracked. And what if there's not a merge conflict and it just silently alters your table in ways that you didn't anticipate and can't easily undo?

One of the reasons why git pulls can make sweeping changes to your working set is because it's all fixable, nothing's lost for good. If part of the replicated data isn't versioned, all bets are off. And in the vast majority of cases, there's little reason to not want it to be versioned, so git just versions everything.

I think there are compelling reasons to want non-versioned data: maybe the table represents some control structure where all branches should be using the most recent version, or maybe old data needs to be redacted for privacy reasons. But in both those cases you probably don't want the table to be replicated either, because replicated data can always be preserved and modified by the client that pulls it. If you have a server with tables like this, you want to keep them on the server and make users connect with a SQL client to run queries on them.

Assuming now that we're talking about "replicating across branches on the same client":

I think this is more reasonable. There is some set of tables in the database that aren't tracked by dolt. They don't even have to use dolt's storage layer if it's easier to just use some other storage layer for them. Is that something that go-mysql-server can do, mix-and-match storage modes for a single database?

> Disallowing this type of foreign key is an appealing option, but this puts a burden on application developers to partition their schema into versioned and non-versioned parts.

Either this, or we add a lot of extra validation to checking out branches. It's possible that we may allow foreign keys from non-versioned tables to versioned tables but not vice versa (or the other way around). I'd need to think about this a lot more. There's probably a million subtle corner cases. Not allowing foreign keys between the two "partitions" of the database may be the safer option.
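
To make the "no foreign keys between the two partitions" option concrete, here is a small sketch of the check it implies. The `ForeignKey` tuple and `crossing_keys` function are hypothetical, and the schema is reduced to table names only.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    child: str   # table that declares the constraint
    parent: str  # table being referenced


def crossing_keys(fks: list[ForeignKey], non_versioned: set[str]) -> list[ForeignKey]:
    """Return foreign keys whose two ends fall in different partitions."""
    return [
        fk for fk in fks
        if (fk.child in non_versioned) != (fk.parent in non_versioned)
    ]
```

For example, with `sessions` non-versioned and `users` versioned, `ForeignKey("sessions", "users")` would be reported. A one-directional rule (allowing references from non-versioned to versioned tables but not the reverse) would test only one side of the comparison.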

nicktobey avatar Apr 15 '23 00:04 nicktobey

Fixed by https://github.com/dolthub/dolt/pull/5809

nicktobey avatar Apr 28 '23 17:04 nicktobey