[Feature] Support ignoring syncing certain paths

Open mydpy opened this issue 2 years ago • 7 comments

I'd like to use .gitignore-style syntax to specify which files should not be included in the directory sync (I am using workspace sync in my extension settings).

Context: I have a project with a large dataset of PDF files that I never want to sync to a remote repository. Similarly, when I work on a project that generates files, I do not want those files to be automatically synced to the workspace.

Syncing these files triggers other errors, including delays executing workloads.

mydpy avatar Apr 06 '23 23:04 mydpy

Both local and system-wide .gitignore files should be respected when doing a directory sync.

If you have a .gitignore file but its contents are not respected when doing a sync, could you provide more details about your setup (e.g. which directories you want to ignore, (a subset of) the `.gitignore` contents, their paths, etc.)?

Thanks.

pietern avatar Apr 11 '23 13:04 pietern

I see. I am developing on a large(r) monolith whose file size exceeds the current maximum. In order for me to sync this project using the vscode extension I need to add certain files that I do not want to sync to my .gitignore file. This allows the directory sync to succeed. However, this introduces another issue: I occasionally need to develop in areas of the monolith that I would like to exclude in the directory sync. So I need to remove them from the .gitignore file or add them directly.

mydpy avatar Apr 11 '23 16:04 mydpy

If I understand correctly you would need a separate mechanism from .gitignore to control which subtrees are synchronized (e.g. through inclusion/exclusion) so you stay within file size / file count limits.

What limit do you run into specifically?

pietern avatar Apr 14 '23 08:04 pietern

I have a similar issue with a monorepo. I have a repo .gitignore as well as one in my project's local folder. The local one just has a single line for the folder I am trying to exclude, i.e. data/. This is a rather large file that is currently syncing even though I would like to exclude it.

If you require any additional information, please let me know.

Thank you!

colinalexander avatar Jul 11 '23 22:07 colinalexander

I have a similar issue. I cannot sync my repo; it fails with the error message `Client.Timeout exceeded while awaiting headers`.

cf-dtrznadel avatar Jul 25 '23 17:07 cf-dtrznadel

Are you looking for includes/excludes beyond what is specified in your .gitignore files? We already skip syncing any files that match patterns in any .gitignore file.

mgyucht avatar Aug 16 '23 12:08 mgyucht

I have a monorepo with multiple gitignore files. My project gitignore file just has one item data/. This folder was being synchronized despite being gitignored.

colinalexander avatar Aug 16 '23 13:08 colinalexander

In v2 of the extension, we now use Databricks Asset Bundles, which provide a way to ignore files.
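
For readers landing here, a minimal sketch of what this could look like in a bundle's `databricks.yml`, assuming the `sync.exclude` mapping of Databricks Asset Bundles. The bundle name and the `data/` and `*.pdf` patterns are illustrative assumptions taken from the cases described in this thread, not a tested configuration:

```yaml
# databricks.yml (sketch; bundle name and patterns are hypothetical)
bundle:
  name: my_project

sync:
  exclude:
    # Skip the large local data folder mentioned earlier in this thread.
    - data/**
    # Skip PDF files anywhere in the project.
    - "**/*.pdf"
```

Exact pattern semantics may differ between CLI versions, so check the bundle configuration reference before relying on it.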

ilia-db avatar Mar 04 '25 09:03 ilia-db