
Continue doesn't respect `.gitignore`

Open romiras opened this issue 9 months ago • 25 comments

Before submitting your bug report

Relevant environment info

- OS: Ubuntu Linux 20.04
- Continue: v0.8.25
- IDE: VS Code 1.88.1

Description

It seems that Continue doesn't respect .gitignore at all. Every ignored file leaks into the index.

To reproduce

Generate a Ruby on Rails (RoR) application.

.gitignore:

# Ignore bundler config.
/.bundle

# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
!/log/.keep
!/tmp/.keep

# Ignore pidfiles, but keep the directory.
/tmp/pids/*
!/tmp/pids/
!/tmp/pids/.keep

# Ignore uploaded files in development.
/storage/*
!/storage/.keep
.byebug_history

# Ignore master key for decrypting credentials and more.
/config/master.key

.env*

After a long indexing process, `du -hs ~/.continue/index` reports about 144M. After running `echo 'ignore me' > tmp/ignore.txt`, that file is indexed as well. The command `sqlite3 -column ~/.continue/index/index.sqlite "select * from tag_catalog where path like '%ignore.txt'"` shows:

22          /home/user/Projects/r7_app  main        chunks      /home/user/Projects/r7_app/config/master.key  20cc3b0a2108ccc75874833530a741f5b82c89c27a8876cda8933a5981cd8241  1714219436499
1494        /home/user/Projects/r7_app  main        vectordb::  /home/user/Projects/r7_app/config/master.key  20cc3b0a2108ccc75874833530a741f5b82c89c27a8876cda8933a5981cd8241  1714219449950
2900        /home/user/Projects/r7_app  main        sqliteFts   /home/user/Projects/r7_app/config/master.key  20cc3b0a2108ccc75874833530a741f5b82c89c27a8876cda8933a5981cd8241  1714219963517
4371        /home/user/Projects/r7_app  main        codeSnippe  /home/user/Projects/r7_app/config/master.key  20cc3b0a2108ccc75874833530a741f5b82c89c27a8876cda8933a5981cd8241  1714220255652
5820        /home/user/Projects/r7_app  main        chunks      /home/user/Projects/r7_app/tmp/ignore.txt     1475d3ed5223ede0fcb823689c5b83ca154066b72d5a837a1e1b3109ef1d1b6f  1714221543385
5821        /home/user/Projects/r7_app  main        vectordb::  /home/user/Projects/r7_app/tmp/ignore.txt     1475d3ed5223ede0fcb823689c5b83ca154066b72d5a837a1e1b3109ef1d1b6f  1714221543401
5822        /home/user/Projects/r7_app  main        sqliteFts   /home/user/Projects/r7_app/tmp/ignore.txt     1475d3ed5223ede0fcb823689c5b83ca154066b72d5a837a1e1b3109ef1d1b6f  1714221545311
5823        /home/user/Projects/r7_app  main        codeSnippe  /home/user/Projects/r7_app/tmp/ignore.txt     1475d3ed5223ede0fcb823689c5b83ca154066b72d5a837a1e1b3109ef1d1b6f  1714221545550
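The same check can be scripted. Below is a minimal sketch, assuming only what the query above shows: a `tag_catalog` table with a `path` column in `~/.continue/index/index.sqlite`. The function name `find_indexed_paths` is hypothetical and is not part of Continue.

```python
import sqlite3
from pathlib import Path


def find_indexed_paths(db_path, suffixes):
    """Return tag_catalog rows whose path ends with any of the given suffixes."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = []
        for suffix in suffixes:
            # Parameterized LIKE: the leading '%' matches any prefix before the suffix.
            cursor = conn.execute(
                "SELECT * FROM tag_catalog WHERE path LIKE ?",
                ("%" + suffix,),
            )
            rows.extend(cursor.fetchall())
        return rows
    finally:
        conn.close()


if __name__ == "__main__":
    # Location taken from the report above; adjust if your index lives elsewhere.
    db = Path.home() / ".continue" / "index" / "index.sqlite"
    if db.exists():
        for row in find_indexed_paths(db, ["tmp/ignore.txt", "config/master.key"]):
            print(row)
```

Any row printed for a path that .gitignore excludes means the file reached the index.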

Log output

No response

romiras avatar Apr 27 '24 13:04 romiras

@romiras is this .gitignore in the root of your opened VS Code workspace? I just want to make sure I have all of the structural details of your folders correct so that I can test this myself. My first guess at what's happening is that we aren't correctly handling the leading '/' in .gitignore

sestinj avatar May 02 '24 21:05 sestinj

@sestinj The Ruby on Rails (RoR) application was generated with `rails new r7_app --api -J -T -A --skip-hotwire`, just so that I and others can reproduce the issue. RoR was chosen only as an example. You could generate a skeleton Django app or anything else instead.

Output of command "tree" in root of project
.
├── app
│   ├── channels
│   │   └── application_cable
│   │       ├── channel.rb
│   │       └── connection.rb
│   ├── controllers
│   │   ├── application_controller.rb
│   │   └── concerns
│   ├── jobs
│   │   └── application_job.rb
│   ├── mailers
│   │   └── application_mailer.rb
│   ├── models
│   │   ├── application_record.rb
│   │   └── concerns
│   └── views
│       └── layouts
│           ├── mailer.html.erb
│           └── mailer.text.erb
├── bin
│   ├── bundle
│   ├── rails
│   ├── rake
│   └── setup
├── config
│   ├── application.rb
│   ├── boot.rb
│   ├── cable.yml
│   ├── credentials.yml.enc
│   ├── database.yml
│   ├── environment.rb
│   ├── environments
│   │   ├── development.rb
│   │   ├── production.rb
│   │   └── test.rb
│   ├── initializers
│   │   ├── cors.rb
│   │   ├── filter_parameter_logging.rb
│   │   └── inflections.rb
│   ├── locales
│   │   └── en.yml
│   ├── master.key
│   ├── puma.rb
│   ├── routes.rb
│   └── storage.yml
├── config.ru
├── db
│   ├── development.sqlite3
│   ├── schema.rb
│   ├── seeds.rb
│   └── test.sqlite3
├── Gemfile
├── Gemfile.lock
├── lib
│   └── tasks
├── log
│   └── development.log
├── public
│   └── robots.txt
├── Rakefile
├── README.md
├── storage
├── tmp
│   ├── cache
│   │   └── bootsnap
│   │       ├── compile-cache-iseq
│   │       │   ├── 00
│   │       │   │   ├── 0324cd82370db3
│   │       │   │   └── 2bea6d7542e7b1
│   │       │   ├── 01
│   │       │   │   ├── 87a7d114483147
│   │       │   │   ├── baa583cbd8a735
│   │       │   │   └── d3f77b581741b6
... (with many other temporary directories and files)
│   │       │   ├── fe
│   │       │   │   ├── 2381ed85da335a
│   │       │   │   ├── 5a99b7a28da7d1
│   │       │   │   └── 6acadc23be5f36
│   │       │   └── ff
│   │       │       ├── 32b852982880e8
│   │       │       ├── d8ce61cef93021
│   │       │       ├── dd9911203bbb8a
│   │       │       └── e84f0efbda8321
│   │       └── load-path-cache
│   ├── development_secret.txt
│   ├── ignore.txt
│   ├── pids
│   └── storage
└── vendor

285 directories, 1467 files

The .gitignore file is also located in the root of the project.

As a result of indexing, everything we don't expect (all of tmp, the secret file, and so on) leaks into ~/.continue/index/index.sqlite.

romiras avatar May 03 '24 08:05 romiras

I think continue.dev is slowing down my machine when I open VSCode due to it indexing my 20,000 line daily_journal.md despite a .gitignore with *.md existing in the same directory.

slyt avatar May 28 '24 21:05 slyt

Can confirm. Using WSL or dev container, Continue will try to index my build folder, even though it is in the .gitignore. This is a particular issue, as the c++ package manager we are using copies the source of all the dependencies into the build folder, so Continue attempts to index all of Boost.

dchansen avatar May 31 '24 08:05 dchansen

In my case Continue respects neither .gitignore nor .continueignore. I am using a JetBrains IDE on Windows.

savely-krasovsky avatar Jun 19 '24 22:06 savely-krasovsky

Similar issue here, actually worse than what other people have reported. My project is a multi-root workspace, with a .gitignore in each root (multiple folder entries in the .code-workspace file).

This code-workspace file is opened in a devcontainer.

What happens is that the VSCode Continue.dev extension will start indexing, which takes a very long time. A few seconds later, VSCode will freeze and get stuck on "Reconnecting to devcontainer..." making it impossible to do any work.

A few seconds later the Continue.dev indexing will get stuck, so I can't just wait for it to complete.

I tried adding a .continueignore file at the file system root of the project, but it didn't help.

Disabling the Continue extension will stop the problem.

fazo96 avatar Jun 24 '24 09:06 fazo96

I was able to mitigate the problem somewhat by adding a separate .continueignore to each folder of my VSCode workspace even though 2 of them are subfolders of the root.

However, this still doesn't let me use the extension because it is never able to finish indexing (see #1467) and eventually leads to VSCode or the extension crashing/freezing.

fazo96 avatar Jun 27 '24 07:06 fazo96

Thanks everyone for adding details here. I'm going to work on this problem this week, as it definitely seems pressing. Until then, you can set "disableIndexing": true in your config.json to avoid any critical errors.
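For anyone who prefers to script that workaround, here is a minimal sketch. It assumes config.json is plain JSON (no comments) and that "disableIndexing" belongs at the top level; the helper name `set_disable_indexing` and the `~/.continue/config.json` location in the usage note are assumptions, not confirmed in this thread.

```python
import json
from pathlib import Path


def set_disable_indexing(config_path, disabled=True):
    """Set the top-level "disableIndexing" key in a JSON config file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the key sits at the top level of the config object.
    config["disableIndexing"] = disabled
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage would look like `set_disable_indexing(Path.home() / ".continue" / "config.json")`. Back up the file first, since the rewrite normalizes its formatting.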

sestinj avatar Jun 30 '24 22:06 sestinj

The same issue for me as well. My workspace has a few conda environments, but they are indeed added to .gitignore. One thing I've noticed was the extension was busy looping the editor, so everything was very sluggish.

Adding the aforementioned environments to .continueignore does fix the issue.

My setup: VS Code + Dev Container on Windows PC

paaloeye avatar Jul 08 '24 18:07 paaloeye

@pbrit are you on the main release of the extension? I've just recently published a new pre-release version (0.9.177) that should correctly listen to all .gitignore patterns.

The only thing I can potentially think of if you're already on the pre-release: is there any chance you've opened a sub-folder in the repository at the root of your VS Code workspace, where the .gitignore is in the root of the repository, not in the VS Code workspace?

sestinj avatar Jul 08 '24 18:07 sestinj

How will Continue deal with a monorepo with multiple projects under it, each having its own .gitignore?

For example:

/monorepo-root
   /clientmobile
   /clientweb
   /backend
   /microservices
      /graphicprocessing
      /datalakeagent

IMHO, a .gitignore file should be applied regardless of where it lies within the project structure. Maybe such files should be identified during indexing and applied to everything under them.

inzi avatar Jul 26 '24 00:07 inzi

Same issue here. It would be great if the plugin treated .gitignore files the same way as git.

wiktorsikora avatar Jul 28 '24 08:07 wiktorsikora

> Same issue here. It would be great if the plugin treated .gitignore files the same way as git.

Which issue do you have exactly? The title of this issue is a bit misleading (also it is a bit of a dated issue).

Just to clarify the current status:

Continue (as of VSCode 0.8.43 / JetBrains 0.0.53) respects a top level .gitignore and uses the same syntax as Git via the ignore npm library, for details see -> https://github.com/continuedev/continue/blob/main/core/indexing/walkDir.ts

A top-level .continueignore file is also supported, for ignoring additional files that for some reason cannot be listed in .gitignore, see -> https://docs.continue.dev/features/codebase-embeddings#ignore-files-during-indexing

See also -> https://github.com/continuedev/continue/blob/main/core/indexing/ignore.ts for the list of files and folders that get excluded by default for indexing (e.g. binary files).

fry69 avatar Jul 28 '24 09:07 fry69

Sorry, it seems that .gitignore is working as expected. I noticed that the plugin indexes Rust .rlib files, but I am working on a pretty big monorepo where we utilise symlinks. After investigating tag_catalog in index.sqlite, it turned out that one of the target dirs is symlinked under a different name. This resulted in the entire directory being indexed.

Nevertheless, I would suggest adding .bin and .rlib files to DEFAULT_IGNORE_FILETYPES, since these are binary files.

Sorry for bothering and thanks for such great work! :D

wiktorsikora avatar Jul 28 '24 09:07 wiktorsikora

Hey @sestinj

  1. Can you give an update on the issue? Was it fixed in the latest version or not yet? If fixed, could you please point to the PR that fixes it?
  2. Does .continueignore have the same semantics as .gitignore? If so, will creating a symlink to .gitignore work?

romiras avatar Jul 28 '24 14:07 romiras

Just chiming in from JetBrains Rider using Continue 0.0.56.

My root level .gitignore has the following:

# Build results
...
[Bb]in/
[Oo]bj/
...

Yet the sqlite index contains e.g. D:/Path/to/project\src\WebApi\obj\project.assets.json

However, no /bin/ files are indexed.

spaasis avatar Aug 02 '24 05:08 spaasis

Possible fix for this issue see -> https://github.com/continuedev/continue/pull/1880

fry69 avatar Aug 02 '24 07:08 fry69

Fix should be in VSCode Pre-Release 0.9.193, please test.

fry69 avatar Aug 03 '24 13:08 fry69

We should not be indexing symlinks either (we just ignore them)
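As an illustration of that behaviour (not Continue's actual walker, which is written in TypeScript), here is a minimal sketch of a directory walk that neither follows nor returns symlinks. The function name `walk_skipping_symlinks` is hypothetical.

```python
import os


def walk_skipping_symlinks(root):
    """Yield regular file paths under root, never following or returning symlinks."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # os.walk does not descend into symlinked directories when
        # followlinks=False; pruning them here just makes the intent explicit.
        dirnames[:] = [
            d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
        ]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Symlinks that point at files show up in filenames, so filter them too.
            if not os.path.islink(full):
                yield full
```

With a walk like this, a target directory that is symlinked under a second name is visited only once, through its real path.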

spew avatar Aug 08 '24 15:08 spew

I now see an issue in how relative paths are built. I am editing, in VSCode, an open-source project that has a folder named xyz/. Inside it there is a .gitignore file with the line "xyz". The logic is that an application with the same name as the folder is built under this folder, and the binary should not be tracked, i.e. xyz/xyz is built. Git knows to track all files in xyz/ except xyz/xyz. However, Continue skips indexing all the files in the folder xyz/. I have to edit the .gitignore, remove the xyz line, and then refresh the window to get Continue to reindex these files.

ameydav avatar Aug 14 '24 14:08 ameydav

I just tested this out by editing a .gitignore file and adding the line xyz and then creating a folder, xyz/ with a file a.txt. git seems to ignore the entire directory xyz contrary to your example @ameydav

spew avatar Aug 14 '24 18:08 spew

Sorry, perhaps I wasn't clear in my description. The top-level .gitignore file does not contain xyz. There is a secondary .gitignore file inside the xyz/ folder.

e.g. folder structure is:

.gitignore
some-files
xyz/
   |-->.gitignore
         some-other-files
         xyz

If the top-level .gitignore contains xyz, then git will ignore the xyz/ folder. If the internal .gitignore contains xyz, then git will ignore xyz/xyz.

From my experience yesterday, Continue ignored the xyz/ folder in both cases.
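The expected scoping can be sketched as follows. This is a deliberately simplified model, not git's or Continue's implementation: it handles only slash-free patterns and uses `fnmatch` as a stand-in for gitignore globbing (no negation, no anchoring, no `**`). The function name `is_ignored` and the dict layout are hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath


def is_ignored(path, ignore_files):
    """Return True if `path` is ignored under the simplified model.

    path: POSIX path relative to the repository root, e.g. "xyz/main.c".
    ignore_files: dict mapping the directory that holds a .gitignore
        ("" for the root) to its list of slash-free patterns.
    """
    parts = PurePosixPath(path).parts
    for base, patterns in ignore_files.items():
        base_parts = PurePosixPath(base).parts
        # A .gitignore only applies to paths below its own directory.
        if parts[:len(base_parts)] != base_parts:
            continue
        # Match against the path relative to the .gitignore's directory,
        # so the directory's own name is never tested against its patterns.
        relative = parts[len(base_parts):]
        # Ignoring a directory name ignores everything beneath it,
        # so every component of the relative path is checked.
        for component in relative:
            if any(fnmatch.fnmatchcase(component, p) for p in patterns):
                return True
    return False
```

For the layout above, `is_ignored("xyz/some-other-files", {"xyz": ["xyz"]})` is False and `is_ignored("xyz/xyz", {"xyz": ["xyz"]})` is True, while a root-level pattern, `{"": ["xyz"]}`, ignores both.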

ameydav avatar Aug 15 '24 06:08 ameydav

Hi @ameydav, thanks for giving me an exact scenario! I have created a PR for this: https://github.com/continuedev/continue/pull/2017/files

spew avatar Aug 15 '24 18:08 spew