ecto_sqlite3 icon indicating copy to clipboard operation
ecto_sqlite3 copied to clipboard

Getting DBConnection.ConnectionError on sandbox migrations

Open D4no0 opened this issue 2 years ago • 12 comments

I am trying to use a sqlite3 database with ecto. Everything seems to work without any problems on the dev environment, however when I am running tests, a lot of times I get this error:

[error] Exqlite.Connection (#PID<0.214.0>) failed to connect: ** (Exqlite.Error) database is locked [info] == Running 20230905120403 SslMoon.Repo.Migrations.AddChecksTable.change/0 forward [info] create table checks ** (DBConnection.ConnectionError) could not checkout the connection owned by #PID<0.223.0>. When using the sandbox, connections are shared, so this may imply another process is using a connection. Reason: connection not available and request was dropped from queue after 1760ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval See DBConnection.start_link/2 for more information (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:913: Ecto.Adapters.SQL.raise_sql_call_error/1 (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2 (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:1005: Ecto.Adapters.SQL.execute_ddl/4 (ecto_sql 3.10.1) lib/ecto/migration/runner.ex:327: Ecto.Migration.Runner.log_and_execute_ddl/3 (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2 (stdlib 5.0.2) timer.erl:270: :timer.tc/2 (ecto_sql 3.10.1) lib/ecto/migration/runner.ex:25: Ecto.Migration.Runner.run/8 (ecto_sql 3.10.1) lib/ecto/migrator.ex:362: Ecto.Migrator.attempt/8

Any idea on why this might be happening? I've read that sqlite doesn't allow concurrent inserts, however creation of db and tables shouldn't be concurrent in theory.

For reference my test_helper.exs contains also: Ecto.Adapters.SQL.Sandbox.mode(Repo, :manual)

D4no0 avatar Sep 19 '23 07:09 D4no0

This also seems to not happen every time, maybe some kind of race condition is involved here?

D4no0 avatar Sep 19 '23 07:09 D4no0

What are your settings for your database?

Specifically :pool, :queue_target and :pool_size for your test environment.

warmwaffles avatar Sep 19 '23 12:09 warmwaffles

These are the settings, the other ones are set to default:

pool: Ecto.Adapters.SQL.Sandbox,
pool_size: 10

I've tried to set pool_size to 1, but it seems the problem remains. Also I am running this in a docker container if that helps.

D4no0 avatar Sep 19 '23 13:09 D4no0

How many of those tests are using async: true?

warmwaffles avatar Sep 19 '23 13:09 warmwaffles

None, currently all tests that include DB interaction are disabled.

The failure always occurs at migration stage.

D4no0 avatar Sep 19 '23 14:09 D4no0

In our integration tests in the test_helper.exs we do this

https://github.com/elixir-sqlite/ecto_sqlite3/blob/37b68d9874695e5c17b80c5d7d57f8a803fd9524/integration_test/test_helper.exs#L50-L71

You can ignore PoolRepo its there to test that pooled connections work.

Generally when I am working with sqlite in my personal projects I have an alias I use

def project do
  [
    # ... snip ...
    preferred_cli_env: [
      "test.prepare": :test
    ]
  ]
end

def aliases do
  [
    # ... snip ...
    "test.prepare": ["ecto.drop --quiet", "ecto.create --quiet", "ecto.migrate"]
  ]
end

The above is not necessary, but it's a nice to have.

So the workflow is usually

mix ecto.gen.migration foobar
mix ecto.migrate
# do work
# add tests
mix test.prepare
mix test

warmwaffles avatar Sep 19 '23 14:09 warmwaffles

Ok, this might be promising. I will try it and come back with feedback!

Is this specified somewhere in the official documentation? because I was digging for quite some time and didn't found anything.

D4no0 avatar Sep 19 '23 14:09 D4no0

No it is not. We can definitely add it in as a "Suggested Workflow" pattern.

warmwaffles avatar Sep 19 '23 14:09 warmwaffles

Hi @warmwaffles,

I am having the very same error over and over again, especially in CI.

17:38:52.553 [error] Exqlite.Connection (#PID<0.224.0>) failed to connect: ** (Exqlite.Error) database is locked

You can see my entire test setup on SwissSchema's GitHub, but below is the most relevant piece of code:

defmodule SwissSchemaTest do

  # ...
  @database_dir Application.compile_env!(:swiss_schema, :database_dir) # => "/tmp/swiss_schema"

  setup_all do
    File.rm_rf!(@database_dir)

    Enum.each([Repo, Repo2], fn repo ->
      db_name = "#{repo}" |> String.split(".") |> List.last() |> String.downcase()
      db_path = "#{@database_dir}/#{db_name}.db"

      start_link = Function.capture(repo, :start_link, 1)
      start_link.(database: db_path, log: false)

      Ecto.Adapters.SQLite3.storage_up(database: db_path)
      Ecto.Migrator.up(repo, 1, SwissSchemaTest.CreateUsers, log: false)
    end)

    on_exit(fn -> File.rm_rf!(@database_dir) end)
  end

  setup do: on_exit(fn -> Repo.delete_all(User) && Repo2.delete_all(User) end)

It smells like some racing condition, but I'm unable to understand where I'm creating such racing cond. Any help is appreciated :)

joeljuca avatar Dec 24 '23 20:12 joeljuca

Does this happen all the time or is it infrequent? Does it coincide with test that fails (not the cause for the test failure)?

I'm hypothesizing that this may be an unclean shutdown of the failing database.

I haven't looked at the link yet as I am on my phone, but I'll get to it soon.

warmwaffles avatar Dec 26 '23 03:12 warmwaffles

Matthew,

It happens quite frequently. I frequently have to rerun failed tests over and over again to get rid of failing pipelines because of “database is locked” errors.

So, I don’t think it’s a failing test case. As you can see, I am using setup_all/1’s on_exit/1 hook to call rm_rf!/1 on the database dir. It would be executed even if there’s a failing test, right?

Do you think that manually shutting down database processes (started w/ repo’s start_link/1) could help with it?

IIUC, all my tests run synchronously. But I’ll have a closer look tomorrow to make sure that’s the case.

Joel Jucá joeljuca.com

On Tue, 26 Dec 2023 at 12:16 AM Matthew Johnston @.***> wrote:

Does this happen all the time or is it infrequent? Does it coincide with test that fails (not the cause for the test failure)?

I'm hypothesizing that this may be an unclean shutdown of the failing database.

How many test cases are in this test file? Is this ran asynchronously?

— Reply to this email directly, view it on GitHub https://github.com/elixir-sqlite/ecto_sqlite3/issues/125#issuecomment-1869221915, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFEQXEE4YNQFH7Q5ZKMCSLYLI6RTAVCNFSM6AAAAAA453NC22VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRZGIZDCOJRGU . You are receiving this because you commented.Message ID: @.***>

joeljuca avatar Dec 26 '23 03:12 joeljuca

I just checked it out. The problem seems to only happen during the setup_all block at the very beginning of the test suite, then continues on fine.

I suspect you are not deleting the WAL file and some other various lock files that go along with the sqlite database. That being said, the errors stopped completely when I remove the cleanup procedures.

  setup_all do
    Enum.each([Repo, Repo2], fn repo ->
      db_name = "#{repo}" |> String.split(".") |> List.last() |> String.downcase()
      db_path = "#{@database_dir}/#{db_name}.db"

      start_link = Function.capture(repo, :start_link, 1)
      start_link.(database: db_path, log: false)

      Ecto.Adapters.SQLite3.storage_up(database: db_path)
      Ecto.Migrator.up(repo, 1, SwissSchemaTest.CreateUsers, log: false)
    end)

    :ok
  end

Edit: Also you can just call repo.start_link(database: db_path, log: false) instead of using Function.capture/3 😉

Edit2: Very peculiar indeed. You delete the entire directory which should also delete the other files. I do wonder if the File.rm_rf has not completed before recreating the directory and database.

Edit3: I do suspect that File.rm_rf hasn't actually completed before the repo is created. Injecting a Process.sleep(1_000) doesn't seem to have the logger error occur.

    File.rm_rf!(@database_dir)

    Process.sleep(1000)

    Enum.each([Repo, Repo2], fn repo ->

Edit4 (final): One other solution would be to configure a random database before each run. Trying to utilize something like "/tmp/swiss_schema/#{Ecto.UUID.generate()}/repo.db" would also help here.

The last on_exit in the setup_all never actually completes either and I still see a residual database sitting in my /tmp dir. I suspect the OS/Erlang calls to delete the file don't complete before the process exits and is aborted.

on_exit(fn -> File.rm_rf!(@database_dir) end)

Adding a Process.sleep/1 in there has not left any temp files laying around.

I hope this has been helpful for you.

warmwaffles avatar Dec 26 '23 03:12 warmwaffles