
Getting "RuntimeError: reentrant call inside <_io.BufferedWriter name='logs/dbt.log'>" when running dbt test concurrently

Open daaain opened this issue 3 years ago • 9 comments

Describe the bug

I'm not sure if it's the upgrade from 0.17.0 to 0.19.1 or setting concurrency to 30 (probably the latter), but I've started getting Python errors from dbt when running tests. Luckily the tests themselves continue running correctly, but I'm guessing the log file gets mangled. It doesn't happen on every run (approximately every 20th or so), so it's likely a race condition.

Steps To Reproduce

Run dbt --log-format json test with 30 threads.

Expected behavior

Concurrency handled correctly.

Screenshots and log output

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 216, in handle
    self.emit(record)
  File "/usr/local/lib/python3.7/site-packages/dbt/logger.py", line 458, in emit
    super().emit(record)
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 840, in emit
    if self.should_rollover(record, len(msg)):
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 818, in should_rollover
    self.stream.seek(0, 2)
RuntimeError: reentrant call inside <_io.BufferedWriter name='logs/dbt.log'>

System information

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [x] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.19.1

Up to date!

Plugins:
  - bigquery: 0.19.1
  - snowflake: 0.19.1
  - postgres: 0.19.1
  - redshift: 0.19.1

The operating system you're using:

Debian GNU/Linux 10 (buster)

The output of python --version:

Python 3.7.10

daaain avatar May 27 '21 11:05 daaain

Hey @daaain, I have to imagine this is related to running with a large number of threads, all of which are trying to write to logs/dbt.log. It looks like the actual error is being tripped by the logbook package here, within its logic to detect log file size and rotate to a new log file if needed.

It also sounds like a known issue with python, signal handlers, and IO: https://bugs.python.org/issue24283. Apparently, it was broken in python2, too, but the error message is new in python3.

My instinct is that this isn't something we can readily fix with a code change within dbt. This is definitely not my area of expertise, though, so I'm willing to leave the issue open and see if anyone else has a solid proposal. For what it's worth, it isn't a bug I was able to reproduce myself, either: I ran dbt --log-format json --threads 30 in a project with 450+ tests, and never saw this error.

jtcohen6 avatar May 29 '21 18:05 jtcohen6

Thanks a lot for the reply!

Right, that makes sense: it's quite a fundamental Python issue and the crash itself is in a dependency, so I agree it doesn't sound like something easy to fix in dbt as is.

I had one thought, though. The issue is very likely due to all the threads writing to the same log file, which is not just an IO issue but, I think, a usability one too, since log messages from different threads end up interspersed. So maybe something to consider could be an option to write a separate log file per thread, which could solve both issues? And if a single log file is important for compatibility, the individual files could be merged after all threads finish.

daaain avatar Jun 23 '21 10:06 daaain

That's a possibility. The logbook method that trips this error definitely seems related to writing the log file. The thing that gives me pause is, the reproduction case linked in that python bug doesn't seem to involve file IO at all—it's just printing to stdout. From my (very limited) knowledge of this stuff, I'm not sure if writing to more files would solve this for us?

One more thing that gives me pause: https://github.com/fishtown-analytics/dbt/blob/c794600242e734d870d4bf1c8fb4c0349ab961eb/core/dbt/logger.py#L452-L453

As it is, we may not be sticking with logbook forever, so safer thread handling may be on the wish list when we look into other potential logging packages (https://github.com/fishtown-analytics/dbt/issues/3359).

jtcohen6 avatar Jun 24 '21 01:06 jtcohen6

OK, if there's some upcoming work on rewriting logging anyway, then this isn't really worth bothering with, unless someone comes along who knows exactly what one-liner change can fix it 😹

daaain avatar Jun 24 '21 14:06 daaain

I've got 8 threads and just observed this with dbt compile in a CICD pipeline for the first time ever out of hundreds of runs. Seems like a once in a long while fluke error.

muscovitebob avatar Nov 30 '21 14:11 muscovitebob

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions[bot] avatar Jun 11 '22 02:06 github-actions[bot]

This seems eerily similar to https://github.com/getlogbook/logbook/issues/136 I'd like this open

visch avatar Jun 17 '22 12:06 visch

I've got 8 threads and just observed this with dbt compile in a CICD pipeline for the first time ever out of hundreds of runs. Seems like a once in a long while fluke error.

I also hit mine with a large number of threads (seemingly randomly as things were chugging along fine for a long time for me). In my case I think the logs directory is hosted on the network, and I'm using Windows.

My guess is the file system isn't quite as performant anymore, but that's a wild guess. Some kind of race condition.

visch avatar Jun 17 '22 13:06 visch

This issue still exists: RuntimeError: reentrant call inside <_io.BufferedWriter name='/tmp/auto-sklearn_tmp_55ed1e22-1398-11ed-ad47-00163e9cfb76/AutoML(1):55ed17be-1398-11ed-ad47-00163e9cfb76.log'>

VillageDeveloper2019 avatar Aug 04 '22 02:08 VillageDeveloper2019

RuntimeError: reentrant call inside <_io.BufferedWriter name=''>

Same here.

waflessnet avatar Jan 15 '23 00:01 waflessnet

@waflessnet Which version of dbt-core are you using? Since v1.0, we no longer use logbook for the default logging system. (For compatibility with some preexisting systems, it's still possible to use logbook by explicitly enabling "legacy logging": DBT_ENABLE_LEGACY_LOGGER=1.)

jtcohen6 avatar Jan 16 '23 13:01 jtcohen6

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Jan 12 '24 01:01 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Jan 20 '24 01:01 github-actions[bot]