dbt-core
Getting "RuntimeError: reentrant call inside <_io.BufferedWriter name='logs/dbt.log'>" when running dbt test concurrently
Describe the bug
I'm not sure if it's the upgrade from 0.17.0 to 0.19.1 or setting concurrency to 30 (probably the latter), but I've started getting Python errors from dbt when running tests. Luckily the tests themselves continue running correctly, but I'm guessing the log file gets messed up. It doesn't happen on every run (roughly every 20th or so), so it's likely a race condition.
Steps To Reproduce
Run `dbt --log-format json test` with 30 threads.
Expected behavior
Concurrency handled correctly.
Screenshots and log output
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 216, in handle
    self.emit(record)
  File "/usr/local/lib/python3.7/site-packages/dbt/logger.py", line 458, in emit
    super().emit(record)
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 840, in emit
    if self.should_rollover(record, len(msg)):
  File "/usr/local/lib/python3.7/site-packages/logbook/handlers.py", line 818, in should_rollover
    self.stream.seek(0, 2)
RuntimeError: reentrant call inside <_io.BufferedWriter name='logs/dbt.log'>
```
System information
Which database are you using dbt with?
- [ ] postgres
- [ ] redshift
- [x] bigquery
- [ ] snowflake
- [ ] other (specify: ____________)
The output of `dbt --version`:
```
installed version: 0.19.1
latest version: 0.19.1
Up to date!
Plugins:
- bigquery: 0.19.1
- snowflake: 0.19.1
- postgres: 0.19.1
- redshift: 0.19.1
```
The operating system you're using:
Debian GNU/Linux 10 (buster)
The output of `python --version`:
```
Python 3.7.10
```
Hey @daaain, I have to imagine this is related to running with a large number of threads, all of which are trying to write to `logs/dbt.log`. It looks like the actual error is being tripped by the `logbook` package here, within its logic to detect log file size and rotate to a new log file if needed.
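Judging purely from the traceback, the check that trips the error has roughly this shape (a paraphrase for illustration, not verbatim logbook source; the `max_size` attribute is an assumption):

```python
# Paraphrased sketch of logbook's size-based rollover check, inferred
# from the traceback above -- not the library's verbatim code.
def should_rollover(self, record, bytes):
    # Seek to the end of the log file to measure its current size.
    # Python's buffered IO objects refuse reentrant use, so if the
    # shared BufferedWriter is re-entered mid-write (e.g. from a signal
    # handler), this seek() is where the RuntimeError surfaces.
    self.stream.seek(0, 2)
    return self.stream.tell() + bytes >= self.max_size  # assumed size limit
```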
It also sounds like a known issue with python, signal handlers, and IO: https://bugs.python.org/issue24283. Apparently, it was broken in python2, too, but the error message is new in python3.
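For illustration, here's a minimal, POSIX-only Python sketch (not dbt code; the file name and timer interval are arbitrary) of how that reentrancy guard fires when a signal handler writes to the same buffered stream the main code is in the middle of writing to:

```python
# Standalone repro of the reentrancy guard described in bpo-24283.
# POSIX only (uses SIGALRM); demo.log and the timings are arbitrary.
import signal
import sys

log = open("demo.log", "wb", buffering=8192)  # a BufferedWriter, like logs/dbt.log

def on_alarm(signum, frame):
    # If the alarm interrupts the main loop mid-write, this write()
    # re-enters the same BufferedWriter and Python raises:
    #   RuntimeError: reentrant call inside <_io.BufferedWriter name='demo.log'>
    log.write(b"interrupted\n")

signal.signal(signal.SIGALRM, on_alarm)
signal.setitimer(signal.ITIMER_REAL, 0.001, 0.001)  # fire every millisecond

try:
    while True:
        log.write(b"x" * 4096)  # frequently interrupted by the timer
except RuntimeError as exc:
    print(exc, file=sys.stderr)
finally:
    signal.setitimer(signal.ITIMER_REAL, 0)  # stop the timer
    log.close()
```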
My instinct is that this isn't something we can readily fix with a code change within dbt. This is definitely not my area of expertise, though, so I'm willing to leave the issue open and see if anyone else has a solid proposal. For what it's worth, it isn't a bug I was able to reproduce myself, either: I ran `dbt --log-format json --threads 30` in a project with 450+ tests, and never saw this error.
Thanks a lot for the reply!
Right, that makes sense: it's a fairly fundamental Python issue, and the crash itself happens in a dependency, so I agree it doesn't sound like something that's easy to fix in dbt as it is.
I had one thought, though. The issue is very likely due to all the threads writing to the same log file, which isn't just an IO problem but also a usability one, since the log messages from different threads are all interspersed. So maybe something to consider could be an option to write one log file per thread, which could solve both issues? And if a single log file is important for compatibility, the individual ones could be merged once all threads have finished. A rough sketch of the idea follows below.
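(A hypothetical sketch of that per-thread idea using the stdlib logging module, not dbt's actual implementation, which uses logbook; `get_thread_logger` and the file naming scheme are invented for illustration.)

```python
# Hypothetical illustration of "one log file per thread": each thread
# gets its own FileHandler, so no two threads ever share a BufferedWriter.
import logging
import os
import threading

_local = threading.local()

def get_thread_logger() -> logging.Logger:
    """Return a logger whose FileHandler is owned by the calling thread."""
    if not hasattr(_local, "logger"):
        os.makedirs("logs", exist_ok=True)
        ident = threading.get_ident()
        logger = logging.getLogger(f"dbt.thread.{ident}")
        handler = logging.FileHandler(f"logs/dbt.{ident}.log")
        handler.setFormatter(logging.Formatter("%(asctime)s %(threadName)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        _local.logger = logger
    return _local.logger

# Usage inside a worker thread:
# get_thread_logger().info("model finished")
```

Merging would then just be a matter of concatenating (or timestamp-sorting) the per-thread files once the run completes.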
That's a possibility. The logbook method that trips this error definitely seems related to writing the log file. The thing that gives me pause is, the reproduction case linked in that python bug doesn't seem to involve file IO at all—it's just printing to stdout. From my (very limited) knowledge of this stuff, I'm not sure if writing to more files would solve this for us?
One more thing that gives me pause: https://github.com/fishtown-analytics/dbt/blob/c794600242e734d870d4bf1c8fb4c0349ab961eb/core/dbt/logger.py#L452-L453
As it is, we may not be sticking with `logbook` forever, so safer thread handling may be on the wish list when we look into other potential logging packages (https://github.com/fishtown-analytics/dbt/issues/3359).
OK, if there's some upcoming work on rewriting logging anyway, then this isn't really worth bothering with, unless someone comes along who knows exactly which one-liner change can fix it 😹
I've got 8 threads and just observed this with `dbt compile` in a CI/CD pipeline, for the first time ever out of hundreds of runs. Seems like a once-in-a-long-while fluke error.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
This seems eerily similar to https://github.com/getlogbook/logbook/issues/136. I'd like this to stay open.
> I've got 8 threads and just observed this with `dbt compile` in a CI/CD pipeline, for the first time ever out of hundreds of runs. Seems like a once-in-a-long-while fluke error.
I also hit this with a large number of threads (seemingly at random; things had been chugging along fine for a long time). In my case the logs directory is hosted on the network, and I'm using Windows. My guess is that the file system isn't quite as performant there, but that's a wild guess. Some kind of race condition.
This issue still exists: RuntimeError: reentrant call inside <_io.BufferedWriter name='/tmp/auto-sklearn_tmp_55ed1e22-1398-11ed-ad47-00163e9cfb76/AutoML(1):55ed17be-1398-11ed-ad47-00163e9cfb76.log'>
RuntimeError: reentrant call inside <_io.BufferedWriter name='
Same here.
@waflessnet Which version of `dbt-core` are you using? Since v1.0, we no longer use `logbook` for the default logging system. (For compatibility with some preexisting systems, it's still possible to use `logbook` by explicitly enabling "legacy logging": `DBT_ENABLE_LEGACY_LOGGER=1`.)
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.