Where to reference/install dbt packages with MWAA?
Where should I be installing (or referencing, if deployed during a CI step) dbt packages while using MWAA? I'm currently installing to /tmp/dbt/packages, but I'm seeing roughly a 50/50 success rate when running the test operator. Here is what the graph view of my DAGs looks like:
This is the error log for when the runs fail:

The deps operator works great, then dbt run, but the test operator fails about 50% of the time. Should I be deploying the package files to a folder within the MWAA S3 bucket and then updating profiles.yml to reference that location?
Environment context: MWAA v2.0.2, dbt-core>=1.0, dbt-postgres>=1.0, airflow-dbt==0.4.0
I was able to use custom modules by placing them in the dbt_modules folder inside my dbt project. The whole dbt project then lives in the dags folder on S3.
Finally, when using the dbt operator, I pass the dir argument so dbt has access to dbt_modules:
DbtRunOperator(
    task_id='dbt_run',
    dbt_bin='/usr/local/airflow/.local/bin/dbt',
    profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
    dir='/usr/local/airflow/dags/{DBT_FOLDER}/'
)
Hopefully you can replicate the same idea for your dbt packages.
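To sketch the same idea for packages (the paths here are assumptions, not taken verbatim from this thread, and DbtDepsOperator/DbtTestOperator are the other operators airflow-dbt ships): the key point is that deps, run, and test must all receive the same profiles_dir and dir, so packages installed by dbt deps end up in the very project directory that dbt run and dbt test later read from.

```python
# Hypothetical shared settings: every dbt operator gets the same
# profiles_dir and dir, so `dbt deps` installs packages into the same
# project directory that `dbt run` and `dbt test` later read from.
DBT_DIR = "/usr/local/airflow/dags/dbt/"  # assumed location, adjust to your project

dbt_kwargs = dict(
    dbt_bin="/usr/local/airflow/.local/bin/dbt",
    profiles_dir=DBT_DIR,
    dir=DBT_DIR,
)

# Inside the DAG these would be expanded into each operator, e.g.:
#   DbtDepsOperator(task_id="dbt_deps", **dbt_kwargs)
#   DbtRunOperator(task_id="dbt_run", **dbt_kwargs)
#   DbtTestOperator(task_id="dbt_test", **dbt_kwargs)
```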
A bit of a different question, but related to using airflow-dbt with MWAA: I can't get your example to run, or for that matter the example in the airflow-dbt/README.md, on MWAA.
I think the folder where the code runs in MWAA is non-writable? See the AWS MWAA docs on using dbt.
I can get my DAG to run locally using the following setup and code:
Environment (Python 3.10, Docker container, Celery executor):
apache-airflow==2.4.3
dbt-core==1.3.2
airflow-dbt==0.4.0
Code:
...
default_args = dict(
    dbt_bin="dbt",
    profiles_dir="/opt/airflow/dags/dbt",
    dir="/opt/airflow/dags/dbt",
)

with DAG(
    start_date=datetime(2022, 3, 14),
    schedule_interval="@once",
    dag_id="dbt_test",
    default_args=default_args,
    tags=["dbt", "development"],
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run_royalty",
        models="+marts-report_team-royalty",
        target="dev-sandervd",
    )
But if I run a similar setup in MWAA, I get an error about writing logs. I use different default_args, like so (required for MWAA):
default_args = dict(
    dbt_bin='/usr/local/airflow/.local/bin/dbt',
    profiles_dir='/usr/local/airflow/dags/dbt/',
    dir='/usr/local/airflow/dags/dbt/'
)
I get the following error:
*** Reading remote log from Cloudwatch log_group: airflow-airflow-dev-sandervd-Task log_stream: dag_id=dev_dbt/run_id=manual__2023-02-22T08_27_50.474777+00_00/task_id=dbt_run_royalty/attempt=1.log.
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1165}} INFO - Dependencies all met for <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [queued]>
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1165}} INFO - Dependencies all met for <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [queued]>
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1362}} INFO -
--------------------------------------------------------------------------------
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1363}} INFO - Starting attempt 1 of 1
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1364}} INFO -
--------------------------------------------------------------------------------
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1383}} INFO - Executing <Task(DbtRunOperator): dbt_run_royalty> on 2023-02-22 08:27:50.474777+00:00
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:55}} INFO - Started process 2349 to run task
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:82}} INFO - Running: ['airflow', 'tasks', 'run', 'dev_dbt', 'dbt_run_royalty', 'manual__2023-02-22T08:27:50.474777+00:00', '--job-id', '40', '--raw', '--subdir', 'DAGS_FOLDER/merlin_dags/dbt_dags/dag_dev_dbt.py', '--cfg-path', '/tmp/tmp3xtcgs41']
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:83}} INFO - Job 40: Subtask dbt_run_royalty
[2023-02-22, 09:27:59 CET] {{task_command.py:376}} INFO - Running <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [running]> on host b7f1bf7e73bf
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1590}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dev_dbt
AIRFLOW_CTX_TASK_ID=dbt_run_royalty
AIRFLOW_CTX_EXECUTION_DATE=2023-02-22T08:27:50.474777+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-22T08:27:50.474777+00:00
[2023-02-22, 09:27:59 CET] {{dbt_hook.py:117}} INFO - /usr/local/airflow/.local/bin/dbt run --profiles-dir /usr/local/airflow/dags/dbt --target dev-sandervd --models +marts-report_team-royalty
[2023-02-22, 09:27:59 CET] {{dbt_hook.py:126}} INFO - Output:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [0m08:28:02.215321 [error] [MainThread]: Encountered an error:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [0m08:28:02 Encountered an error:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [0m08:28:02.218674 [error] [MainThread]: Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 135, in main
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - results, succeeded = handle_and_check(args)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 198, in handle_and_check
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - task, res = run_from_args(parsed)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 234, in run_from_args
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - setup_event_logger(log_path or "logs", level_override)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/events/functions.py", line 81, in setup_event_logger
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - file_handler = RotatingFileHandler(
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/handlers.py", line 155, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/handlers.py", line 58, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - logging.FileHandler.__init__(self, filename, mode=mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - StreamHandler.__init__(self, self._open())
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - return open_func(self.baseFilename, self.mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - /usr/local/airflow/.local/lib/python3.10/site-packages/watchtower/__init__.py:349 WatchtowerWarning: Received empty message. Empty messages cannot be sent to CloudWatch Logs
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - File "/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
self.sniff_errors(record)
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - File "/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
if pattern.search(record.message):
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - AttributeError: 'LogRecord' object has no attribute 'message'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [0m08:28:02 Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 135, in main
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - results, succeeded = handle_and_check(args)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 198, in handle_and_check
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - task, res = run_from_args(parsed)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 234, in run_from_args
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - setup_event_logger(log_path or "logs", level_override)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/events/functions.py", line 81, in setup_event_logger
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - file_handler = RotatingFileHandler(
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/handlers.py", line 155, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/handlers.py", line 58, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - logging.FileHandler.__init__(self, filename, mode=mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - StreamHandler.__init__(self, self._open())
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - return open_func(self.baseFilename, self.mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:132}} INFO - Command exited with return code 2
[2023-02-22, 09:28:02 CET] {{taskinstance.py:1851}} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt/operators/dbt_operator.py", line 98, in execute
self.create_hook().run_cli('run')
File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt/hooks/dbt_hook.py", line 138, in run_cli
raise AirflowException("dbt command failed")
airflow.exceptions.AirflowException: dbt command failed
[2023-02-22, 09:28:02 CET] {{taskinstance.py:1401}} INFO - Marking task as FAILED. dag_id=dev_dbt, task_id=dbt_run_royalty, execution_date=20230222T082750, start_date=20230222T082759, end_date=20230222T082802
[2023-02-22, 09:28:02 CET] {{standard_task_runner.py:100}} ERROR - Failed to execute job 40 for task dbt_run_royalty (dbt command failed; 2349)
[2023-02-22, 09:28:02 CET] {{local_task_job.py:159}} INFO - Task exited with return code 1
[2023-02-22, 09:28:02 CET] {{taskinstance.py:2623}} INFO - 0 downstream tasks scheduled from follow-on schedule check
I can get it to run in MWAA if I run it in the /tmp dir, which seems writable (as per AWS MWAA's example), using the setup below; however, that way I can't use the airflow-dbt operators anymore. Any suggestions?
cli_command = BashOperator(
    task_id="dbt_run_mwaa",
    bash_command=f"cp -R /usr/local/airflow/dags/dbt/ /tmp;\
        cd /tmp/dbt;\
        /usr/local/airflow/.local/bin/dbt run --project-dir /tmp/dbt/ -s incoming-royalty_etl-alibaba;\
        ",
)
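For what it's worth, the copy step itself can be illustrated in plain Python; a minimal stdlib sketch of the same workaround, using stand-in temp paths rather than the real MWAA ones:

```python
# Stand-in demo of "copy the read-only project into writable /tmp":
# on MWAA the dags folder cannot be written to, so dbt's logs/ and
# target/ must land somewhere else -- here, a copy of the project.
import shutil
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp()) / "dbt"   # stand-in for /usr/local/airflow/dags/dbt
(src / "models").mkdir(parents=True)
(src / "models" / "example.sql").write_text("select 1 as id")

dst = Path(tempfile.mkdtemp()) / "dbt"   # stand-in for /tmp/dbt
shutil.copytree(src, dst, dirs_exist_ok=True)

# dbt would now be invoked with --project-dir pointing at `dst`,
# where logs/ and target/ can be created freely:
(dst / "logs").mkdir()
```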
@sandervandorsten it looks like it might be an issue with your dbt_project.yml configuration.
In order to get around [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log', you need to update your config in dbt_project.yml to include:
target-path: "/tmp/dbt/target"  # https://github.com/gocardless/airflow-dbt/issues/33
log-path: "/tmp/dbt/logs"
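As far as I can tell, the underlying reason is that dbt resolves its default log-path and target-path relative to the project/working directory, and airflow-dbt's dir argument makes that the read-only dags folder. A stdlib sketch of the resolution, with the path copied from the log above:

```python
# dbt's default log location is <project-dir>/logs/dbt.log; with
# dir='/usr/local/airflow/dags/dbt/' that lands inside the read-only
# dags folder, producing the Errno 13 in the traceback.
import os.path

project_dir = "/usr/local/airflow/dags/dbt/"
default_log_file = os.path.join(project_dir, "logs", "dbt.log")
print(default_log_file)  # -> /usr/local/airflow/dags/dbt/logs/dbt.log
```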
Thanks! I changed my dbt_project.yml file accordingly and it works like a charm! For completeness:
[other stuff]
...
# Changed the folders that dbt writes to during runtime,
# because on AWS MWAA the worker node's directory is non-writable.
# https://github.com/gocardless/airflow-dbt/issues/33
packages-install-path: "/tmp/dbt/dbt_packages"
log-path: "/tmp/dbt/logs"
target-path: "/tmp/dbt/target"
...
[other stuff]