
Problem with dbt_hook: permission denied when writing logs

Open ravog opened this issue 3 years ago • 18 comments

ravog avatar Apr 19 '21 19:04 ravog

Hi @ravog,

This operator logs just like every other Airflow task, and doesn't try to write logs to any other location.

Are your other Airflow tasks writing logs OK? Is it just this task/operator?

Regards, Andrew

andrewrjones avatar Apr 20 '21 08:04 andrewrjones

The author might be referring to the error message I've just run into when using airflow-dbt with AWS MWAA:

```
[2021-08-05 15:05:37,950] {{dbt_hook.py:109}} INFO - Output:
[2021-08-05 15:05:40,340] {{dbt_hook.py:113}} INFO - Running with dbt=0.20.0
[2021-08-05 15:05:40,594] {{dbt_hook.py:113}} INFO - Encountered an error:
[2021-08-05 15:05:40,617] {{dbt_hook.py:113}} INFO - [Errno 13] Permission denied: 'logs/dbt.log'
[2021-08-05 15:05:40,722] {{dbt_hook.py:117}} INFO - Command exited with return code 2
[2021-08-05 15:05:40,759] {{taskinstance.py:1150}} ERROR - dbt command failed
```

dkrylovsb avatar Aug 05 '21 15:08 dkrylovsb

I had to update the value of log-path in my dbt_project.yml (https://docs.getdbt.com/reference/project-configs/log-path) to something like /usr/local/airflow/tmp/logs in order to run on AWS MWAA.
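For context, that's a one-line change in dbt_project.yml; a minimal sketch, where the path is just an example of a writable location on MWAA:

```yaml
# dbt_project.yml: point dbt's log directory at a writable path (example path)
log-path: "/usr/local/airflow/tmp/logs"
```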

Falydoor avatar Sep 02 '21 19:09 Falydoor

Hi,

I was able to fix the permission denied error as @Falydoor suggested, but now I am getting a read-only file system error when dbt writes its partial-parsing cache:

```
[2021-09-07 00:32:19,011] {{dbt_hook.py:117}} INFO - /usr/local/airflow/.local/bin/dbt run --profiles-dir /usr/local/airflow/dags/dbt1/
[2021-09-07 00:32:19,045] {{dbt_hook.py:126}} INFO - Output:
[2021-09-07 00:32:20,899] {{dbt_hook.py:130}} INFO - Running with dbt=0.20.1
[2021-09-07 00:32:22,972] {{dbt_hook.py:130}} INFO - Encountered an error:
[2021-09-07 00:32:23,184] {{dbt_hook.py:130}} INFO - [Errno 30] Read-only file system: 'target/partial_parse.msgpack'
[2021-09-07 00:32:23,214] {{dbt_hook.py:134}} INFO - Command exited with return code 2
```

Please let me know if anyone has an answer for this.

prakash260 avatar Sep 07 '21 00:09 prakash260

Hey @prakash260,

Try updating the target-path property too (https://docs.getdbt.com/reference/project-configs/target-path), with /usr/local/airflow/tmp/target for example.

There may be a better way than using a temp folder, such as disabling dbt's log/target generation entirely.
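Again, a one-line change in dbt_project.yml, where the path is just an example:

```yaml
# dbt_project.yml: write compiled artifacts (target/) to a writable path
target-path: "/usr/local/airflow/tmp/target"
```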

Falydoor avatar Sep 07 '21 13:09 Falydoor

Thanks @Falydoor. I'm not too keen on relying on the temp folder either, but I will see whether it works.

prakash260 avatar Sep 07 '21 22:09 prakash260

OK, I have tried changing the location to /usr/local/airflow/dags/{dbt-directory} and everything is working now.

prakash260 avatar Sep 08 '21 00:09 prakash260

@prakash260 could you please elaborate on {dbt_directory}? In my case I tried target-path: "/usr/local/airflow/dags/dbt_target",

but with no success. Does MWAA have write access to /usr/local/airflow/dags/? The AWS documentation (https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-faqs.html#custom-image) suggests only temp storage is writable: "Your Apache Airflow Operators can store temporary data on the Workers. Apache Airflow Workers can access temporary files in the /tmp on the Fargate containers for your environment." But that is not a good approach.

maker100 avatar Oct 27 '21 10:10 maker100

Hey @maker100,

That particular location gets synced from S3 as part of MWAA, hence I was forced to store the files there. I did try those options, and felt that using dbt Cloud is much easier than customizing all of this; but the target-path change did work for me with dbt Core.

To be more precise, your dbt project files need to be present in the S3 location for this to work.

prakash260 avatar Oct 27 '21 22:10 prakash260

So what is the right approach? What should be done to handle the permission issue?

Gatsby-Lee avatar Mar 18 '22 04:03 Gatsby-Lee

Hi @Gatsby-Lee,

Because of several issues with running dbt directly on MWAA, such as:

  • Python library issues
  • dbt issues with path parsing when the models were stored in a custom path
  • long plugins.zip upload times with MWAA (the models were stored in plugins.zip)

I decided to use a separate environment and run dbt on the AWS Batch service using an ECR image. You could also use Kubernetes for this.

I recommend using MWAA only as a scheduler and not installing dbt directly on it.
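Not the author's exact setup, but as an illustration, a minimal sketch of what the scheduling side could look like, assuming a Batch job definition and queue wrapping the dbt ECR image already exist (all names below are hypothetical placeholders):

```python
# Hypothetical sketch: MWAA only schedules; dbt itself runs in an AWS Batch job
# built from an ECR image. Job, queue, and definition names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.batch import BatchOperator

with DAG(
    dag_id="dbt_on_batch",
    start_date=datetime(2022, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BatchOperator(
        task_id="dbt_run",
        job_name="dbt-run",            # placeholder job name
        job_definition="dbt-run",      # placeholder Batch job definition
        job_queue="dbt-job-queue",     # placeholder Batch job queue
        overrides={"command": ["dbt", "run"]},  # command run in the container
    )
```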

maker100 avatar Mar 18 '22 08:03 maker100

Hey @Gatsby-Lee,

I agree with @maker100: you should avoid running a heavy process like dbt directly on MWAA. My Airflow DAG triggers an ECS task that runs on Fargate to execute my dbt code, so I don't have to worry about resource allocation.

Falydoor avatar Mar 18 '22 12:03 Falydoor

@maker100 @Falydoor Hi, from my heart, I really appreciate your comments. I didn't expect such a fast reply to my question :)

I have a couple of follow-up questions. Q1: Do you mean that MWAA triggers (or executes, through an operator) another AWS service like AWS Batch or Fargate to do the work? Q2: Does that mean dbt is built as an image?

Thank you

Gatsby-Lee avatar Mar 18 '22 19:03 Gatsby-Lee

1. I used this operator to trigger my ECS task: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/ecs.html
2. Yes, I have a Dockerfile that uses an image with dbt (https://hub.docker.com/r/fishtownanalytics/dbt), and my dbt code is copied into it.
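For illustration, a minimal sketch of such a trigger; the cluster, task definition, container name, and subnet below are placeholders, and depending on the provider version the operator is exposed as EcsOperator or EcsRunTaskOperator:

```python
# Hypothetical sketch: trigger a Fargate task that runs `dbt run` inside a
# container built from a dbt image. All resource names are placeholders.
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

dbt_run = EcsRunTaskOperator(
    task_id="dbt_run",
    cluster="dbt-cluster",               # placeholder ECS cluster
    task_definition="dbt-fargate",       # placeholder task definition
    launch_type="FARGATE",
    overrides={
        "containerOverrides": [
            {"name": "dbt", "command": ["dbt", "run"]},  # placeholder container
        ]
    },
    network_configuration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-xxxxxxxx"],  # placeholder subnet
            "assignPublicIp": "ENABLED",
        }
    },
)
```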

Falydoor avatar Mar 18 '22 20:03 Falydoor

@Falydoor Thank you for your reply 👍

Gatsby-Lee avatar Mar 18 '22 21:03 Gatsby-Lee

Hello there @Falydoor,

I have been following the same steps as this conversation thread, and most of the errors mentioned here have happened to me too. I guess there have been no updates or fixes by AWS for running dbt on MWAA so far. My guess is that the best solution (and the least work, compared to options like dbt on EC2 or dbt on Lambdas) would be to run dbt on ECS, invoke it from tasks within DAGs, and keep a well-decoupled architecture between Airflow and the dbt environment and the transformations themselves. Is that what you were doing?

joaquimsage avatar Apr 19 '23 14:04 joaquimsage

Hello @joaquimsage,

Yes, correct! The ECS Airflow operator can be used to run your task definition on your ECS cluster (use Fargate so you don't have to manage EC2s). One "small" drawback is that the task usually takes a minute to start, so it delays your dbt run a bit.

About MWAA, I don't think AWS will do any updates to fix the permission/read-only issues 😬.

Falydoor avatar Apr 19 '23 15:04 Falydoor

Hi all, if you want to run dbt directly on Airflow:

Please make these changes to dbt_project.yml, as only the tmp directory has read-write permission in MWAA:

```yaml
packages-install-path: "/usr/local/airflow/tmp/dbt_packages"
log-path: "/usr/local/airflow/tmp/logs"
target-path: "/usr/local/airflow/tmp/target"  # directory which will store compiled SQL files
clean-targets:  # directories to be removed by dbt clean
  - "/usr/local/airflow/tmp/target"
  - "/usr/local/airflow/tmp/dbt_packages"
```

vijayscbitscrunch avatar Aug 16 '23 09:08 vijayscbitscrunch