AmpliconSuite-pipeline Docker permissions

Just pulled PAA down the other day and have been running it. My run command is:

/data/PrepareAA/docker/run_paa_docker.py -o /data/output -s Colo -t 16 --bam /data/Data/Colo/cofinal.bam --run_AA --run_AC

However, after 22+ hours I get to this point and it fails miserably:

/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-27 22:49:31.494158

I am unsure where feature_to_graph.txt should be found, and Colo_amplicon_classification_profiles.tsv doesn't seem to be getting generated.

Any assistance would be appreciated

Jul 28 '22 21:07 MrDotOne

Hi,

I have updated PrepareAA in 580f923 to handle issues related to permissions of the output directory, and also to consolidate a file from AmpliconClassifier that may be written to a location that is not necessarily the output directory. Can you please pull the latest version of the docker image and try again? You may already have done so, but please also double-check that the location you are hoping to save data to exists and has write permissions for root.
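The two checks suggested above can be sketched in a few lines of Python. This is only an illustration, not code from PrepareAA: the function names are hypothetical, and the second helper simply shows how a bare filename such as feature_to_graph.txt can be anchored to the output directory so that it is not resolved against the current working directory.

```python
import os


def check_output_dir(path):
    """Return a list of human-readable problems with `path` (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # os.access tests the real UID/GID of the calling process.
        problems.append(f"{path} is not writable by UID {os.getuid()}")
    return problems


def output_path(output_dir, filename):
    """Anchor a bare filename to the output directory instead of the CWD."""
    return os.path.join(output_dir, filename)
```

Running check_output_dir("/data/output") on the host, as the account that will launch the container, reports whether that account can create files there before a 20-hour run is started.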

Thanks, Jens

Jul 28 '22 22:07 jluebeck

I made a change to the run file so when you execute it, it looks like this:

docker run -u $(id -u $USER):$(id -g $USER) --rm -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v :/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh

So everything should be read and written as the end user running the app.

I will pull down the update(s) and give it a shot. Thank you.

Jul 28 '22 23:07 MrDotOne

Pulled and running. It will take over 20 hours, but I will let you know. Thank you for your time.

I do find that adding the following to the run script avoids a lot of issues, so the data is written as the caregiver and not root:

-u $(id -u $USER):$(id -g $USER)

Jul 28 '22 23:07 MrDotOne

Thank you, this is a good suggestion, I will incorporate it.

Jul 28 '22 23:07 jluebeck

Someone on another repo suggested it when I was having issues with the results being written as root and the person running it didn't have escalation privileges. I thought I would pass on that nugget.

Jul 29 '22 01:07 MrDotOne

I am still having issues

[root:INFO] #TIME 79252.045 Plotting SV View for amplicon7
[root:INFO] #TIME 79318.830 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-28 23:07:27.730295
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-07-29 21:26:25.262009

Jul 30 '22 02:07 MrDotOne

I will run as root and that should fix it but ...

Jul 30 '22 03:07 MrDotOne

OK, I reran as root using the run script as provided in the repo, and it seems to have completed successfully. This is good progress. However, both times I ran it with -u id $UID:id $GID it failed. I need to figure out how to get the results written as the caregiver so I don't have to intervene.

Jul 31 '22 01:07 MrDotOne

Unfortunately that is not working. The run file works fine for root, but not for a non-escalated account. I keep getting this error when I run as a user with the id options in the run command:

[root:INFO] #TIME 79384.895 Plotting SV View for amplicon7
[root:INFO] #TIME 79452.068 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-31 01:44:32.102144
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-08-01 00:05:22.206630

Aug 01 '22 00:08 MrDotOne

Hi,

Thank you for sharing. I have also now done some testing on my end and it appears that assigning a custom user for the image is non-trivial and that the above proposed solution (adding -u id $UID:id $GID) does not quite work as expected. I recommend that users run with the current default settings, generating the files as root and then users can chmod or copy the relevant files later if they need non-root ownership. I do not plan to address this issue of non-root ownership in the PrepareAA generated files at this particular time, but perhaps in the future if there is a compelling reason.

Jens

Aug 01 '22 23:08 jluebeck

Non-root users cannot chown/chgrp files; that is a serious cybersecurity concern.

Aug 02 '22 18:08 MrDotOne

Is there a way to implement a python script within the run file to do something similar to this?

(base) [root@lri-uapps-2 data]# cat chown.py
import os

path = "/data/output"
for root, dirs, files in os.walk(path):
    for momo in dirs:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
    for momo in files:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
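A slightly more general version of the same idea, as a sketch only: the function name chown_tree is hypothetical, the UID and GID become parameters, and the top-level directory is included (the walk in the script above only touches entries beneath it). Changing ownership to another account still requires root on the host.

```python
import os


def chown_tree(path, uid, gid):
    """Recursively set owner and group on `path` and everything below it."""
    # os.walk only lists entries beneath `path`, so handle the top directory here.
    os.chown(path, uid, gid)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            # follow_symlinks=False changes a link itself rather than its target.
            os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
```

With the values from the script above, the call would be chown_tree("/data/output", 1035688, 1001025).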

Michael

Aug 02 '22 19:08 MrDotOne

Hi Michael,

Without re-assigning user IDs inside the container itself, or alternatively sharing the /etc/passwd file from the host machine with the docker image, there is no way to provide the docker image with exactly the same user and group account information as the host machine. The previously proposed solution runs the image as a specific user inside the image, but that user is not mapped to the same user on the host machine. Perhaps one option is instead to have the docker script, when it is finished, recursively chmod all the files written by the image into the mounted directory to add global read/write permissions. Would this solution be satisfactory for you? I can test this out in the next couple of days.
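The recursive chmod option could look roughly like the following. This is a sketch under assumptions rather than what the docker script does: the function names are hypothetical, directories are also given the execute bit (without it they cannot be entered), and symlinks are skipped.

```python
import os
import stat

RW_ALL = (stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
          | stat.S_IROTH | stat.S_IWOTH)
X_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_mode_bits(target, bits):
    """OR `bits` into the existing permission bits of `target`."""
    if os.path.islink(target):
        return  # chmod would act on the link target, so leave links alone
    current = stat.S_IMODE(os.lstat(target).st_mode)
    os.chmod(target, current | bits)


def open_up_tree(path):
    """Add read/write for everyone under `path`; directories also get execute."""
    add_mode_bits(path, RW_ALL | X_ALL)
    # Top-down walk: each directory is opened up before the walk descends into it.
    for root, dirs, files in os.walk(path):
        for name in dirs:
            add_mode_bits(os.path.join(root, name), RW_ALL | X_ALL)
        for name in files:
            add_mode_bits(os.path.join(root, name), RW_ALL)
```

Because the bits are only added, any permissions a file already had are preserved. World-writable output may not be acceptable on a locked-down host, which is the trade-off of this approach.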

Jens

Aug 02 '22 19:08 jluebeck

That is a solution I am trying to implement. I tried to use /home/output; however, the result was "no such file or directory".

Aug 02 '22 22:08 MrDotOne

I just pulled [fc3b5e8] and will give it a try with the --run_as_user option, which looks promising already:

docker run --rm -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) -u $(id -u):$(id -g) -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v /home/bendahm:/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh
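The command above passes the caller's numeric IDs in three places: the HOST_UID and HOST_GID environment variables and the -u flag. A minimal sketch of how a Python launcher might assemble those arguments follows. How run_paa_docker.py actually builds them is not shown in this thread, so the function name and structure here are assumptions.

```python
import os


def run_as_user_args(uid=None, gid=None):
    """Return docker-run arguments that pass a numeric UID and GID."""
    # Default to the real IDs of whoever launches the script.
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "-e", f"HOST_UID={uid}",
        "-e", f"HOST_GID={gid}",
        "-u", f"{uid}:{gid}",
    ]
```

Using numeric IDs avoids any dependence on user names, which the container has no way to resolve without the host's account database.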

I will let you know what I find. Thank you for looking into this.

Aug 05 '22 19:08 MrDotOne

This is perfect:

(base) [root@lri-uapps-2 data]# cd output
(base) [root@lri-uapps-2 output]# ls -la
total 20
drwxrwxrwx  3 bendahm ccdomainusers   113 Aug 5 15:03 .
drwxrwxrwx 19 root    root           4096 Aug 5 15:02 ..
drwxr-xr-x  2 bendahm ccdomainusers   126 Aug 5 15:10 Colo_cnvkit_output
-rw-r--r--  1 bendahm ccdomainusers     0 Aug 5 15:03 Colo_timing_log.txt
-rw-r--r--  1 bendahm ccdomainusers  1931 Aug 5 15:03 docker_home_manifest.log
-rw-r--r--  1 bendahm ccdomainusers 11525 Aug 5 15:10 PAA_stdout.log
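The same ownership check can be scripted instead of read off a listing. A small sketch, with a hypothetical function name, that reports anything in the output tree not owned by the expected UID:

```python
import os


def not_owned_by(path, uid):
    """List paths under `path` (inclusive) whose owner is not `uid`."""
    candidates = [path]
    for root, dirs, files in os.walk(path):
        candidates.extend(os.path.join(root, name) for name in dirs + files)
    # lstat inspects a symlink itself rather than the file it points to.
    return [c for c in candidates if os.lstat(c).st_uid != uid]
```

An empty list from not_owned_by("/data/output", os.getuid()) means every file and directory was written as the calling user.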

Aug 05 '22 19:08 MrDotOne

Glad to hear it is working for you. Reopening the issue for others who may run into problems despite this fix. I will note that this solution works as long as the docker daemon is configured to not offset UIDs and GIDs, which is sometimes done to improve the security of the host machine. More info about docker user namespace remapping is available here: https://docs.oracle.com/cd/E37670_01/E75728/html/ol-docker-userns-remap.html.
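Whether the daemon offsets UIDs and GIDs can be checked on the host. A hedged sketch, assuming the daemon is configured through a daemon.json file (commonly /etc/docker/daemon.json on Linux) and that remapping, if enabled, is set with the "userns-remap" key. A daemon started with a command-line flag instead would not be detected this way. The function name is illustrative.

```python
import json


def userns_remap_setting(daemon_json_text):
    """Return the "userns-remap" value from daemon.json text, or None if unset."""
    # An empty or whitespace-only file is treated as an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    value = config.get("userns-remap")
    return value or None
```

To use it, read the file if it exists and pass its contents in. A non-None result means files written by the container may show up on the host under remapped IDs, so the --run_as_user approach may not give the expected ownership.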

Jens

Aug 05 '22 21:08 jluebeck

Thank you for the fixes and the link; I will check it out. There are a couple of other repos like this that could use this technique. Unfortunately, we may be in research here, but this is not academia, and we lock stuff down pretty tightly, sometimes to the point where things are unusable. This was of great benefit. Thank you.

Aug 05 '22 21:08 MrDotOne
