AmpliconSuite-pipeline
Docker permissions
I just pulled PAA down the other day and have been running it. My run command is:
/data/PrepareAA/docker/run_paa_docker.py -o /data/output -s Colo -t 16 --bam /data/Data/Colo/cofinal.bam --run_AA --run_AC
However, after 22+ hours it gets to this point and fails:
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
I am unsure where feature_to_graph.txt is supposed to be found, and Colo_amplicon_classification_profiles.tsv does not seem to be getting generated.
Any assistance would be appreciated.
Hi,
I have updated PrepareAA in 580f923 to handle issues related to permissions of the output directory, and also to consolidate a file from AmpliconClassifier that may have been written to a location not necessarily in that same spot. Can you please pull the latest version of the docker image and try again? You may already have done so, but please also double check that the location you want to save data to exists and has write permissions for root.
Thanks, Jens
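A minimal pre-flight check along the lines Jens suggests, assuming it is run on the host by the same user that launches the container (root, with the default run script). check_output_dir is a hypothetical helper, not part of PrepareAA, and /data/output is simply the host output directory from the run command above.

```python
import os


def check_output_dir(path):
    """Return a list of problems that would stop the run from writing to *path*.

    Hypothetical helper: it checks the host directory for the user running
    this script only; it does not know which UID the container will use.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    # creating files in a directory needs both write and search (x) permission
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")
    return problems


if __name__ == "__main__":
    # /data/output is mounted as /home/output in the docker run command
    for problem in check_output_dir("/data/output"):
        print("WARNING:", problem)
```

Note that os.access almost always reports success for root, so this mainly catches a missing or mistyped directory in that case.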
I made a change to the run file so that, when you execute it, the command looks like this:
docker run -u `id -u $USER`:`id -g $USER` --rm -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v :/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh
That way, everything should be read and written as the end user running the app.
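Since run_paa_docker.py is a Python script, the same -u `id -u $USER`:`id -g $USER` argument can be computed in Python rather than with shell backticks. This is a sketch only: docker_user_args is a hypothetical helper, not code from run_paa_docker.py, and the command list is trimmed to the relevant parts. As the later comments show, -u on its own was not sufficient in this case.

```python
import os
import shlex


def docker_user_args(uid=None, gid=None):
    """Return the ['-u', 'UID:GID'] arguments for `docker run`.

    Defaults to the numeric user and group IDs of the calling process,
    which is what `id -u` and `id -g` print for the current user.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return ["-u", f"{uid}:{gid}"]


# trimmed command: volume mounts and -e options omitted for brevity
cmd = ["docker", "run", "--rm", *docker_user_args(),
       "jluebeck/prepareaa", "bash", "/home/run_paa_script.sh"]
print(shlex.join(cmd))
```

Building the command as a list (and only joining it for display) avoids quoting problems if any mounted path contains spaces.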
I will pull down the update(s) and give it a shot. Thank you.
Pulled and running. It will take over 20 hours, but I will let you know. Thank you for your time.
I do find that adding the following to the run script avoids a lot of issues, since the data is then written as the caregiver and not root:
-u `id -u $USER`:`id -g $USER`
Thank you, this is a good suggestion, I will incorporate it.
(In reply to MrDotOne's comment of Thu, Jul 28, 2022, 4:17 PM.)
Someone on another repo suggested it when I was having issues with the results being written as root, and the person running the pipeline didn't have escalation privileges. I thought I would pass on that nugget.
I am still having issues:
[root:INFO] #TIME 79252.045 Plotting SV View for amplicon7
[root:INFO] #TIME 79318.830 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo
Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam
Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns
Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr
Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo
Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed
2022-07-29 21:26:25.262009
I will run it as root, which should fix it, but...
OK, I reran it as root using the run script as provided in the repo, and it seems to have completed successfully. This is good progress. However, both times I ran it with -u id $UID:id $GID, it failed. I need to figure out how to get the results written as the caregiver so I don't have to intervene.
Unfortunately, that is not working. The run file works fine for root, but not for a non-escalated account. I keep getting this error when I run as a user with the id options in the run command:
[root:INFO] #TIME 79384.895 Plotting SV View for amplicon7
[root:INFO] #TIME 79452.068 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes
Traceback (most recent call last):
File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in
Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo
Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam
Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns
Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr
Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo
Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed
2022-08-01 00:05:22.206630
Hi,
Thank you for sharing. I have also now done some testing on my end and it appears that assigning a custom user for the image is non-trivial and that the above proposed solution (adding -u id $UID:id $GID) does not quite work as expected. I recommend that users run with the current default settings, generating the files as root and then users can chmod or copy the relevant files later if they need non-root ownership. I do not plan to address this issue of non-root ownership in the PrepareAA generated files at this particular time, but perhaps in the future if there is a compelling reason.
Jens
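The "copy the relevant files later" option can be done without root as long as the outputs are world-readable (for example the rw-r--r-- files and rwxr-xr-x directories shown in the listing later in this thread). A minimal sketch, assuming that; copy_results and the destination path are hypothetical:

```python
import shutil
from pathlib import Path


def copy_results(src, dst):
    """Copy a finished output tree into a new directory owned by the caller.

    The copies are created by, and therefore owned by, whoever runs this
    function; the root-owned originals are left untouched. *dst* must not
    exist yet (shutil.copytree creates it).
    """
    # copy2 keeps timestamps and permission bits, but never the file owner
    return Path(shutil.copytree(src, dst, copy_function=shutil.copy2))


# hypothetical usage: src is the host output directory from the run command
# copy_results("/data/output/Colo_classification", Path.home() / "Colo_classification")
```

Copying doubles the disk usage for large AA result directories, which is the main trade-off against changing ownership in place.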
Non-root users cannot chown/chgrp files; that is a serious cybersecurity concern.
Is there a way to implement a Python script within the run file to do something similar to this?
(base) [root@lri-uapps-2 data]# cat chown.py
import os

path = "/data/output"
for root, dirs, files in os.walk(path):
    for momo in dirs:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
    for momo in files:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
Michael
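A sketch of the same chown.py idea packaged as a reusable function, with two small changes that are my own choices rather than anything from the thread: the top directory itself is also re-owned (os.walk never yields it as an entry), and os.lchown is used so a symlink is changed in place instead of its target, which could lie outside the tree. chown_tree is a hypothetical name; handing files to another user's UID still requires root.

```python
import os


def chown_tree(top, uid, gid):
    """Recursively give *top* and everything beneath it to uid:gid.

    Symlinks are re-owned themselves (lchown), never followed.
    """
    os.lchown(top, uid, gid)
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)


# hypothetical usage, with the IDs from chown.py:
# chown_tree("/data/output", 1035688, 1001025)
```

Run inside the container, the target IDs would have to be passed in from the host somehow, for example with docker run -e; how a given run script does that is up to its author.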
Hi Michael,
Without re-assigning user IDs inside the container itself, or alternatively sharing the /etc/passwd file from the host machine with the docker image, there is no way to give the docker image the exact same user and group IDs as the host machine. The previously proposed solution runs the image as a specific user inside the image, but that user is not mapped to the same user on the host machine. Perhaps one option is instead to have the docker script, when it is finished, recursively chmod all the files the image wrote into the mounted directory to add global read/write permissions. Would this solution be satisfactory for you? I can test this out in the next couple of days.
Jens
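The recursive chmod idea can be sketched in Python as below. This is an illustration under stated assumptions, not PrepareAA's implementation: open_permissions is a hypothetical name, existing permission bits are kept and only new ones are OR-ed in, and directories also get the search (x) bit because a directory that is readable but not searchable cannot be entered. World-writable output may itself be unacceptable on tightly locked-down hosts.

```python
import os
import stat

# read + write for user, group and others
RW_ALL = (stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
          | stat.S_IROTH | stat.S_IWOTH)
# search permission, needed on directories so their contents can be reached
X_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def open_permissions(top):
    """Add global read/write (and search on directories) to everything under *top*.

    Symlinks are skipped so nothing outside *top* is modified.
    """
    def widen(path, is_dir):
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, mode | RW_ALL | (X_ALL if is_dir else 0))

    widen(top, True)
    for root, dirs, files in os.walk(top):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                widen(path, True)
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                widen(path, False)
```

Because the walk is top-down, each directory is opened up before its contents are visited.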
That is a solution I am trying to implement. I tried to use /home/output; however, the result was "No such file or directory".
I just pulled [fc3b5e8] and will give it a try with the --run_as_user option, which already looks promising:
docker run --rm -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) -u $(id -u):$(id -g) -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v /home/bendahm:/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh
I will let you know what I find. Thank you for looking into this.
This is perfect:
(base) [root@lri-uapps-2 data]# cd output
(base) [root@lri-uapps-2 output]# ls -la
total 20
drwxrwxrwx 3 bendahm ccdomainusers 113 Aug 5 15:03 .
drwxrwxrwx 19 root root 4096 Aug 5 15:02 ..
drwxr-xr-x 2 bendahm ccdomainusers 126 Aug 5 15:10 Colo_cnvkit_output
-rw-r--r-- 1 bendahm ccdomainusers 0 Aug 5 15:03 Colo_timing_log.txt
-rw-r--r-- 1 bendahm ccdomainusers 1931 Aug 5 15:03 docker_home_manifest.log
-rw-r--r-- 1 bendahm ccdomainusers 11525 Aug 5 15:10 PAA_stdout.log
Glad to hear it is working for you. I am reopening the issue for others who may run into problems despite this fix. Note that this solution works only as long as the docker daemon is configured not to offset UIDs and GIDs, which is sometimes done to improve the security of the host machine. More info about docker user namespace remapping is available here: https://docs.oracle.com/cd/E37670_01/E75728/html/ol-docker-userns-remap.html.
Jens
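Docker enables user namespace remapping through the "userns-remap" key in the daemon configuration file (/etc/docker/daemon.json on Linux by default). A minimal sketch for checking that file, assuming the default location; userns_remap_setting is hypothetical, and it does not detect remapping enabled on the dockerd command line instead of in the file.

```python
import json
from pathlib import Path


def userns_remap_setting(config_path="/etc/docker/daemon.json"):
    """Return the daemon's "userns-remap" value, or None if it is not set.

    Only the JSON config file is inspected.
    """
    path = Path(config_path)
    if not path.is_file():
        return None
    config = json.loads(path.read_text() or "{}")
    # an empty string is treated the same as the key being absent
    return config.get("userns-remap") or None
```

If this returns a value, files written by the container will be owned by offset IDs on the host, so the UID/GID fix above would not produce files owned by the invoking user.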
Thank you for the fixes and the link; I will check it out. There are a couple of other repos like this that could use this technique. Unfortunately, we may be in research here, but this is not academia, and we lock things down pretty tightly, sometimes to the point where they are unusable. This was of great benefit. Thank you.