
PureFTPd fails with some workflows

Open lvarin opened this issue 4 years ago • 5 comments

The RD workflow fails when it is run against a PureFTPd server. It is not clear why this happens, or which component (WES, TESK, or FTP) is at fault.

I will run the workflow again and collect more log data. For the time being, searching through Slack I found a few of the errors:

jlcwl pure-ftpd: ([email protected]) [NOTICE] /mnt/data/input//cae43838-6e57-4c03-952b-fd49be6383fa/output_15f17c58-0771-4b90-a9f0-45cf8238d213/dbsnp_138.b37.vcf.gz uploaded  (1472953420 bytes, 38655.42KB/sec)
jlcwl pure-ftpd: ([email protected]) [ERROR] Can't open cwl.output.json: No such file or directory
jlcwl pure-ftpd: ([email protected]) [INFO] Can't change directory to cwl.output.json: No such file or directory
jlcwl pure-ftpd: ([email protected]) [INFO] Can't change directory to //cae43838-6e57-4c03-952b-fd49be6383fa/output_15f17c58-0771-4b90-a9f0-45cf8238d213/cwl.output.json: No such file or directory
schema_salad.validate.ValidationException: location \"ftp://195.148.31.210//cae43838-6e57-4c03-952b-fd49be6383fa/output_1a0cdc01-c336-4d73-94ee-e5ea8f1701c1/\" ends with \"/\" but is not a Directory
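To make the last error easier to reason about, here is a minimal sketch that splits the reported location into its parts. It only uses the Python standard library; `describe_location` is a hypothetical helper name, and the sketch does not claim to show what schema_salad or TESK do internally. It simply highlights the two features visible in the logs: the path starts with a double slash and ends with a trailing slash.

```python
from urllib.parse import urlsplit


def describe_location(url: str) -> dict:
    """Split a location URL and flag the path features seen in the logs.

    Hypothetical helper for inspection only; it does not reproduce the
    validation logic of any of the components involved.
    """
    parts = urlsplit(url)
    path = parts.path
    return {
        "host": parts.hostname,
        "path": path,
        # True when the path begins with "//", as in the log lines above.
        "double_slash": path.startswith("//"),
        # True when the location looks like a directory reference.
        "trailing_slash": path.endswith("/"),
        # Last non-empty path component.
        "basename": path.rstrip("/").rsplit("/", 1)[-1],
    }


if __name__ == "__main__":
    location = (
        "ftp://195.148.31.210//cae43838-6e57-4c03-952b-fd49be6383fa/"
        "output_1a0cdc01-c336-4d73-94ee-e5ea8f1701c1/"
    )
    for key, value in describe_location(location).items():
        print(f"{key}: {value}")
```

Whether the double slash or the trailing slash is what actually triggers the failure is still an open question; the sketch only makes both easy to spot.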

lvarin avatar Jul 01 '20 11:07 lvarin

Here is what we got for a failed RD workflow:

5BUCPE.log 5BUCPE.json.txt

This is just one of the tasks; I can send all the tasks if needed:

task-84008576-lkn67.log task-84008576-outputs-filer-n5zm5.log task-84008576-inputs-filer-lgnsd.log task-84008576-ex-00-2jrn5.log

lvarin avatar Jul 02 '20 07:07 lvarin

I forgot to mention that the file is indeed uploaded to the FTP server:

[cloud-user@ecp-cla WES-cli]$ lftp vm1976.kaj.pouta.csc.fi
lftp [email protected]:~> ls c4                 
c44bad4a-adce-4aed-a2f9-904b334ab283/  c47f4ac9-0a55-4035-bef0-eb2f2745493e/
lftp [email protected]:~> ls c47f4ac9-0a55-4035-bef0-eb2f2745493e/
drwxr-xr-x    6 1001       jarno             210 Jul  2 07:35 .
drwxr-xr-x    6 1001       jarno             210 Jul  2 07:35 ..
drwxr-xr-x    2 1001       jarno              58 Jul  2 07:34 output_3dc07dbe-ff84-482a-8029-3684bdc77fd4
drwxr-xr-x    2 1001       jarno              26 Jul  2 07:35 output_4c3d5a87-c991-47bd-bf4b-651ab6ab3d71
drwxr-xr-x    2 1001       jarno              34 Jul  2 07:35 output_65a92b8b-6d23-4e97-bc98-90141e5019dc
drwxr-xr-x    2 1001       jarno              84 Jul  2 07:35 output_fa71aec4-b7da-4cc7-a43f-6e5d6bc013b4
lftp [email protected]:/> ls c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_4c3d5a87-c991-47bd-bf4b-651ab6ab3d71/
drwxr-xr-x    2 1001       jarno              26 Jul  2 07:35 .
drwxr-xr-x    2 1001       jarno              26 Jul  2 07:35 ..
-rw-r--r--    1 1001       jarno       892326179 Jul  2 07:35 hs37d5.fa.gz

lvarin avatar Jul 02 '20 07:07 lvarin

[cloud-user@jlcwl ~]$ sudo grep c47f4ac9-0a55-4035-bef0-eb2f2745493e /var/log/pureftpd.log
195.148.30.238 - input [02/Jul/2020:07:34:42 -0000] "PUT /mnt/data/input/c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_3dc07dbe-ff84-482a-8029-3684bdc77fd4/Mills_and_1000G_gold_standard.indels.b37.vcf" 200 86369975
195.148.30.238 - input [02/Jul/2020:07:35:02 -0000] "PUT /mnt/data/input/c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_fa71aec4-b7da-4cc7-a43f-6e5d6bc013b4/U5c_CCGTCC_L001_R1_001.fastq.gz" 200 487611787
195.148.30.238 - input [02/Jul/2020:07:35:06 -0000] "PUT /mnt/data/input/c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_fa71aec4-b7da-4cc7-a43f-6e5d6bc013b4/U5c_CCGTCC_L001_R2_001.fastq.gz" 200 546806668
195.148.30.238 - input [02/Jul/2020:07:35:13 -0000] "PUT /mnt/data/input/c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_4c3d5a87-c991-47bd-bf4b-651ab6ab3d71/hs37d5.fa.gz" 200 892326179
195.148.30.238 - input [02/Jul/2020:07:35:38 -0000] "PUT /mnt/data/input/c47f4ac9-0a55-4035-bef0-eb2f2745493e/output_65a92b8b-6d23-4e97-bc98-90141e5019dc/dbsnp_138.b37.vcf.gz" 200 1472953420
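For going through a larger log, a small parser can help. The sketch below assumes every line follows the layout of the lines above (client, user, timestamp, quoted request, a three-digit code, byte count); that layout is inferred from this excerpt only. `parse_line` and `uploads_by_directory` are hypothetical names.

```python
import re
from collections import defaultdict

# Layout inferred from the log excerpt above, not from PureFTPd documentation.
LINE_RE = re.compile(
    r'^(?P<client>\S+) - (?P<user>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.+)" (?P<status>\d{3}) (?P<size>\d+)$'
)


def parse_line(line: str):
    """Return a dict for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record


def uploads_by_directory(lines):
    """Sum the bytes of PUT entries with code 200, grouped by directory."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_line(line)
        if record and record["method"] == "PUT" and record["status"] == 200:
            directory = record["path"].rsplit("/", 1)[0]
            totals[directory] += record["size"]
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python parse_log.py < /var/log/pureftpd.log
    for directory, size in sorted(uploads_by_directory(sys.stdin).items()):
        print(f"{size:>12}  {directory}")
```

Comparing the per-directory totals with what the workflow expects as inputs could show whether any upload is missing or truncated.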

lvarin avatar Jul 02 '20 09:07 lvarin

I forgot to mention that the way to reproduce this is:

WES URL: csc-wes.c03.k8s-popup.csc.fi

workflow_type = cwl
workflow_type_version = v1.0
workflow_url = https://github.com/jarnolaitinen/RD_pipeline/blob/master/workflow.cwl

Input {"curl_fastq_urls":{"class":"File","path":"http://195.148.30.67:8000/fastq_files_urls.txt"},"curl_reference_genome_url":{"class":"File","path":"http://195.148.30.67:8000/reference_seq_url.txt"},"curl_known_indels_url":{"class":"File","path":"http://195.148.30.67:8000/known_indels_url.txt"},"curl_known_sites_url":{"class":"File","path":"http://195.148.30.67:8000/known_sites_url.txt"},"readgroup_str":"@RG\tID:Seq01p\tSM:Seq01\tPL:ILLUMINA\tPI:330","sample_name":"abc1","threads":"10","gqb":[20,25,30,35,40,45,50,70,90,99]}
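As a rough sketch of how these values map onto a WES run request, the snippet below builds the form fields defined by the GA4GH WES API (`workflow_params`, `workflow_type`, `workflow_type_version`, `workflow_url`). It does not send anything. `build_run_request` is a hypothetical helper, and the parameters shown are only a subset of the input above, shortened for readability.

```python
import json

WORKFLOW_URL = (
    "https://github.com/jarnolaitinen/RD_pipeline/blob/master/workflow.cwl"
)


def build_run_request(params: dict) -> dict:
    """Build the form fields for a WES run submission.

    Hypothetical helper: it only assembles the fields; sending them
    (and any authentication) is left to the client in use.
    """
    return {
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params),
        "workflow_type": "cwl",
        "workflow_type_version": "v1.0",
        "workflow_url": WORKFLOW_URL,
    }


if __name__ == "__main__":
    # Subset of the input above, for illustration only.
    params = {
        "sample_name": "abc1",
        "threads": "10",
        "gqb": [20, 25, 30, 35, 40, 45, 50, 70, 90, 99],
    }
    for field, value in build_run_request(params).items():
        print(f"{field} = {value}")
```

To reproduce the failure itself, the full input above should be passed instead of the shortened subset.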

lvarin avatar Jul 03 '20 09:07 lvarin

I copied the input files to: ftp://ftp-private.ebi.ac.uk:/upload/RD-files/

lvarin avatar Jul 08 '20 11:07 lvarin