argo-workflows
Argo Workflow not working with LibreOffice
Summary
What happened/what you expected to happen? I need to use LibreOffice headless to convert a docx file to PDF. This works perfectly in vanilla k8s and Databricks, but when I do the same in Kubeflow, which uses Argo Workflows as its backend, it does not produce any output.
What version are you running? Kubeflow 1.4 (argoproj.io/v1alpha1)
Diagnostics
Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: libreoffice-pv-claim
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: libreoffice
spec:
  containers:
    - name: libreoffice-container
      image: domnulnopcea/libreoffice-headless:latest
      command: ["libreoffice", "--headless", "--convert-to", "pdf", "/tests/288.pptx", "--outdir", "/tests"]
      volumeMounts:
        - mountPath: "/tests"
          name: libreoffice-storage
  volumes:
    - name: libreoffice-storage
      persistentVolumeClaim:
        claimName: libreoffice-pv-claim
  tolerations:
    - key: project
      operator: Equal
      value: cd-msr
      effect: NoSchedule
---
apiVersion: v1
kind: Pod
metadata:
  name: libreoffice-bash
spec:
  containers:
    - name: libreoffice-container
      image: ubuntu:18.04
      command: ["/bin/sleep", "3650d"]
      volumeMounts:
        - mountPath: "/tests"
          name: libreoffice-storage
  volumes:
    - name: libreoffice-storage
      persistentVolumeClaim:
        claimName: libreoffice-pv-claim
  tolerations:
    - key: project
      operator: Equal
      value: cd-msr
      effect: NoSchedule
This is the YAML I am using. I then manually copy in the input files:
kubectl cp ./288.pptx libreoffice-bash:/tests/
kubectl cp ./dummy.pptx libreoffice-bash:/tests/
This works, but when I try to do the same in Kubeflow, it doesn't: the script executes without producing any output file.
import kfp
import kfp.components as components
import kfp.dsl as dsl
from kfp.components import InputPath, OutputPath

@components.create_component_from_func
def download_file(s3_folder_path, object_name):
    input_file_path = s3_folder_path + "/" + object_name
    import subprocess
    subprocess.run('pip install boto3'.split())
    # Download file
    import boto3
    s3 = boto3.client('s3')
    s3.download_file('qa-cd-msr-20220524050318415700000001', input_file_path, '/tmp/input.pptx')
    print(input_file_path + " file is downloaded...Executing libreoffice conversion")
    subprocess.run("ls -ltr /tmp".split())

def convert_to_pdf():
    import subprocess
    def exec_cmd(cmd) -> str:
        print("Executing " + cmd)
        result = subprocess.run(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout = result.stdout.decode('utf-8') + '\n' + result.stderr.decode('utf-8')
        print("stdout: " + stdout)
        return stdout
    exec_cmd("libreoffice --headless --convert-to pdf /files/input.pptx --outdir /files")
    exec_cmd("ls -ltr /files")

convert_to_pdf_op = components.func_to_container_op(convert_to_pdf, base_image="domnulnopcea/libreoffice-headless:latest")

@dsl.pipeline(
    name="Libreoffice",
    description="Libreoffice",
)
def sample_pipeline(s3_folder_path: str = "/mpsr/decks", object_name: str = "Adcetris_master_40.pptx"):
    vop = dsl.VolumeOp(
        name="create-pvc",
        resource_name="my-pvc",
        modes=dsl.VOLUME_MODE_RWO,
        size="1Gi",
    )
    download = download_file(s3_folder_path, object_name).add_pvolumes({"/tmp": vop.volume})
    convert = convert_to_pdf_op().add_pvolumes({"/files": download.pvolume})
    convert.execution_options.caching_strategy.max_cache_staleness = "P0D"
    convert.after(download)

client = kfp.Client()
experiment = client.create_experiment(
    name="Libreoffice",
    description="Libreoffice",
    namespace="cd-msr",
)
client.create_run_from_pipeline_func(
    sample_pipeline,
    arguments={"s3_folder_path": "/mpsr/decks", "object_name": "dummy1.pptx"},
    run_name="libreoffice",
    experiment_name="Libreoffice",
)
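One thing worth noting about the pipeline above: exec_cmd discards the return code, so a failed conversion looks identical to a successful one in the logs. A stricter helper along these lines would surface the failure; this is only a sketch, not part of the original pipeline, and the names exec_cmd_checked and assert_converted are hypothetical:

```python
import os
import subprocess

def exec_cmd_checked(cmd: str) -> str:
    """Run a command, echo its output, and fail loudly on a non-zero exit.

    Unlike the exec_cmd helper above, this raises instead of silently
    returning the combined stdout/stderr of a failed command.
    """
    print("Executing " + cmd)
    result = subprocess.run(cmd.split(), capture_output=True, text=True)
    output = result.stdout + "\n" + result.stderr
    print("stdout: " + output)
    if result.returncode != 0:
        raise RuntimeError(f"command failed ({result.returncode}): {cmd}")
    return output

def assert_converted(pdf_path: str) -> None:
    # LibreOffice can exit 0 without writing a file, so check explicitly.
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(f"expected output missing: {pdf_path}")
```

With this, a missing `/files/input.pdf` would fail the convert-to-pdf step instead of letting the run report Succeeded.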
Output: (screenshot). Ignore the error shown here; I was also getting it in vanilla k8s, but the output file is produced there.
When I switched to the Docker runtime, it works there.
The Docker executor is no longer supported. If this does not work with the PNS executor, then this is a regression. Have you tried PNS?
Can you please upload the workflow that caused this problem?
I just tried it with the PNS executor, and it succeeds there. I will get back to you with the Argo workflow.
Output of the workflow with the PNS executor:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
    pipelines.kubeflow.org/pipeline_compilation_time: 2022-07-19T10:48:45.175943
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Libreoffice", "inputs":
      [{"default": "/mpsr/decks", "name": "s3_folder_path", "optional": true, "type":
      "String"}, {"default": "Adcetris_master_288.pptx", "name": "object_name", "optional":
      true, "type": "String"}], "name": "Libreoffice"}'
    pipelines.kubeflow.org/run_name: libreoffice
  creationTimestamp: "2022-07-19T10:48:45Z"
  generateName: libreoffice-
  generation: 8
  labels:
    pipeline/persistedFinalState: "true"
    pipeline/runid: 3c88a72f-5b46-46c1-9dd9-9765971611a2
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/phase: Succeeded
  name: libreoffice-kck7k
  namespace: cd-msr
  resourceVersion: "198021692"
  uid: 0d740423-2c54-4b2a-a55c-b0b8e546fd3f
spec:
  arguments:
    parameters:
    - name: s3_folder_path
      value: /mpsr/decks
    - name: object_name
      value: Adcetris_master_288.pptx
  entrypoint: libreoffice
  podMetadata:
    labels:
      pipeline/runid: 3c88a72f-5b46-46c1-9dd9-9765971611a2
  serviceAccountName: default-editor
  templates:
  - container:
      command:
      - sh
      - -ec
      - |
        program_path=$(mktemp)
        printf "%s" "$0" > "$program_path"
        python3 -u "$program_path" "$@"
      - |
        def convert_to_pdf():
            import subprocess
            def exec_cmd(cmd):
                print("Executing "+cmd)
                result=subprocess.run(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                stdout=result.stdout.decode('utf-8') + '\n'+ result.stderr.decode('utf-8')
                print("stdout: "+stdout)
                return stdout
            exec_cmd("libreoffice --headless --convert-to pdf /files/input.pptx --outdir /files")
            exec_cmd("ls -ltr /files")

        import argparse
        _parser = argparse.ArgumentParser(prog='Convert to pdf', description='')
        _parsed_args = vars(_parser.parse_args())

        _outputs = convert_to_pdf(**_parsed_args)
      image: domnulnopcea/libreoffice-headless:latest
      name: ""
      resources: {}
      volumeMounts:
      - mountPath: /files
        name: create-pvc
    inputs:
      parameters:
      - name: create-pvc-name
    metadata:
      annotations:
        pipelines.kubeflow.org/component_ref: '{}'
        pipelines.kubeflow.org/component_spec: '{"implementation": {"container": {"args":
          [], "command": ["sh", "-ec", "program_path=$(mktemp)\nprintf \"%s\" \"$0\"
          > \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n", "def convert_to_pdf():\n    import
          subprocess\n    def exec_cmd(cmd):\n        print(\"Executing \"+cmd)\n        result=subprocess.run(cmd.split(),
          stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n        stdout=result.stdout.decode(''utf-8'')
          + ''\\n''+ result.stderr.decode(''utf-8'')\n        print(\"stdout: \"+stdout)\n        return
          stdout\n    exec_cmd(\"libreoffice --headless --convert-to pdf /files/input.pptx
          --outdir /files\")\n    exec_cmd(\"ls -ltr /files\")\n\nimport argparse\n_parser
          = argparse.ArgumentParser(prog=''Convert to pdf'', description='''')\n_parsed_args
          = vars(_parser.parse_args())\n\n_outputs = convert_to_pdf(**_parsed_args)\n"],
          "image": "domnulnopcea/libreoffice-headless:latest"}}, "name": "Convert
          to pdf"}'
        pipelines.kubeflow.org/max_cache_staleness: P0D
        sidecar.istio.io/inject: "false"
      labels:
        pipelines.kubeflow.org/cache_enabled: "true"
        pipelines.kubeflow.org/enable_caching: "true"
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
        pipelines.kubeflow.org/pipeline-sdk-type: kfp
    name: convert-to-pdf
    outputs: {}
    volumes:
    - name: create-pvc
      persistentVolumeClaim:
        claimName: '{{inputs.parameters.create-pvc-name}}'
  - inputs: {}
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        pipelines.kubeflow.org/cache_enabled: "true"
        pipelines.kubeflow.org/enable_caching: "true"
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
        pipelines.kubeflow.org/pipeline-sdk-type: kfp
    name: create-pvc
    outputs:
      parameters:
      - name: create-pvc-manifest
        valueFrom:
          jsonPath: '{}'
      - name: create-pvc-name
        valueFrom:
          jsonPath: '{.metadata.name}'
      - name: create-pvc-size
        valueFrom:
          jsonPath: '{.status.capacity.storage}'
    resource:
      action: create
      manifest: |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: '{{workflow.name}}-my-pvc'
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
  - container:
      args:
      - --s3-folder-path
      - '{{inputs.parameters.s3_folder_path}}'
      - --object-name
      - '{{inputs.parameters.object_name}}'
      command:
      - sh
      - -ec
      - |
        program_path=$(mktemp)
        printf "%s" "$0" > "$program_path"
        python3 -u "$program_path" "$@"
      - |
        def download_file(s3_folder_path,object_name):
            input_file_path=s3_folder_path+"/"+object_name
            import subprocess
            subprocess.run('pip install boto3'.split())
            # Download file
            import boto3
            s3=boto3.client('s3')
            s3.download_file('qa-cd-msr-20220524050318415700000001', input_file_path, '/tmp/input.pptx')
            print(input_file_path + " file is downloaded...Executing libreoffice conversion")
            subprocess.run("ls -ltr /tmp".split())

        import argparse
        _parser = argparse.ArgumentParser(prog='Download file', description='')
        _parser.add_argument("--s3-folder-path", dest="s3_folder_path", type=str, required=True, default=argparse.SUPPRESS)
        _parser.add_argument("--object-name", dest="object_name", type=str, required=True, default=argparse.SUPPRESS)
        _parsed_args = vars(_parser.parse_args())

        _outputs = download_file(**_parsed_args)
      image: python:3.7
      name: ""
      resources: {}
      volumeMounts:
      - mountPath: /tmp
        name: create-pvc
    inputs:
      parameters:
      - name: create-pvc-name
      - name: object_name
      - name: s3_folder_path
    metadata:
      annotations:
        pipelines.kubeflow.org/arguments.parameters: '{"object_name": "{{inputs.parameters.object_name}}",
          "s3_folder_path": "{{inputs.parameters.s3_folder_path}}"}'
        pipelines.kubeflow.org/component_ref: '{}'
        pipelines.kubeflow.org/component_spec: '{"implementation": {"container": {"args":
          ["--s3-folder-path", {"inputValue": "s3_folder_path"}, "--object-name",
          {"inputValue": "object_name"}], "command": ["sh", "-ec", "program_path=$(mktemp)\nprintf
          \"%s\" \"$0\" > \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n",
          "def download_file(s3_folder_path,object_name):\n    input_file_path=s3_folder_path+\"/\"+object_name\n    import
          subprocess\n    subprocess.run(''pip install boto3''.split())\n    # Download
          file\n    import boto3\n    s3=boto3.client(''s3'')\n    s3.download_file(''qa-cd-msr-20220524050318415700000001'',
          input_file_path, ''/tmp/input.pptx'')\n    print(input_file_path + \" file
          is downloaded...Executing libreoffice conversion\")\n    subprocess.run(\"ls
          -ltr /tmp\".split())\n\nimport argparse\n_parser = argparse.ArgumentParser(prog=''Download
          file'', description='''')\n_parser.add_argument(\"--s3-folder-path\", dest=\"s3_folder_path\",
          type=str, required=True, default=argparse.SUPPRESS)\n_parser.add_argument(\"--object-name\",
          dest=\"object_name\", type=str, required=True, default=argparse.SUPPRESS)\n_parsed_args
          = vars(_parser.parse_args())\n\n_outputs = download_file(**_parsed_args)\n"],
          "image": "python:3.7"}}, "inputs": [{"name": "s3_folder_path"}, {"name":
          "object_name"}], "name": "Download file"}'
        sidecar.istio.io/inject: "false"
      labels:
        pipelines.kubeflow.org/cache_enabled: "true"
        pipelines.kubeflow.org/enable_caching: "true"
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.12
        pipelines.kubeflow.org/pipeline-sdk-type: kfp
    name: download-file
    outputs: {}
    volumes:
    - name: create-pvc
      persistentVolumeClaim:
        claimName: '{{inputs.parameters.create-pvc-name}}'
  - dag:
      tasks:
      - arguments:
          parameters:
          - name: create-pvc-name
            value: '{{tasks.create-pvc.outputs.parameters.create-pvc-name}}'
        dependencies:
        - create-pvc
        - download-file
        name: convert-to-pdf
        template: convert-to-pdf
      - arguments: {}
        name: create-pvc
        template: create-pvc
      - arguments:
          parameters:
          - name: create-pvc-name
            value: '{{tasks.create-pvc.outputs.parameters.create-pvc-name}}'
          - name: object_name
            value: '{{inputs.parameters.object_name}}'
          - name: s3_folder_path
            value: '{{inputs.parameters.s3_folder_path}}'
        dependencies:
        - create-pvc
        name: download-file
        template: download-file
    inputs:
      parameters:
      - name: object_name
      - name: s3_folder_path
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        pipelines.kubeflow.org/cache_enabled: "true"
    name: libreoffice
    outputs: {}
status:
  artifactRepositoryRef:
    artifactRepository:
      archiveLogs: true
      s3:
        accessKeySecret:
          key: accesskey
          name: mlpipeline-minio-artifact
        bucket: mlpipeline
        endpoint: minio-service.kubeflow:9000
        insecure: true
        keyFormat: artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
        secretKeySecret:
          key: secretkey
          name: mlpipeline-minio-artifact
    default: true
  conditions:
  - status: "False"
    type: PodRunning
  - status: "True"
    type: Completed
  finishedAt: "2022-07-19T10:58:45Z"
  nodes:
    libreoffice-kck7k:
      children:
      - libreoffice-kck7k-2249554463
      displayName: libreoffice-kck7k
      finishedAt: "2022-07-19T10:58:45Z"
      id: libreoffice-kck7k
      inputs:
        parameters:
        - name: object_name
          value: Adcetris_master_288.pptx
        - name: s3_folder_path
          value: /mpsr/decks
      name: libreoffice-kck7k
      outboundNodes:
      - libreoffice-kck7k-55020225
      phase: Succeeded
      progress: 3/3
      resourcesDuration:
        cpu: 1077
        memory: 708
      startedAt: "2022-07-19T10:48:45Z"
      templateName: libreoffice
      templateScope: local/libreoffice-kck7k
      type: DAG
    libreoffice-kck7k-55020225:
      boundaryID: libreoffice-kck7k
      displayName: convert-to-pdf
      finishedAt: "2022-07-19T10:58:35Z"
      hostNodeName: ip-10-120-112-29.ec2.internal
      id: libreoffice-kck7k-55020225
      inputs:
        parameters:
        - name: create-pvc-name
          value: libreoffice-2zg86-my-pvc
      name: libreoffice-kck7k.convert-to-pdf
      outputs:
        artifacts:
        - name: main-logs
          s3:
            key: artifacts/libreoffice-kck7k/2022/07/19/libreoffice-kck7k-55020225/main.log
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 1066
        memory: 702
      startedAt: "2022-07-19T10:49:24Z"
      templateName: convert-to-pdf
      templateScope: local/libreoffice-kck7k
      type: Pod
    libreoffice-kck7k-2249554463:
      boundaryID: libreoffice-kck7k
      children:
      - libreoffice-kck7k-3459936430
      - libreoffice-kck7k-55020225
      displayName: create-pvc
      finishedAt: "2022-07-19T10:48:46Z"
      hostNodeName: ip-10-120-112-29.ec2.internal
      id: libreoffice-kck7k-2249554463
      name: libreoffice-kck7k.create-pvc
      outputs:
        exitCode: "0"
        parameters:
        - name: create-pvc-manifest
          value: '{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"creationTimestamp":"2022-07-19T09:51:46Z","finalizers":["kubernetes.io/pvc-protection"],"managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:accessModes":{},"f:resources":{"f:requests":{".":{},"f:storage":{}}},"f:volumeMode":{}}},"manager":"kubectl-create","operation":"Update","time":"2022-07-19T09:51:46Z"}],"name":"libreoffice-2zg86-my-pvc","namespace":"cd-msr","resourceVersion":"197916499","uid":"7b579172-d65c-4294-b6a8-77d2f2acaa0f"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"gp2","volumeMode":"Filesystem"},"status":{"phase":"Pending"}}'
          valueFrom:
            jsonPath: '{}'
        - name: create-pvc-name
          value: libreoffice-2zg86-my-pvc
          valueFrom:
            jsonPath: '{.metadata.name}'
        - name: create-pvc-size
          value: ""
          valueFrom:
            jsonPath: '{.status.capacity.storage}'
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 0
        memory: 0
      startedAt: "2022-07-19T10:48:45Z"
      templateName: create-pvc
      templateScope: local/libreoffice-kck7k
      type: Pod
    libreoffice-kck7k-3459936430:
      boundaryID: libreoffice-kck7k
      children:
      - libreoffice-kck7k-55020225
      displayName: download-file
      finishedAt: "2022-07-19T10:49:19Z"
      hostNodeName: ip-10-120-112-29.ec2.internal
      id: libreoffice-kck7k-3459936430
      inputs:
        parameters:
        - name: create-pvc-name
          value: libreoffice-2zg86-my-pvc
        - name: object_name
          value: Adcetris_master_288.pptx
        - name: s3_folder_path
          value: /mpsr/decks
      name: libreoffice-kck7k.download-file
      outputs:
        artifacts:
        - name: main-logs
          s3:
            key: artifacts/libreoffice-kck7k/2022/07/19/libreoffice-kck7k-3459936430/main.log
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 11
        memory: 6
      startedAt: "2022-07-19T10:48:55Z"
      templateName: download-file
      templateScope: local/libreoffice-kck7k
      type: Pod
  phase: Succeeded
  progress: 3/3
  resourcesDuration:
    cpu: 1077
    memory: 708
  startedAt: "2022-07-19T10:48:45Z"
The above workflow simply uses LibreOffice headless to convert a pptx into a PDF. I looked at the LibreOffice logs as well; they show frequent SIGINT and SIGTERM signals. My suspicion is that the emissary executor sends signals that some applications (in our case LibreOffice) cannot handle. Does the emissary executor also kill some kind of thread on its own?
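To make the signal-handling hypothesis above concrete, here is a minimal, self-contained sketch (an illustration only, not the executor's actual code): a child process that traps SIGTERM can exit cleanly, while one left with the default disposition is killed mid-work, which would explain an output file never being written.

```python
import signal
import subprocess
import sys
import time

# Child that installs a SIGTERM handler and exits cleanly when signalled.
CHILD_TRAPS = "import signal,sys,time; signal.signal(signal.SIGTERM, lambda *_: sys.exit(0)); time.sleep(30)"
# Child with the default SIGTERM disposition: it is simply killed.
CHILD_DEFAULT = "import time; time.sleep(30)"

def run_and_terminate(child_src):
    proc = subprocess.Popen([sys.executable, "-c", child_src])
    time.sleep(1.0)                   # let the child start and install handlers
    proc.send_signal(signal.SIGTERM)  # what an executor sends at shutdown
    proc.wait(timeout=5)
    return proc.returncode

print(run_and_terminate(CHILD_TRAPS))    # 0: SIGTERM handled, clean exit
print(run_and_terminate(CHILD_DEFAULT))  # -15: killed by SIGTERM, work lost
```

If LibreOffice behaves like the second child when it receives these signals, any partially written PDF would be lost.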
Thank you. Have you tried :latest?
:latest with LibreOffice? Yeah, https://hub.docker.com/r/linuxserver/libreoffice#! is the official image I tried to run on. Same issue there as well.
No, argoproj/argoexec:latest.
Let me just try that, and I will get back to you after the execution. It would help speed things up if you could tell me exactly where to add it.
I changed gcr.io/ml-pipeline/argoexec:v3.1.6-patch-license-compliance to argoproj/argoexec:latest and set the executor to emissary. Now the pipeline throws Error (exit code 2): unexpected end of JSON input. Screenshot attached.
Attached: libreoffice_fail_stack (1).txt, the LibreOffice log trace.
You need to run the latest controller too.
We upgraded the platform to Kubeflow 1.5, and it is working there. Thanks.