argo-events

SENSOR_OBJECT hits environment variable max length on large sensors

Status: Open. daniel-anova opened this issue on Aug 24, 2022 • 9 comments

Describe the bug: When creating a large Sensor manifest, the SENSOR_OBJECT environment variable becomes too big and the sensor container fails to start with "exec user process caused: argument list too long".

To Reproduce

Steps to reproduce the behavior:

  1. Create a large Sensor manifest (in our case, 152 KB).
  2. Attempt to deploy it.
  3. The sensor container crash-loops with the error "exec user process caused: argument list too long".

Expected behavior: The sensor container is created and runs as normal.


Environment:

  • Kubernetes: AKS 1.23.8
  • Argo: 3.3.9
  • Argo Events: 1.7.1

Additional context: The Kubernetes cluster uses the Istio service mesh 1.14.1.

The environment variable size limit seems to depend on how the Linux kernel is compiled, and is therefore hard to change.
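
For context (not stated in the original thread): on Linux, besides the overall ARG_MAX budget, execve(2) rejects any single argument or environment string longer than MAX_ARG_STRLEN, which is 32 pages, i.e. 131072 bytes on kernels with 4 KiB pages, and fails with E2BIG. The constant is fixed when the kernel is built, which is consistent with the observation above, and a 152 KB manifest is already over it before any encoding. The following minimal POSIX sh sketch demonstrates the cap, assuming a Linux host with 4 KiB pages; the function name try_env_size and the variable BIG_VAR are hypothetical.

```shell
#!/bin/sh
# try_env_size N: exec /bin/sh with one extra environment variable, BIG_VAR
# (a hypothetical name), whose value is N bytes long.
# Prints "ok" if the exec succeeds and "too long" if it fails; on Linux the
# failure is E2BIG once "BIG_VAR=<value>" exceeds MAX_ARG_STRLEN.
try_env_size() {
  # Build an N-byte string of 'x' characters.
  value=$(head -c "$1" /dev/zero | tr '\0' 'x')
  # The assignment prefix places BIG_VAR only in the environment of the child
  # /bin/sh; stderr is discarded so the exec error message is hidden.
  if BIG_VAR="$value" /bin/sh -c ':' 2>/dev/null; then
    echo "ok"
  else
    echo "too long"
  fi
}

try_env_size 100000   # under the 131072-byte cap: prints "ok"
try_env_size 200000   # over the cap on 4 KiB-page Linux: prints "too long"
```

On kernels with larger pages (for example 64 KiB), MAX_ARG_STRLEN scales with the page size, so the same 200000-byte value would fit there.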


Message from the maintainers:

If you wish to see this enhancement implemented, please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

daniel-anova, Aug 24, 2022

@daniel-anova - thanks for reporting this issue! Can you let me know why your sensor spec gets so large?

whynowy, Aug 25, 2022

@daniel-anova, can you share the OS version, bash version, and AKS instance type of your k8s worker nodes? Also, the output from:

getconf -a | grep ARG_MAX
true | xargs --show-limits

as well as from the test in https://stackoverflow.com/a/49119526

tooptoop4, Aug 25, 2022

> @daniel-anova - thanks for reporting this issue! Can you let me know what is the reason your sensor spec gets so large?

We're using Argo to schedule data processing jobs for which only the inputs change. We built a frontend that lets users add new run schedules, and all of them were being added to a single Sensor manifest (we have since refactored to create multiple Sensor files).

> @daniel-anova can u share what OS version/bash version/AKS instance type your k8s workers are? Also output from:

For the nodes:

  • AKS/Kubernetes version: 1.23.8
  • Kernel 5.4.0-1086-azure
  • OS Image: Ubuntu 18.04.6 LTS
  • Container Runtime: containerd://1.5.11+azure-2
  • Bash version: not applicable, since the sensor container doesn't include bash

For the other commands I used an ubuntu:latest container:

<<K9s-Shell>> Pod: default/ubuntu-r8bfj | Container: ubuntu
root@ubuntu-r8bfj:/# getconf -a | grep ARG_MAX
true | xargs --show-limits
ARG_MAX                            2097152
_POSIX_ARG_MAX                     2097152
Your environment variables take up 2198 bytes
POSIX upper limit on argument length (this system): 2092906
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2090708
Size of command buffer we are actually using: 131072
Maximum parallelism (--max-procs must be no greater): 2147483647

root@ubuntu-r8bfj:/#
root@ubuntu-r8bfj:/# ./delme.sh
Time 12:35:36, chars 3
Time 12:35:36, chars 5
Time 12:35:36, chars 9
Time 12:35:36, chars 17
Time 12:35:36, chars 33
Time 12:35:36, chars 65
Time 12:35:36, chars 129
Time 12:35:36, chars 257
Time 12:35:36, chars 513
Time 12:35:36, chars 1025
Time 12:35:36, chars 2049
Time 12:35:36, chars 4097
Time 12:35:36, chars 8193
Time 12:35:36, chars 16385
Time 12:35:36, chars 32769
Time 12:35:36, chars 65537
Time 12:35:36, chars 131073
Time 12:35:36, chars 262145
Time 12:35:36, chars 524289
Time 12:35:36, chars 1048577
Time 12:35:36, chars 2097153
Time 12:35:36, chars 4194305
Time 12:35:36, chars 8388609
Time 12:35:36, chars 16777217
Time 12:35:36, chars 33554433
Time 12:35:37, chars 67108865
Time 12:35:39, chars 134217729
Time 12:35:42, chars 268435457
Time 12:35:49, chars 536870913
./delme.sh: xrealloc: cannot allocate 18446744071562068096 bytes
root@ubuntu-r8bfj:/#
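
ARG_MAX (2097152 in the output above) caps the combined size of all arguments and environment strings, so it is not what a ~152 KB sensor runs into; Linux also limits each individual string to MAX_ARG_STRLEN, 131072 bytes on kernels with 4 KiB pages. A quick way to see how close a given sensor is to that cap is to measure the SENSOR_OBJECT value that the controller placed in the Deployment. The helper below, env_entry_fits, is a hypothetical sketch: it reads a value on stdin and checks whether "NAME=value" plus its terminating NUL byte fits, with the 131072-byte default being an assumption for 4 KiB-page kernels.

```shell
#!/bin/sh
# env_entry_fits NAME [LIMIT]: read a value on stdin and report whether the
# environment string "NAME=value" (plus the trailing NUL byte) fits within
# LIMIT bytes. LIMIT defaults to 131072, assumed to be MAX_ARG_STRLEN on a
# Linux kernel with 4 KiB pages.
env_entry_fits() {
  name=$1
  limit=${2:-131072}
  # Byte count of the value; tr strips the padding some wc versions add.
  value_bytes=$(wc -c | tr -d ' ')
  # name + "=" + value + NUL terminator
  total=$(( ${#name} + 1 + value_bytes + 1 ))
  if [ "$total" -le "$limit" ]; then
    echo "fits ($total of $limit bytes)"
  else
    echo "too long ($total of $limit bytes)"
  fi
}

# Example (SENSOR_DEPLOYMENT and NAMESPACE are placeholders for the sensor's
# Deployment name and namespace):
#   kubectl get deployment "$SENSOR_DEPLOYMENT" -n "$NAMESPACE" \
#     -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="SENSOR_OBJECT")].value}' \
#     | env_entry_fits SENSOR_OBJECT
```

Reading the value from the Deployment spec works even while the pod is crash-looping, since the check never needs a shell inside the sensor container.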

daniel-anova, Aug 26, 2022

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], Oct 26, 2022

crispy

tooptoop4, Oct 26, 2022

Commenting to keep this issue alive.

daniel-anova, Oct 26, 2022

We may also be hitting this issue and would like to see a resolution or fix.

sstrand72, Mar 22, 2023

Hi @whynowy, I have two proposals to solve this issue; let me know if you have others. Both would remove SENSOR_OBJECT from the environment variables:

  1. The sensor fetches its Sensor object from the Kubernetes API. This creates a new dependency between the sensor pod and the Sensor CRD.
  2. The controller writes the Sensor object to a file in a ConfigMap and mounts it as a volume in the sensor pod. This creates a new resource that must be managed.
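
To make the second proposal concrete, here is a minimal sketch of the lookup order a sensor could use if its spec were mounted from a ConfigMap: read the file when it exists, otherwise fall back to the SENSOR_OBJECT variable. It is illustrative only, since the real sensor is a Go binary; the path variable SENSOR_SPEC_FILE, its default /etc/sensor/sensor.json, and the assumption that SENSOR_OBJECT holds base64-encoded JSON are hypothetical and not confirmed by this thread.

```shell
#!/bin/sh
# load_sensor_spec: print the sensor spec as JSON.
# 1. If the (hypothetical) file $SENSOR_SPEC_FILE exists and is readable, e.g.
#    mounted from a ConfigMap volume, print its contents.
# 2. Otherwise fall back to $SENSOR_OBJECT, assumed here to be base64-encoded JSON.
load_sensor_spec() {
  spec_file=${SENSOR_SPEC_FILE:-/etc/sensor/sensor.json}
  if [ -r "$spec_file" ]; then
    cat "$spec_file"
  elif [ -n "${SENSOR_OBJECT:-}" ]; then
    printf '%s' "$SENSOR_OBJECT" | base64 -d
  else
    echo "no sensor spec found" >&2
    return 1
  fi
}
```

Keeping the environment variable as a fallback would let the ConfigMap path stay optional and keep small sensors working unchanged.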

It would be good to get your thoughts, and the community's, on these potential solutions.

AalokAhluwalia, May 4, 2023

> Hi @whynowy I have two proposals to solve this issue. Let me know if you have other proposals. Both would remove the SENSOR_OBJECT from the env variables:
>
>   1. Sensor will get the crd object from k8s. This solution creates a new relationship between the sensor pod and sensor CRD.

This will never be an option.

>   2. Controller will write the sensor crd to a file in a configmap and mount the volume to the sensor. This solution creates a new resource that must be managed.

This could be a possible solution. A few points:

  1. Make using a ConfigMap optional.
  2. Prevent the ConfigMap from being modified externally.
  3. When the Sensor object is updated, the Deployment should be updated so that the sensor reloads.
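
One common Kubernetes pattern that could cover points 2 and 3 together (a suggestion, not something agreed in this thread): create the ConfigMap with immutable: true, which makes Kubernetes reject changes to its data, name it after a hash of the sensor spec, and point the Deployment's volume at the new name whenever the spec changes, so every Sensor update changes the pod template and rolls the pods. The hypothetical helper spec_configmap_name below shows only the naming step, assuming a sha256sum from GNU coreutils or BusyBox.

```shell
#!/bin/sh
# spec_configmap_name SENSOR_NAME: read the sensor spec on stdin and print a
# ConfigMap name of the form <sensor>-spec-<hash>, where <hash> is the first 10
# hex characters of the spec's SHA-256 digest. Any change to the spec yields a
# new name, so switching the Deployment's volume to it triggers a rollout.
spec_configmap_name() {
  digest=$(sha256sum | cut -d ' ' -f 1)
  # Keep the suffix short so the resulting name stays readable.
  short=$(printf '%s' "$digest" | cut -c 1-10)
  printf '%s-spec-%s\n' "$1" "$short"
}
```

With this scheme the controller would also need to delete ConfigMaps for superseded specs, since an immutable ConfigMap can only be replaced, not edited.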


whynowy, May 8, 2023