
[BUG] - conda-store builds taking forever.

Open satra opened this issue 6 months ago • 16 comments

Describe the bug

i'm specifying a conda-store build in our deployed instance and it appears to take forever (this particular one has been running for 7 hours) without any logs. when looking at the worker pod there are a bunch of messages:

worker pod messages

conda-store-worker celery.exceptions.ChordError: Dependency build-1-constructor-installer raised CalledProcessError(1, ['python', '-m', 'constructor', '--help'])
conda-store-worker [2025-06-27 01:14:31,811: INFO/ForkPoolWorker-4] Task task_build_conda_pack[build-1-conda-pack[] succeeded in 72.24953944900017s: None
conda-store-worker [2025-06-28 00:33:58,738: INFO/MainProcess] Task task_update_storage_metrics[469226ac-8a0d-4e49-b73c-5c1d1dee2b92[] received
conda-store-worker [2025-06-28 00:33:58,765: INFO/ForkPoolWorker-2] Task task_update_storage_metrics[469226ac-8a0d-4e49-b73c-5c1d1dee2b92[] succeeded in 0.02669937099562958s: None
conda-store-worker [2025-06-28 00:33:58,765: INFO/MainProcess] Task task_build_conda_environment[build-3-environment[] received
conda-store-worker [2025-06-28 00:34:00,745: WARNING/ForkPoolWorker-2] CONDA_FLAGS=--strict-channel-priority
conda-store-worker [2025-06-28 00:34:00,750: WARNING/ForkPoolWorker-2] Locking dependencies for ['linux-64']...
conda-store-worker [2025-06-28 00:34:00,751: INFO/ForkPoolWorker-2] linux-64 using specs ['dandi', 'datalad', 'ipykernel']
conda-store-worker [2025-06-28 00:34:16,598: WARNING/ForkPoolWorker-2] - Install lock using:
conda-store-worker [2025-06-28 00:34:16,598: WARNING/ForkPoolWorker-2]
conda-store-worker [2025-06-28 00:34:16,598: WARNING/ForkPoolWorker-2]     conda-lock install --name YOURENV /tmp/tmpr8mbu496/conda-lock.yaml
conda-store-worker [2025-06-28 00:34:16,598: WARNING/ForkPoolWorker-2] Rendering lockfile(s) for linux-64...
conda-store-worker [2025-06-28 00:34:16,600: WARNING/ForkPoolWorker-2] - Install lock using :
conda-store-worker [2025-06-28 00:34:16,600: WARNING/ForkPoolWorker-2]
conda-store-worker [2025-06-28 00:34:16,600: WARNING/ForkPoolWorker-2]     conda create --name YOURENV --file conda-linux-64.lock
conda-store-worker /opt/conda/lib/python3.12/site-packages/conda/base/context.py:198: FutureWarning: Adding 'defaults' to channel list implicitly is deprecated and will be removed in 25.3.
conda-store-worker
conda-store-worker To remove this warning, please choose a default channel explicitly with conda's regular configuration system, e.g. by adding 'defaults' to the list of channels:
conda-store-worker
conda-store-worker     conda config --add channels defaults
conda-store-worker
conda-store-worker For more information see https://docs.conda.io/projects/conda/en/stable/user-guide/configuration/use-condarc.html
conda-store-worker
conda-store-worker   deprecated.topic(
conda-store-worker [2025-06-28 00:36:23,765: INFO/ForkPoolWorker-2] building conda_prefix=/home/conda/yarikoptic/124bb6d6-1751070838-3-test-1 took 144.961 [s]
conda-store-worker [2025-06-28 00:36:29,140: INFO/MainProcess] Task task_build_conda_env_export[build-3-conda-env-export[] received
conda-store-worker [2025-06-28 00:36:29,141: INFO/MainProcess] Task task_build_conda_pack[build-3-conda-pack[] received
conda-store-worker [2025-06-28 00:36:29,142: INFO/MainProcess] Task task_build_constructor_installer[build-3-constructor-installer[] received
conda-store-worker [2025-06-28 00:36:29,143: INFO/ForkPoolWorker-2] Task task_build_conda_environment[build-3-environment[] succeeded in 150.37720980399172s: None
conda-store-worker [2025-06-28 00:36:32,191: INFO/ForkPoolWorker-4] Task task_build_conda_env_export[build-3-conda-env-export[] succeeded in 3.0499828989995876s: None
conda-store-worker [2025-06-28 00:36:52,170: INFO/ForkPoolWorker-3] packaging archive of conda environment=/home/conda/yarikoptic/124bb6d6-1751070838-3-test-1 took 23.025 [s]
conda-store-worker [2025-06-28 00:36:52,171: ERROR/ForkPoolWorker-3] Chord '6a78a37e-f335-46b6-8a40-04d54d29f0bd' raised: ChordError("Dependency build-3-constructor-installer raised CalledProcessError(1, ['python', '-m', 'constructor', '--help'])")
conda-store-worker Traceback (most recent call last):
conda-store-worker   File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/celery/backends/redis.py", line 528, in on_chord_part_return
conda-store-worker     resl = [unpack(tup, decode) for tup in resl]
conda-store-worker            ^^^^^^^^^^^^^^^^^^^
conda-store-worker   File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/celery/backends/redis.py", line 434, in _unpack_chord_result
conda-store-worker     raise ChordError(f'Dependency {tid} raised {retval!r}')
conda-store-worker celery.exceptions.ChordError: Dependency build-3-constructor-installer raised CalledProcessError(1, ['python', '-m', 'constructor', '--help'])
conda-store-worker [2025-06-28 00:36:52,173: INFO/ForkPoolWorker-3] Task task_build_conda_pack[build-3-conda-pack[] succeeded in 23.03208869199443s: None
conda-store-worker [2025-06-28 23:32:51,063: INFO/MainProcess] Task task_update_storage_metrics[1c984d85-15d7-4ba2-970a-07ce711083a0[] received
conda-store-worker [2025-06-28 23:32:51,071: INFO/ForkPoolWorker-2] Task task_update_storage_metrics[1c984d85-15d7-4ba2-970a-07ce711083a0[] succeeded in 0.007957488007377833s: None
conda-store-worker [2025-06-28 23:32:51,071: INFO/MainProcess] Task task_build_conda_environment[build-4-environment[] received
conda-store-worker [2025-06-28 23:32:53,091: WARNING/ForkPoolWorker-2] CONDA_FLAGS=--strict-channel-priority
conda-store-worker [2025-06-28 23:32:56,799: WARNING/ForkPoolWorker-2] Locking dependencies for ['linux-64']...
conda-store-worker [2025-06-28 23:32:56,800: INFO/ForkPoolWorker-2] linux-64 using specs ['python >=3.13', 'ipykernel', 'ipywidgets', 'pip *']

Expected behavior

a build completes and provides build log output and status.

OS and architecture in which you are running Nebari

macos arm

How to Reproduce the problem?

added this through the conda-store ui.

channels:
  - conda-forge
  - defaults
dependencies:
  - python>=3.13
  - ipykernel
  - ipywidgets
  - pip
  - pip:
    - dandi

Command output


Versions and dependencies used.

conda 23.11.0

âŊ kubectl version Client Version: v1.32.2 Kustomize Version: v5.5.0 Server Version: v1.31.9-eks-5d4a308

Compute environment

AWS

Integrations

conda-store

Anything else?

in general, we would like to have a few builds available to all users. being able to monitor the builds (and their failures) would be useful.

satra avatar Jun 29 '25 07:06 satra

I would check the conda-store logs - https://www.nebari.dev/docs/how-tos/access-logs-loki#conda-store-logs

And also confirm that you haven't run out of conda storage. If you have, you can delete some environments, or delete individual builds through the admin UI.

The conda-store admin UI is at /conda-store/admin. You may need to click login.
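If Loki isn't handy, the same checks can be done directly with kubectl. A minimal sketch, assuming the default dev namespace and the conda-store-worker name pattern (both may differ per deployment):

# locate the conda-store worker pod
kubectl get pods -n dev | grep conda-store-worker
# tail its logs
kubectl logs -n dev <worker-pod-name> -f
# check free space on the conda storage volume from inside the pod
kubectl exec -n dev <worker-pod-name> -- df -h /home/conda

The /home/conda path matches the conda_prefix shown in the worker logs above.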

kcpevey avatar Jul 02 '25 13:07 kcpevey

thank you @kcpevey

only 3% of storage is being used.

regarding the logs, i see the build being received by the worker in the worker pod logs, but that's it.

on the admin ui only the following is shown in the logs:

ui log
starting build of conda environment 2025-07-02 14:09:11.885524 UTC
plugin-conda-lock: lock_environment entrypoint for conda-lock
plugin-conda-lock: Note that the output of `conda config --show` displayed below only reflects settings in the conda configuration file, which might be overridden by variables required to be set by conda-store via the environment. Overridden settings: CONDA_FLAGS=--strict-channel-priority
plugin-conda-lock: Running command: ['mamba', 'info']
plugin-conda-lock: /opt/conda/lib/python3.12/site-packages/conda/base/context.py:198: FutureWarning: Adding 'defaults' to channel list implicitly is deprecated and will be removed in 25.3. 
plugin-conda-lock: To remove this warning, please choose a default channel explicitly with conda's regular configuration system, e.g. by adding 'defaults' to the list of channels:
plugin-conda-lock:   conda config --add channels defaults
plugin-conda-lock: For more information see https://docs.conda.io/projects/conda/en/stable/user-guide/configuration/use-condarc.html
plugin-conda-lock:   deprecated.topic(
plugin-conda-lock:           mamba version : 1.5.9
plugin-conda-lock:      active environment : None
plugin-conda-lock:        user config file : /root/.condarc
plugin-conda-lock:  populated config files : /opt/conda/.condarc
plugin-conda-lock:           conda version : 24.9.2
plugin-conda-lock:     conda-build version : not installed
plugin-conda-lock:          python version : 3.12.7.final.0
plugin-conda-lock:                  solver : libmamba (default)
plugin-conda-lock:        virtual packages : __archspec=1=zen2
plugin-conda-lock:                           __conda=24.9.2=0
plugin-conda-lock:                           __glibc=2.31=0
plugin-conda-lock:                           __linux=5.10.237=0
plugin-conda-lock:                           __unix=0=0
plugin-conda-lock:        base environment : /opt/conda  (writable)
plugin-conda-lock:       conda av data dir : /opt/conda/etc/conda
plugin-conda-lock:   conda av metadata url : None
plugin-conda-lock:            channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
plugin-conda-lock:                           https://repo.anaconda.com/pkgs/main/noarch
plugin-conda-lock:                           https://repo.anaconda.com/pkgs/r/linux-64
plugin-conda-lock:                           https://repo.anaconda.com/pkgs/r/noarch
plugin-conda-lock:           package cache : /opt/conda/pkgs
plugin-conda-lock:                           /root/.conda/pkgs
plugin-conda-lock:        envs directories : /opt/conda/envs
plugin-conda-lock:                           /root/.conda/envs
plugin-conda-lock:                platform : linux-64
plugin-conda-lock:              user-agent : conda/24.9.2 requests/2.32.3 CPython/3.12.7 Linux/5.10.237-230.949.amzn2.x86_64 ubuntu/20.04.6 glibc/2.31 solver/libmamba conda-libmamba-solver/24.9.0 libmambapy/1.5.9
plugin-conda-lock:                 UID:GID : 0:0
plugin-conda-lock:              netrc file : None
plugin-conda-lock:            offline mode : False
plugin-conda-lock: Running command: ['conda', 'config', '--show']
plugin-conda-lock: /opt/conda/lib/python3.12/site-packages/conda/base/context.py:198: FutureWarning: Adding 'defaults' to channel list implicitly is deprecated and will be removed in 25.3. 
plugin-conda-lock: To remove this warning, please choose a default channel explicitly with conda's regular configuration system, e.g. by adding 'defaults' to the list of channels:
plugin-conda-lock:   conda config --add channels defaults
plugin-conda-lock: For more information see https://docs.conda.io/projects/conda/en/stable/user-guide/configuration/use-condarc.html
plugin-conda-lock:   deprecated.topic(
plugin-conda-lock: add_anaconda_token: True
plugin-conda-lock: add_pip_as_python_dependency: True
plugin-conda-lock: aggressive_update_packages:
plugin-conda-lock:   - ca-certificates
plugin-conda-lock:   - certifi
plugin-conda-lock:   - openssl
plugin-conda-lock: allow_conda_downgrades: False
plugin-conda-lock: allow_cycles: True
plugin-conda-lock: allow_non_channel_urls: False
plugin-conda-lock: allow_softlinks: False
plugin-conda-lock: allowlist_channels: []
plugin-conda-lock: always_copy: False
plugin-conda-lock: always_softlink: False
plugin-conda-lock: always_yes: None
plugin-conda-lock: anaconda_upload: None
plugin-conda-lock: auto_activate_base: True
plugin-conda-lock: auto_stack: 0
plugin-conda-lock: auto_update_conda: True
plugin-conda-lock: bld_path: 
plugin-conda-lock: changeps1: True
plugin-conda-lock: channel_alias: https://conda.anaconda.org
plugin-conda-lock: channel_priority: flexible
plugin-conda-lock: channel_settings: []
plugin-conda-lock: channels:
plugin-conda-lock:   - defaults
plugin-conda-lock: client_ssl_cert: None
plugin-conda-lock: client_ssl_cert_key: None
plugin-conda-lock: clobber: False
plugin-conda-lock: conda_build: {}
plugin-conda-lock: create_default_packages: []
plugin-conda-lock: croot: /opt/conda/conda-bld
plugin-conda-lock: custom_channels:
plugin-conda-lock:   pkgs/main: https://repo.anaconda.com
plugin-conda-lock:   pkgs/r: https://repo.anaconda.com
plugin-conda-lock:   pkgs/pro: https://repo.anaconda.com
plugin-conda-lock: custom_multichannels:
plugin-conda-lock:   defaults: 
plugin-conda-lock:     - https://repo.anaconda.com/pkgs/main
plugin-conda-lock:     - https://repo.anaconda.com/pkgs/r
plugin-conda-lock:   local: 
plugin-conda-lock: debug: False
plugin-conda-lock: default_channels:
plugin-conda-lock:   - https://repo.anaconda.com/pkgs/main
plugin-conda-lock:   - https://repo.anaconda.com/pkgs/r
plugin-conda-lock: default_python: 3.12
plugin-conda-lock: default_threads: None
plugin-conda-lock: denylist_channels: []
plugin-conda-lock: deps_modifier: not_set
plugin-conda-lock: dev: False
plugin-conda-lock: disallowed_packages: []
plugin-conda-lock: download_only: False
plugin-conda-lock: dry_run: False
plugin-conda-lock: enable_private_envs: False
plugin-conda-lock: env_prompt: ({default_env}) 
plugin-conda-lock: envs_dirs:
plugin-conda-lock:   - /opt/conda/envs
plugin-conda-lock:   - /root/.conda/envs
plugin-conda-lock: envvars_force_uppercase: True
plugin-conda-lock: error_upload_url: https://conda.io/conda-post/unexpected-error
plugin-conda-lock: execute_threads: 1
plugin-conda-lock: experimental: []
plugin-conda-lock: extra_safety_checks: False
plugin-conda-lock: fetch_threads: 5
plugin-conda-lock: force: False
plugin-conda-lock: force_32bit: False
plugin-conda-lock: force_reinstall: False
plugin-conda-lock: force_remove: False
plugin-conda-lock: ignore_pinned: False
plugin-conda-lock: json: False
plugin-conda-lock: local_repodata_ttl: 1
plugin-conda-lock: migrated_channel_aliases: []
plugin-conda-lock: migrated_custom_channels: {}
plugin-conda-lock: no_lock: False
plugin-conda-lock: no_plugins: False
plugin-conda-lock: non_admin_enabled: True
plugin-conda-lock: notify_outdated_conda: True
plugin-conda-lock: number_channel_notices: 5
plugin-conda-lock: offline: False
plugin-conda-lock: override_channels_enabled: True
plugin-conda-lock: path_conflict: clobber
plugin-conda-lock: pinned_packages: []
plugin-conda-lock: pip_interop_enabled: False
plugin-conda-lock: pkgs_dirs:
plugin-conda-lock:   - /opt/conda/pkgs
plugin-conda-lock:   - /root/.conda/pkgs
plugin-conda-lock: proxy_servers: {}
plugin-conda-lock: quiet: False
plugin-conda-lock: register_envs: True
plugin-conda-lock: remote_backoff_factor: 1
plugin-conda-lock: remote_connect_timeout_secs: 9.15
plugin-conda-lock: remote_max_retries: 3
plugin-conda-lock: remote_read_timeout_secs: 60.0
plugin-conda-lock: repodata_fns:
plugin-conda-lock:   - current_repodata.json
plugin-conda-lock:   - repodata.json
plugin-conda-lock: repodata_threads: None
plugin-conda-lock: repodata_use_zst: True
plugin-conda-lock: report_errors: None
plugin-conda-lock: reporters:
plugin-conda-lock:   - {'backend': 'console', 'output': 'stdout', 'verbosity': 0, 'quiet': False}
plugin-conda-lock: restore_free_channel: False
plugin-conda-lock: rollback_enabled: True
plugin-conda-lock: root_prefix: /opt/conda
plugin-conda-lock: safety_checks: warn
plugin-conda-lock: sat_solver: pycosat
plugin-conda-lock: separate_format_cache: False
plugin-conda-lock: shortcuts: True
plugin-conda-lock: shortcuts_only: []
plugin-conda-lock: show_channel_urls: None
plugin-conda-lock: signing_metadata_url_base: None
plugin-conda-lock: solver: libmamba
plugin-conda-lock: solver_ignore_timestamps: False
plugin-conda-lock: ssl_verify: True
plugin-conda-lock: subdir: linux-64
plugin-conda-lock: subdirs:
plugin-conda-lock:   - linux-64
plugin-conda-lock:   - noarch
plugin-conda-lock: target_prefix_override: 
plugin-conda-lock: trace: False
plugin-conda-lock: track_features: []
plugin-conda-lock: unsatisfiable_hints: True
plugin-conda-lock: unsatisfiable_hints_check_depth: 2
plugin-conda-lock: update_modifier: update_specs
plugin-conda-lock: use_index_cache: False
plugin-conda-lock: use_local: False
plugin-conda-lock: use_only_tar_bz2: None
plugin-conda-lock: verbosity: 0
plugin-conda-lock: verify_threads: 1
plugin-conda-lock: Running command: ['conda', 'config', '--show-sources']
plugin-conda-lock: ==> /opt/conda/.condarc <==
plugin-conda-lock: channels: []

it's been building for 20+ mins at this point with no output shown anywhere to track progress. on k9s, the store worker only shows that it has received the build.

conda-store-worker [2025-07-02 14:09:11,863: INFO/MainProcess] Task task_build_conda_environment[build-4-environment[] received
conda-store-worker [2025-07-02 14:09:14,041: WARNING/ForkPoolWorker-3] CONDA_FLAGS=--strict-channel-priority
conda-store-worker [2025-07-02 14:09:17,158: WARNING/ForkPoolWorker-3] Locking dependencies for ['linux-64']...
conda-store-worker [2025-07-02 14:09:17,159: INFO/ForkPoolWorker-3] linux-64 using specs ['python 3.12.*', 'ipykernel', 'ipywidgets', 'pip *']
conda-store-worker [2025-07-02 14:09:39,240: INFO/MainProcess] Terminating build-3-environment (15)
conda-store-worker [2025-07-02 14:09:39,256: INFO/MainProcess] Task task_cleanup_builds[a30c3028-9b2f-4024-994f-bdfc47b3b763[] received
conda-store-worker [2025-07-02 14:09:45,336: WARNING/ForkPoolWorker-6] marking build 3 as CANCELED since stuck in BUILDING state and not present on workers
conda-store-worker [2025-07-02 14:09:45,359: INFO/ForkPoolWorker-6] Task task_cleanup_builds[a30c3028-9b2f-4024-994f-bdfc47b3b763[] succeeded in 1.1121771300095133s: None

satra avatar Jul 02 '25 14:07 satra

i was able to build a different environment. i suspect this may be a conda resolution issue that is taking a long time.

satra avatar Jul 02 '25 15:07 satra

There is a high chance that this was the case. The logs are divided into three parts, but they need to be successfully processed by the worker before appearing in the UI (they are stored in MinIO as artifacts). One quick check I sometimes do when this happens is building the environment locally, since my local machine usually has more resources available; if it takes a while on my machine, it will take twice as long on conda-store.
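For example, a minimal timing of just the solve step (a sketch, assuming conda-lock is installed locally and environment.yml contains the spec from the issue):

# time the linux-64 lock/solve that the worker performs
time conda-lock lock --file environment.yml --platform linux-64

If this also hangs or takes hours locally, the bottleneck is the solve itself rather than the worker.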

That said, 7 hours is far too long; I suggest restarting the conda-store worker pod and retrying with minimal changes to the env. Building the env in parts might also help in case one of the dependencies is adding complexity to the solve (I tend to cut the spec in half, then add back the remaining deps and re-solve).

Additionally, regarding conda-forge: it is generally not a good idea to mix defaults and conda-forge, as numerous package-name resolutions may end up falling into the defaults channel, leading to broken environments and intractable issues. I recommend sticking to conda-forge only if possible.

viniciusdc avatar Jul 03 '25 02:07 viniciusdc

thanks @viniciusdc - i did end up restricting to conda-forge only, but still no dice. and since the other envs built (e.g. pytorch), i knew the worker pods were fine. is there a corresponding docker image in which i could test the setup locally in a shell, so i can see the lock file creation process output?

next i was going to try exporting the environment from a local linux install and feeding that to the builder.

satra avatar Jul 03 '25 03:07 satra

next i was going to try exporting the environment from a local linux install and feeding that to the builder.

I know this was closed out, but I would try this first, since there might be an issue with the dependency resolution stage itself. Finding that might help in understanding why it does not show up in the logs. You can also run conda-store locally: https://conda.store/conda-store/how-tos/install-standalone

But under the hood conda-store runs conda/mamba, so you should also be able to run the env build locally through those directly.
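A sketch of both approaches (treat the exact commands as assumptions taken from the linked docs; environment.yml holds the spec from above):

# run conda-store standalone, per the linked how-to
python -m pip install conda-store-server
conda-store-server --standalone

# or reproduce just the env build with the same tools the worker uses
mamba env create --name test-env --file environment.yml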

viniciusdc avatar Jul 04 '25 12:07 viniciusdc

I am checking this right now. Based on my findings so far, the worker node is throttling while performing the build. Usually Kubernetes would give it more resources, since there is no limit set on the container config; the issue is that, since it's running on the general node group, there simply might not be any more resources available.


You opened a new issue regarding the node instance type changes (https://github.com/nebari-dev/nebari/issues/3093); I assume you increased the available resources. After your deployment came back (within about 5 minutes, Keycloak should have been running again), did you try this build again?
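For reference, one way to check the throttling from the command line (a sketch; kubectl top requires metrics-server, and the namespace/node names are assumptions):

# live CPU/memory usage of the conda-store pods
kubectl top pods -n dev | grep conda-store
# how much of the general node's capacity is already committed
kubectl describe node <general-node-name> | grep -A 8 'Allocated resources'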

viniciusdc avatar Jul 04 '25 15:07 viniciusdc

I will re-open this issue, because this is a good example of why we need auto-scaling for the conda-store workers.

viniciusdc avatar Jul 04 '25 15:07 viniciusdc

it wasn't for increasing the size of resources but for another issue with the ELB failing a healthcheck.

if i can specify resources for the conda-store workers (as in the draft PR you shared), it should trigger a scale up of one of the other environments if there is not enough on general. i can also up the max nodes to 2 for general, so that k8s could spread out pods if needed.

satra avatar Jul 04 '25 15:07 satra

i can also up the max nodes to 2 for general

I don't recommend this one. AWS has an issue with multiple availability zones and PV mounting: if your other node ends up in a different zone than the first one, you will start experiencing pods stuck in pending due to errors while mounting the volumes.

As a quick "fix" for this env, I recommend building locally and exporting the conda-lock file to pass directly to conda-store, which should tell conda to skip the solving process completely, which should not increase the CPU usage

for the general issue with conda-store, that PR will help redirect the load to other nodes that have more resources available. In the meantime, I will re-raise the need for an auto-scaling option for conda-store (we considered KEDA in the past for that).
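A sketch of that lockfile workflow (assuming conda-lock is installed locally and environment.yml is the spec above):

# solve once on your own machine
conda-lock lock --file environment.yml --platform linux-64 --lockfile conda-lock.yaml
# optional sanity check, using the command the worker log itself prints
conda-lock install --name test-env conda-lock.yaml

Then supply conda-lock.yaml to conda-store instead of the plain spec.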

viniciusdc avatar Jul 04 '25 16:07 viniciusdc

i was able to build by supplying the conda-lock file but had to do a few retries through the admin interface.

it seems that if it runs into this connection error it does not retry.

Click to open error log about connection error
action_fetch_and_extract_conda_packages: DOWNLOAD python-3.11.0-he550d4f_1_cpython.conda | 53 of 146
Traceback (most recent call last):
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_store_server/_internal/worker/build.py", line 256, in build_conda_environment
    context = action.action_fetch_and_extract_conda_packages(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_store_server/_internal/action/base.py", line 38, in wrapper
    action_context.result = f(action_context, *args, **kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_store_server/_internal/action/download_packages.py", line 88, in action_fetch_and_extract_conda_packages
    ) = conda_package_streaming.url.conda_reader_for_url(url)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_package_streaming/url.py", line 75, in conda_reader_for_url
    conda = LazyConda(url, session)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_package_streaming/lazy_wheel.py", line 50, in __init__
    tail = self._stream_response(start="", end=CONTENT_CHUNK_SIZE)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/conda_package_streaming/lazy_wheel.py", line 190, in _stream_response
    response = self._session.get(self._url, headers=headers, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/conda-store-server/lib/python3.12/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

satra avatar Jul 05 '25 15:07 satra

uhm.. that's indeed intriguing. May I suggest checking the resource consumption of the workers through Grafana, and installing Flower on another pod connected to the same Redis broker (DB) as the conda-store worker?

apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-dashboard
spec:
  replicas: 1
  selector:
    matchLabels:
      app: celery-dashboard
  template:
    metadata:
      labels:
        app: celery-dashboard
    spec:
      containers:
        - name: celery-dashboard
          image: "mher/flower:latest"
          command: ["celery",  "flower"]
          env:
            - name: FLOWER_BROKER_API
              value: "redis://:******@nebari-conda-store-redis:6379/0"
            - name: SERVER_PORT
              value: "5555"
          ports:
            - containerPort: 5555
              name: flower
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "1"
              memory: "2Gi"

You will need to find the Redis URL conda-store is using; it should be inside the conda-store-secret. This should allow you to see the available workers and the overall status of the executed tasks.
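To pull that URL out, something like the following (a sketch; the secret name, namespace, and key are assumptions, adjust to your deployment):

# list the keys stored in the conda-store secret
kubectl get secret -n dev conda-store-secret -o jsonpath='{.data}'
# decode the value that holds the redis connection string
kubectl get secret -n dev conda-store-secret -o jsonpath='{.data.<key-name>}' | base64 -d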

viniciusdc avatar Jul 09 '25 20:07 viniciusdc

I am writing a guide on that; it will live in the docs soon.

viniciusdc avatar Jul 09 '25 20:07 viniciusdc

Hey @satra I am not sure if we checked this, but can you verify the disk usage inside the conda-store worker pod?

df -h .

viniciusdc avatar Jul 22 '25 15:07 viniciusdc

pinging @asmacdo - who has been checking this.

satra avatar Jul 29 '25 13:07 satra

FWIW this can still happen. (And in our case, "building forever" eventually kills the node.) On our side, we haven't seen this for a long time; we avoid it through documentation: users are requested to build conda environments in their own userspace. When we build shared environments, it can be done by pre-generating the conda-lock file.

My previous (mis)understanding was that creating an environment from a spec (not a lockfile) ran forever when we installed dandi via pip. However, I saw this again today when attempting to install this. (NOTE: dependency resolution made that impossible, but it still should have failed instead of exploding.)

asmacdo avatar Nov 18 '25 21:11 asmacdo