"big" docker inputs slow pants to a crawl; hangs/freezes(?) UI

Open • cburroughs opened this issue 10 months ago • 1 comment

Describe the bug

I'm a little hand-wavy here on the slow/hang/freeze distinction. Internally I have a case that exhausts the patience of humans and hits a one-hour CI timeout. I've tried to extract that into an example which, for reasons that are unclear to me, is a little less consistently bad, but I think it still demonstrates a problem. Or rather a pair of real problems: something slow, and said slow thing hanging the spinner/UI for a long time.

Demonstration repo at https://github.com/cburroughs/example-docker/commits/csb/freeze-example/

  • copy the wheels to /tmp/psutil

The basic setup is a chain of dependent Dockerfiles with some Pexs. I'm using https://pypi.org/project/acryl-datahub/ to represent a "big pex" with lots of dependencies and https://pypi.org/project/cowsay/ to represent a more reasonable one.

cowsay

pants --no-pantsd package src/freeze/cowsay::

This completes in about 30 seconds on my workstation and seems fine.

bigpex

pants --no-pantsd package src/freeze/bigpex::

Completes in > 2 minutes. It seems like it is doing a lot of work but nothing out of the ordinary.

loosebigpex

pants --no-pantsd package src/freeze/loosebigpex:: (Like the above, but with layout='loose')

This one is all over the place. Sometimes the "dots" will hang for > 30 seconds.

Sometimes for minutes. Sometimes it seems to be indefinite. It does not seem deadlocked as the pants process is chugging away at CPU. When I get a stack trace from either this example or the internal one, it has always looked like:

Thread 0x7F6BC928B6C0 (active+gil): "Dummy-1"
    <genexpr> (pants/backend/docker/utils.py:115)
    get_unreferenced (pants/backend/docker/utils.py:115)
    reference (pants/backend/docker/utils.py:91)
    suggest_renames (pants/backend/docker/utils.py:141)
    create (pants/backend/docker/util_rules/docker_build_context.py:149)
    create_docker_build_context (pants/backend/docker/util_rules/docker_build_context.py:376)

or

Thread 0x7FF48394D6C0 (active+gil): "Dummy-1"
    dirname (posixpath.py:155)
    is_referenced (pants/backend/docker/utils.py:106)
    is_referenced (pants/backend/docker/utils.py:108)
    is_referenced (pants/backend/docker/utils.py:108)
    is_referenced (pants/backend/docker/utils.py:108)
    get_unreferenced (pants/backend/docker/utils.py:118)
    reference (pants/backend/docker/utils.py:91)
    suggest_renames (pants/backend/docker/utils.py:141)
    create (pants/backend/docker/util_rules/docker_build_context.py:149)
    create_docker_build_context (pants/backend/docker/util_rules/docker_build_context.py:376)

Which points to https://github.com/pantsbuild/pants/blob/release_2.20.0/src/python/pants/backend/docker/utils.py#L111

  • "loose" Pexs generate a lot of files, which the utils.py code isn't able to process quickly
  • I'm not sure why, but it hangs the UI (dots not spinning), and I assume no code ought to be able to do that.
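
To illustrate the first point, here is a minimal sketch of how rescanning the file list on every reference can turn into quadratic work. It is a hypothetical simplification inferred only from the function names in the stack traces above (`is_referenced`, `get_unreferenced`); it is not the Pants source, and `rescan_per_reference` and `single_pass` are made-up names used for the comparison.

```python
import posixpath


def is_referenced(path, referenced):
    """Return True if the path or any of its parent directories is referenced."""
    while path and path != "/":
        if path in referenced:
            return True
        path = posixpath.dirname(path)
    return False


def rescan_per_reference(files, sources):
    """Assumed slow pattern: recompute the unreferenced set after every reference."""
    referenced = set()
    checks = 0
    unreferenced = list(files)
    for src in sources:
        referenced.add(src)
        unreferenced = []
        for f in files:  # full pass over all files for each source
            checks += 1
            if not is_referenced(f, referenced):
                unreferenced.append(f)
    return unreferenced, checks


def single_pass(files, sources):
    """Alternative: mark every reference first, then scan the files once."""
    referenced = set(sources)
    checks = 0
    unreferenced = []
    for f in files:
        checks += 1
        if not is_referenced(f, referenced):
            unreferenced.append(f)
    return unreferenced, checks


if __name__ == "__main__":
    # A "loose" layout means many small files under many directories.
    files = [f"pex/lib/pkg{i}/mod{j}.py" for i in range(20) for j in range(10)]
    files.append("Dockerfile")
    sources = [f"pex/lib/pkg{i}" for i in range(20)]
    print(rescan_per_reference(files, sources)[1], single_pass(files, sources)[1])
```

Both functions return the same unreferenced files; only the number of `is_referenced` calls differs (files × sources versus files). Whether the real code follows this exact pattern would need to be checked against utils.py.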

Pants version: 2.19 / 2.20

Client:
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.8
 Git commit:        ced0996600
 Built:             Mon Sep 11 00:47:04 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.8
  Git commit:       4ffc61430bbe6d3d405bdf357b766bf303ff3cc5
  Built:            Mon Sep 11 08:08:05 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.1
  GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc:
  Version:          1.1.7
  GitCommit:        4ffc61430bbe6d3d405bdf357b766bf303ff3cc5
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad007797e0dcd8b7126f27bb87401d224240

cburroughs, Apr 19 '24 21:04

Idle thought based on the UI dots hanging: I wonder if part of the problem here is some blocking code running within the async/await executor, i.e. if the UI is being managed by an async task (I don't know if this is the case), it is being starved by some unexpectedly-synchronous operation blocking a whole thread.

huonw, Apr 20 '24 04:04
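
The starvation idea in the comment above can be sketched with Python's `asyncio` as an analogy. This says nothing about how the Pants engine or its UI is actually implemented; `spinner`, `slow_sync_work`, and `ticks_during_work` are hypothetical names, and `time.sleep` merely stands in for a long synchronous call.

```python
import asyncio
import time


async def spinner(ticks):
    """Stand-in for a UI task that should update every 10 ms."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


def slow_sync_work(seconds):
    """Stand-in for a long synchronous call; it never yields to the event loop."""
    time.sleep(seconds)


async def ticks_during_work(offload):
    """Count spinner updates that happen while the slow work is running."""
    ticks = []
    task = asyncio.create_task(spinner(ticks))
    await asyncio.sleep(0.05)  # let the spinner start ticking
    before = len(ticks)
    if offload:
        # Run the blocking call on a worker thread so the loop stays free.
        await asyncio.to_thread(slow_sync_work, 0.2)
    else:
        # Call it directly: the event loop thread is blocked for 0.2 s.
        slow_sync_work(0.2)
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


if __name__ == "__main__":
    print("blocking call:", asyncio.run(ticks_during_work(offload=False)))
    print("offloaded call:", asyncio.run(ticks_during_work(offload=True)))
```

When the work is called directly, the spinner records no updates until it finishes; when it is offloaded, the spinner keeps ticking. Note that `asyncio.to_thread` needs Python 3.9+, and that offloading to a thread would help less for pure-Python CPU-bound work that holds the GIL, which is what the `active+gil` marker in the traces above suggests is happening here.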