chromedp-runner temp directory is not always deleted

Open alebcay opened this issue 4 years ago • 5 comments

What versions are you running?

github.com/chromedp/chromedp v0.5.3 Chromium 83.0.4092.0 go version go1.14.1 darwin/amd64

$ go list -m github.com/chromedp/chromedp
$ google-chrome --version
$ go version

What did you do? Include clear steps.

  • Passed new context to chromedp and executed actions; relevant invocations:
func GetNumberOfPages(url string) (int, error) {
	ctx, _ := chromedp.NewContext(context.Background())
	defer chromedp.Cancel(ctx)

	var res string
	err := chromedp.Run(ctx,
		chromedp.Navigate(url),
		chromedp.InnerHTML(`:root`, &res, chromedp.NodeVisible),
	)
	if err != nil {
		panic(err)
	}

	re := regexp.MustCompile(`– (\d+) of (\d+) pages"\ssrc="`)
	number, err := strconv.Atoi(strings.Fields(strings.TrimSpace(re.FindString(res)))[3])

	return number, err
}

func GeneratePDF(url string, dest string, width float64, height float64) error {
	paper_width := (width / 96.0) + 2.0
	paper_height := (height / 96.0) + 2.0

	// create context
	ctx, _ := chromedp.NewContext(context.Background())
	defer chromedp.Cancel(ctx)

	var pdfReader io.Reader
	err := chromedp.Run(ctx, chromedp.Tasks{
		chromedp.Navigate(url),
		chromedp.WaitReady("svg"),
		chromedp.ActionFunc(func(ctx context.Context) error {
			buf, _, err := page.PrintToPDF().
				WithPaperWidth(paper_width).
				WithPaperHeight(paper_height).
				WithMarginTop(1.0).
				WithMarginBottom(1.0).
				WithMarginLeft(1.0).
				WithMarginRight(1.0).
				WithPageRanges("1").
				Do(ctx)
			if err != nil {
				return err
			}
			pdfReader = bytes.NewBuffer(buf)
			return nil
		}),
	})

	if err != nil {
		return err
	}

	destFile, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer destFile.Close()

	_, err = io.Copy(destFile, pdfReader)
	if err != nil {
		return err
	}

	return nil
}

Both of these functions are run in a single-thread context (no goroutines); not sure if that affects anything.

What did you expect to see?

The GeneratePDF function is run multiple times in a for loop, and when I watch my temp directory I see a chromedp-runner* temp directory being created many times. I would expect that at the end of execution no chromedp-runner* temp directories remain, because the entire cleanup should be handled by the defer chromedp.Cancel(ctx) call (I have tried both defer cancel() and defer chromedp.Cancel(ctx) with the same results).

What did you see instead?

One or more chromedp-runner* temp directories get left behind with the following structure:

❯ tree /private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner028516848
/private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner028516848
└── Default
    └── Cache
        └── index-dir
            └── the-real-index

❯ tree /private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner069103546
/private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner069103546
└── Default
    └── Session\ Storage
        └── CURRENT

❯ tree /private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner760816317
/private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner760816317
└── Default
    └── Session\ Storage
        └── 000003.log

❯ tree /private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner765244031
/private/var/folders/mc/b5bnhqzj05b67zv7chskx2zw0000gp/T/chromedp-runner765244031
└── Default
    └── Session\ Storage
        └── CURRENT

These don't take up a lot of space, but it would be nice if they would not get left behind.

I verified that there were no stray chrome / Chromium processes still running, so I don't believe these are leftovers from orphaned processes left running in the background.

alebcay avatar Mar 29 '20 05:03 alebcay

I think I've resolved the issue in my local use-case by simply making all my chromedp invocations share/reuse a single context rather than creating new ones each time. This single context (and associated temp folder) seems to be getting cleaned up properly.

alebcay avatar Mar 29 '20 15:03 alebcay

Can you provide a self-contained main.go file to reproduce this issue?

mvdan avatar Jul 06 '20 13:07 mvdan

I'm no longer able to reproduce the issue, at least in the original way that I observed it back in March. The temp folders still accumulate while the program is running but all of the temp folders seem to get cleared away when the program exits. The temp folders are not cleared if the program is interrupted (e.g. via Ctrl + C), although I'm not sure if this is expected/intended behavior.

I believe this was indirectly fixed via 3976e2ae9cebe027f6c2113e446627861aa5acef but I'm not sure.

Test code:

package main

import (
	"context"

	"github.com/chromedp/chromedp"
)

func main() {
	for i := 1; i <= 10; i++ {
		ctx, _ := chromedp.NewContext(context.Background())
		defer chromedp.Cancel(ctx)

		chromedp.Run(ctx, chromedp.Tasks{
			chromedp.Navigate("https://google.com"),
			chromedp.WaitReady("html"),
		})
	}
}

alebcay avatar Jul 06 '20 20:07 alebcay

Is there a way to disable these temp folders from being created at all? We run this library on tens of thousands of agents deployed around the world and cannot always guarantee a graceful shutdown.

clarkmcc avatar Jun 11 '21 15:06 clarkmcc

I don't think that Chrome can run without a user data directory. That said, you can specify the user data directory with chromedp.UserDataDir(); then chromedp won't create one (but Chrome will write to the specified directory, and chromedp won't touch it). I'm not sure whether this will make it better for you.

ZekeLu avatar Jun 11 '21 15:06 ZekeLu

Since the issue is no longer reproducible, closing.

Please file a new issue with a self-contained reproducer if it happens again.

ZekeLu avatar Aug 14 '22 04:08 ZekeLu

Hi @ZekeLu ,

This is happening to me. When running constantly, it ends up filling more than 50 to 100 GB in the tmp directory, leaving my VPS without any space left and breaking a lot of things. This doesn't happen on my other server running the same thing, and I'm not sure why it happens here.

If you google "chromedp-runner", there are other people complaining about the same thing.

marcelo321 avatar Sep 26 '22 23:09 marcelo321

@marcelo321 Thank you for the feedback! Have you tried 0.8.4? It includes commit fb22a3c9e832e0d18aa3838298552563576a46c9, which will hopefully address the issue.

ZekeLu avatar Sep 27 '22 01:09 ZekeLu

@ZekeLu I hate that I don't get notifications on replies like this :/

So I am importing it like this:

import (
	"bufio"
	"sync"
	"etc.."
	"github.com/chromedp/chromedp"
)

and I already ran the go get command to update it. Should I be all set?

marcelo321 avatar Oct 12 '22 23:10 marcelo321

So I deleted the binary of my program and built it again, in addition to running go get url to update chromedp. Should I do anything else? Sorry for the newbie question. Is "github.com/chromedp/chromedp" imported OK?

marcelo321 avatar Oct 12 '22 23:10 marcelo321

Run go list -m github.com/chromedp/chromedp in the root of your module to get the version of chromedp. If you have run go get then it should have been upgraded to the latest version.

If you can reproduce the issue, please file a new issue with concrete information on how to reproduce it.

ZekeLu avatar Oct 13 '22 02:10 ZekeLu