Fix local development mode to accept paris-traceroute archives

Open cristinaleonr opened this issue 3 years ago • 3 comments

Currently, etl_worker crashes in local development mode when a paris-traceroute archive is supplied as a URL.

Steps to reproduce:

  1. Navigate to cmd/etl_worker within the ETL project.
  2. Run go run ./etl_worker.go -service_port :8080 -output_dir ./output -output local.
  3. Open up another terminal and set the URL variable to some paris-traceroute archive (e.g., URL=gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz).
  4. Run curl "http://localhost:8081/v2/worker?filename=$URL"
  5. The etl_worker crashes with:
2021/10/11 18:48:31 worker.go:174: <nil> creating parser for traceroute gs://archive-measurement-lab/paris-traceroute/2013/05/08/20130508T000000Z-mlab3-akl01-paris-traceroute-0000.tgz
2021/10/11 18:48:31 server.go:3159: http: panic serving [::1]:56948: runtime error: invalid memory address or nil pointer dereference
goroutine 135 [running]:
net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1801 +0xb9
panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1047 +0x266
github.com/m-lab/etl/task.(*Task).Close(0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:67 +0x19
panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/m-lab/etl/task.(*Task).ProcessAllTests(0x4, 0x50)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:85 +0x4f
github.com/m-lab/etl/worker.DoGKETask(_, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, {0xc0002ee16d, ...}, ...})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:209 +0x30
github.com/m-lab/etl/worker.ProcessGKETask({_, _}, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, ...}, ...)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:204 +0x4de
main.(*runnable).Run(0xc00000c3c0, {0xfb61e8, 0xc00019a000})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:313 +0x2f6
main.handleLocalRequest({0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:196 +0x189
net/http.HandlerFunc.ServeHTTP(0x0, {0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/go/src/net/http/server.go:2046 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00022400f, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2424 +0x149
net/http.serverHandler.ServeHTTP({0xc0005bdb90}, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2878 +0x43b
net/http.(*conn).serve(0xc00045c000, {0xfb6258, 0xc0001335c0})
        /usr/local/go/src/net/http/server.go:1929 +0xb08
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3033 +0x4e8

Note: this does not happen with other datatypes (e.g., PCAP, hopannotation1, scamper1).
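
The trace shows task.(*Task).Close(0x0), i.e. Close was called on a nil *Task. A minimal sketch of that failure mode and a guard against it follows. The Task type, newTask, and runTask here are hypothetical stand-ins, not the actual etl code, and the assumption that the task is nil because no parser could be created for the data type is only inferred from the log line above.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in; it is NOT the real etl task.Task.
type Task struct {
	name string
}

// Close reads a field through the receiver, so calling it on a nil *Task
// panics with "invalid memory address or nil pointer dereference".
func (t *Task) Close() string {
	return "closed " + t.name
}

// newTask mimics a constructor that returns a nil task plus an error when
// the data type has no parser (an assumption made for illustration).
func newTask(datatype string, supported map[string]bool) (*Task, error) {
	if !supported[datatype] {
		return nil, errors.New("no parser for " + datatype)
	}
	return &Task{name: datatype}, nil
}

// runTask checks the constructor error before touching the task, so an
// unsupported data type yields an error instead of a panic.
func runTask(datatype string, supported map[string]bool) (string, error) {
	tsk, err := newTask(datatype, supported)
	if err != nil {
		return "", fmt.Errorf("creating task: %w", err)
	}
	return tsk.Close(), nil
}

func main() {
	supported := map[string]bool{"scamper1": true}
	for _, dt := range []string{"scamper1", "traceroute"} {
		out, err := runTask(dt, supported)
		fmt.Println(dt, out, err)
	}
}
```

With a guard like this, the handler could report the unsupported data type to the caller rather than crashing the request goroutine.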

cristinaleonr, Oct 11 '21 19:10

Ah, so there are currently two processing paths in the etl_worker: one for the "v1" system, and another for the "v2" system.

  • the "v1" path (/worker) is used by the v1 pipeline, which handles web100 data types (and any others not yet migrated to v2).
  • the "v2" path (/v2/worker) is used by the v2 pipeline and ordinarily runs in the GKE environment. v2 does not yet support all data types. The "Traceroute migration" work is in part to solve this.

So, try the same GCS URL with the resource path /worker instead.
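
As a rough sketch, the two request URLs differ only in the resource path. The helper names workerURL and parseWorkerURL are made up for this example, and the host localhost:8081 is simply the one used in the curl command from the report.

```go
package main

import (
	"fmt"
	"net/url"
)

// workerURL builds a request URL for a locally running etl_worker.
// path is "/worker" (v1) or "/v2/worker" (v2); archive is the GCS URL.
// The filename is query-escaped so characters like ':' and '/' are safe.
func workerURL(host, path, archive string) string {
	q := url.Values{}
	q.Set("filename", archive)
	u := url.URL{Scheme: "http", Host: host, Path: path, RawQuery: q.Encode()}
	return u.String()
}

// parseWorkerURL recovers the resource path and the decoded filename,
// which is what a standard HTTP handler would see on the server side.
func parseWorkerURL(raw string) (path, filename string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.Query().Get("filename"), nil
}

func main() {
	archive := "gs://archive-measurement-lab/paris-traceroute/2019/11/19/" +
		"20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz"
	for _, p := range []string{"/worker", "/v2/worker"} {
		fmt.Println(workerURL("localhost:8081", p, archive))
	}
}
```

The printed URLs can be passed to curl as-is; the escaped filename decodes back to the original gs:// URL when the query string is parsed.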

stephen-soltesz, Oct 11 '21 22:10

And, I think the next issue will be that the v1 system does not support local output.

stephen-soltesz, Oct 12 '21 11:10

Thanks for clarifying!

I tried with the /worker path. I think you're right about the v1 system not supporting local output, because now the output is this error:
2021/10/12 13:57:26 insert.go:299: InsertErr googleapi: Error 400: The destination table is invalid: projec_id , dataset_id base_tables, table_id: traceroute., invalid on traceroute_20191119

cristinaleonr, Oct 12 '21 14:10