etl
Fix local development mode to accept paris-traceroute archives
Currently, etl_worker crashes in local development mode when a paris-traceroute archive is supplied as a URL.
Steps to reproduce:
- Navigate to cmd/etl_worker within the ETL project.
- Run:
  go run ./etl_worker.go -service_port :8080 -output_dir ./output -output local
- Open another terminal and set the URL variable to some paris-traceroute archive, e.g.:
  URL=gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz
- Run:
  curl "http://localhost:8081/v2/worker?filename=$URL"
- The etl_worker crashes with:
2021/10/11 18:48:31 worker.go:174: <nil> creating parser for traceroute gs://archive-measurement-lab/paris-traceroute/2013/05/08/20130508T000000Z-mlab3-akl01-paris-traceroute-0000.tgz
2021/10/11 18:48:31 server.go:3159: http: panic serving [::1]:56948: runtime error: invalid memory address or nil pointer dereference
goroutine 135 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1801 +0xb9
panic({0xd47080, 0x151a520})
/usr/local/go/src/runtime/panic.go:1047 +0x266
github.com/m-lab/etl/task.(*Task).Close(0x0)
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:67 +0x19
panic({0xd47080, 0x151a520})
/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/m-lab/etl/task.(*Task).ProcessAllTests(0x4, 0x50)
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:85 +0x4f
github.com/m-lab/etl/worker.DoGKETask(_, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, {0xc0002ee16d, ...}, ...})
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:209 +0x30
github.com/m-lab/etl/worker.ProcessGKETask({_, _}, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, ...}, ...)
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:204 +0x4de
main.(*runnable).Run(0xc00000c3c0, {0xfb61e8, 0xc00019a000})
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:313 +0x2f6
main.handleLocalRequest({0xfb1500, 0xc0001f5ea0}, 0x0)
/usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:196 +0x189
net/http.HandlerFunc.ServeHTTP(0x0, {0xfb1500, 0xc0001f5ea0}, 0x0)
/usr/local/go/src/net/http/server.go:2046 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00022400f, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
/usr/local/go/src/net/http/server.go:2424 +0x149
net/http.serverHandler.ServeHTTP({0xc0005bdb90}, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
/usr/local/go/src/net/http/server.go:2878 +0x43b
net/http.(*conn).serve(0xc00045c000, {0xfb6258, 0xc0001335c0})
/usr/local/go/src/net/http/server.go:1929 +0xb08
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3033 +0x4e8
Note: this does not happen with other datatypes (e.g., PCAP, hopannotation1, scamper1).
Ah, so there are currently two processing paths in the etl_worker: one for the "v1" system, and another for the "v2" system.
- The "v1" path (/worker) is used by the v1 pipeline, web100 data types (and any others not yet migrated to v2).
- The "v2" path (/v2/worker) is used by the v2 pipeline and ordinarily runs in the GKE environment. v2 does not yet support all data types. The "Traceroute migration" work is in part to solve this.
So, try the same GCS URL with the resource path /worker instead.
And, I think the next issue will be that the v1 system does not support local output.
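For reference, the two requests differ only in the resource path. Here is a small sketch of building either request URL in Go with net/url. The helper name workerURL is hypothetical, and it assumes the v1 path accepts the same filename query parameter as the v2 path used in the reproduction steps.

```go
package main

import (
	"fmt"
	"net/url"
)

// workerURL is a hypothetical helper that builds a request URL for the
// local etl_worker. path is "/worker" (v1) or "/v2/worker" (v2).
// Assumption: both paths take the archive as a "filename" query parameter.
func workerURL(base, path, archive string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path
	q := url.Values{}
	q.Set("filename", archive) // percent-encodes the gs:// URL
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	archive := "gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz"
	for _, path := range []string{"/worker", "/v2/worker"} {
		u, err := workerURL("http://localhost:8081", path, archive)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(u)
	}
}
```

Unlike the raw curl invocation, this percent-encodes the archive URL inside the query string; the server should decode it back to the same gs:// value.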
Thanks for clarifying!
I tried with the /worker path. I think you're right about the v1 system not supporting local output, because now the output is this error:
2021/10/12 13:57:26 insert.go:299: InsertErr googleapi: Error 400: The destination table is invalid: projec_id , dataset_id base_tables, table_id: traceroute., invalid on traceroute_20191119