Update traceroute parser to use scamper schema for text files

Open stephen-soltesz opened this issue 6 years ago • 2 comments

During the Gardener deployments, we discovered that traceroute write performance prevents many tasks from completing, which in turn prevents Gardener from making progress. The cause is the combination of the 1hr response time limit on AEFlex connections and the current traceroute schema, which writes 10-20x more rows for every TCP connection to the platform.

The short-term plan is to disable the Gardener traceroute deployment: https://github.com/m-lab/dev-tracker/issues/145

The long-term plan is to modify the traceroute parser to be based on the scamper schema. The parser would read the legacy text files and construct one scamper record from each. This would reduce the row writes to approximately match sidestream (one row per TCP connection).
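To make the row-count difference concrete, here is a minimal sketch of the idea, not the actual parser. The hop line format (`<ttl> <ip> <rtt> ms`), the `Hop` and `TracerouteRecord` types, and `parse_legacy_text` are all assumptions made up for illustration; the real legacy text files and the real scamper schema carry more fields than this.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified hop line: "<ttl> <ip> <rtt> ms".
# Real legacy traceroute text output is more varied than this.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+([\d.]+)\s*ms\s*$")


@dataclass
class Hop:
    ttl: int
    addr: str
    rtt_ms: float


@dataclass
class TracerouteRecord:
    # One record (one row) per traceroute, with the hops nested inside it.
    src: str
    dst: str
    hops: list = field(default_factory=list)


def parse_legacy_text(src, dst, text):
    """Build a single nested record from the text of one traceroute."""
    record = TracerouteRecord(src=src, dst=dst)
    for line in text.splitlines():
        m = HOP_LINE.match(line)
        if m is None:
            continue  # skip headers, blank lines, and unanswered probes
        record.hops.append(Hop(int(m.group(1)), m.group(2), float(m.group(3))))
    return record


sample = """\
 1  192.0.2.1  0.41 ms
 2  198.51.100.7  1.92 ms
 3  *
 4  203.0.113.9  8.30 ms
"""
record = parse_legacy_text("192.0.2.100", "203.0.113.9", sample)
rows_to_write = [record]  # one row for the whole traceroute, not one per hop
```

With a per-hop schema the sample above would produce one row per answered hop; with the nested record it produces a single row whose `hops` list holds all of them.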

stephen-soltesz, Nov 16 '18 15:11

@yachang does this description match your understanding of what we discussed?

stephen-soltesz, Nov 16 '18 17:11

This should also include using the batch annotator interface to annotate all hops in an inserter buffer.
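A rough sketch of what batching could look like, assuming records are plain dicts with a `hops` list. `annotate_buffer` and the `annotate_batch` callable are hypothetical stand-ins, not the real batch annotator interface; the point is only that all hops in the buffer are resolved with one call instead of one call per hop.

```python
def annotate_buffer(records, annotate_batch):
    """Annotate every hop in a buffer of records with one batch call.

    `records` is a list of dicts, each with a "hops" list of dicts that
    have an "addr" key. `annotate_batch` is a hypothetical callable that
    takes a list of IPs and returns a dict mapping IP -> annotation.
    Returns the number of distinct IPs that were looked up.
    """
    # Deduplicate first: the same router often appears in many traceroutes.
    unique_ips = sorted({hop["addr"] for rec in records for hop in rec["hops"]})
    annotations = annotate_batch(unique_ips) if unique_ips else {}
    for rec in records:
        for hop in rec["hops"]:
            hop["annotation"] = annotations.get(hop["addr"])
    return len(unique_ips)


calls = []


def fake_annotate_batch(ips):
    # Stand-in for a real annotation service; records each call it receives.
    calls.append(list(ips))
    return {ip: {"label": "example-" + ip} for ip in ips}


inserter_buffer = [
    {"hops": [{"addr": "192.0.2.1"}, {"addr": "198.51.100.7"}]},
    {"hops": [{"addr": "192.0.2.1"}, {"addr": "203.0.113.9"}]},
]
looked_up = annotate_buffer(inserter_buffer, fake_annotate_batch)
```

Here four hops across two records share three distinct addresses, so the stand-in annotator is called once with three IPs.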

gfr10598, Feb 04 '19 17:02