etl
Update traceroute parser to use scamper schema for text files
During the Gardener deployments, we discovered that traceroute write performance prevents many tasks from completing, which in turn prevents Gardener from making progress. The cause is the combination of the 1-hour response time limit on AEFlex connections and the fact that, with its current schema, traceroute writes 10-20x more rows for every TCP connection to the platform.
The short-term plan is to disable the Gardener traceroute deployment: https://github.com/m-lab/dev-tracker/issues/145
The long-term plan is to base the traceroute parser on scamper. The parser would read the legacy text files and construct a scamper record from that data. This would reduce row writes to roughly match sidestream (one row per TCP connection).
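A minimal sketch of that conversion, written in Python for brevity. It assumes the legacy files look like standard `traceroute -n` output (a `traceroute to ...` header followed by one line per TTL). The `Trace`, `Hop`, and `Probe` classes and `parse_legacy_traceroute` are hypothetical and only loosely mirror scamper's per-trace structure; they are not the actual scamper schema.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record types, loosely modeled on scamper's trace structure:
# one Trace per measurement, with hops and probes nested inside it.
@dataclass
class Probe:
    rtt_ms: float

@dataclass
class Hop:
    ttl: int
    addr: Optional[str]  # None when the hop did not respond ("* * *")
    probes: List[Probe] = field(default_factory=list)

@dataclass
class Trace:
    dst: str
    hops: List[Hop] = field(default_factory=list)

# Assumed legacy format: "traceroute to X (IP), ..." then " N  IP  rtt ms ..."
HEADER_RE = re.compile(r"^traceroute to \S+ \(([^)]+)\)")
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+) ms")

def parse_legacy_traceroute(text: str) -> Trace:
    """Build one nested Trace record from a legacy traceroute text file."""
    lines = text.strip().splitlines()
    header = HEADER_RE.match(lines[0])
    if header is None:
        raise ValueError("missing traceroute header")
    trace = Trace(dst=header.group(1))
    for line in lines[1:]:
        hop_match = HOP_RE.match(line)
        if hop_match is None:
            continue  # skip lines that are not hop lines
        ttl, rest = int(hop_match.group(1)), hop_match.group(2)
        first = rest.split()[0] if rest.split() else "*"
        addr = None if first == "*" else first
        probes = [Probe(float(rtt)) for rtt in RTT_RE.findall(rest)]
        trace.hops.append(Hop(ttl=ttl, addr=addr, probes=probes))
    return trace

# Usage: three hops, one of which did not respond.
sample = """traceroute to 192.0.2.1 (192.0.2.1), 30 hops max, 60 byte packets
 1  10.0.0.1  0.123 ms  0.456 ms  0.789 ms
 2  * * *
 3  192.0.2.1  1.5 ms  *  1.7 ms"""
trace = parse_legacy_traceroute(sample)
```

Because hops are nested inside the trace rather than written as separate rows, each parsed file yields a single record regardless of path length.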
@yachang does this description match your understanding of what we discussed?
This should also include using the batch annotator interface to annotate all hops in an inserter buffer.
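One way batch annotation over an inserter buffer could look, sketched in Python. It assumes a hypothetical batch annotator that takes a list of IP strings and returns a dict keyed by IP; the real annotator interface may differ. `annotate_buffer`, `Trace`, and `Hop` are illustrative names. The idea is to collect the distinct hop addresses from every buffered trace, make a single annotator call, and attach the results back to each hop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical, minimal hop/trace records for illustration only.
@dataclass
class Hop:
    addr: Optional[str]             # None for non-responding hops
    annotation: Optional[dict] = None

@dataclass
class Trace:
    hops: List[Hop] = field(default_factory=list)

# Assumed batch annotator shape: many IPs in, one annotation per IP out.
BatchAnnotator = Callable[[List[str]], Dict[str, dict]]

def annotate_buffer(buffer: List[Trace], annotate: BatchAnnotator) -> int:
    """Annotate every hop in every buffered trace with one batch call.

    Returns the number of distinct IPs sent to the annotator.
    """
    # Deduplicate addresses across the whole buffer, keeping first-seen order.
    ips = list(dict.fromkeys(
        hop.addr for trace in buffer for hop in trace.hops if hop.addr
    ))
    if not ips:
        return 0
    results = annotate(ips)  # single request for the entire buffer
    for trace in buffer:
        for hop in trace.hops:
            if hop.addr:
                hop.annotation = results.get(hop.addr)
    return len(ips)

# Usage with a stand-in annotator that records each call it receives.
calls: List[List[str]] = []

def fake_annotator(ips: List[str]) -> Dict[str, dict]:
    calls.append(list(ips))
    return {ip: {"asn": 64512} for ip in ips}

buffer = [
    Trace([Hop("10.0.0.1"), Hop(None), Hop("192.0.2.1")]),
    Trace([Hop("10.0.0.1"), Hop("198.51.100.7")]),
]
sent = annotate_buffer(buffer, fake_annotator)
```

Deduplicating before the call matters because nearby hops (for example, the first few routers out of a site) tend to repeat across many traces in the same buffer.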