pgreplay-go icon indicating copy to clipboard operation
pgreplay-go copied to clipboard

Parsing csv log.

Open jalexandre0 opened this issue 3 years ago • 3 comments

Hi there.

Theres a way to parse and replay from csv logs for people who can't change their log_line_prefix (aka RDS customers).

jalexandre0 avatar Jun 30 '21 14:06 jalexandre0

hit this issue too @jalexandre0 did you managed to replay RDS logs by pgreplay-go?

so far, I was able to convert RDS csvlog to pgreplay-go compatible log by writing a simple parser, which is converting one csvlog into two lines like this:

csvlog:

2021-09-09 17:00:00.006 UTC,"user","database",27752,"172.30.1.2:34106",613a286d.6c68,13992,
"SELECT",2021-09-09 15:29:49 UTC,229/3866470,0,LOG,00000,
"execute <unnamed>: SELECT ""jobs"".* FROM ""jobs"" WHERE ""jobs"".""deleted_at"" IS NULL
AND ""jobs"".""user_id"" = $1","parameters: $1 = '124765'",,,,,,,,"bin/rails"

pgreplay-go log

2021-09-09 17:00:00.006 UTC |user|database|613a286d.6c68|LOG:  execute <unnamed>: SELECT "jobs".* FROM "jobs" WHERE "jobs"."deleted_at" IS NULL AND "jobs"."user_id" = $1
2021-09-09 17:00:00.006 UTC |user|database|613a286d.6c68|DETAIL:  parameters: $1 = '124765'

@lawrencejones does such an approach looks valid to you or am I missing something?

mvasilenko avatar Sep 20 '21 15:09 mvasilenko

Hey @jalexandre0 - Unfortunately we don't have any native support for CSV logs at the moment, correct.

In theory adding another parser to do this would be pretty straight-forward, given that the logs contain all of the required data, in a roughly similar format. Contributions are welcome!

In the meantime, you could possibly craft some sed/awk commands to meld the log into the correct format. It's also worth noting that pgreplay-go has a (less documented) JSON input format. You can get a feel for what this format looks like by converting an errlog-format log into this: pgreplay-go --debug filter --errlog-input=errlog.log --output=logs.json

benwh avatar Jul 22 '22 16:07 benwh

I've created another simple parser that converts an RDS CSV into a pgreplay-go JSON file. You can specify the number of lines and filter them out for a specific use. I've used this to parse CSVs with 10-60GB, and although it was not improved for speed, it could parse the bigger files within 30/45min. https://gist.github.com/ghunti/66f32304b7d7ed9f9b5714ba41a725dc

ghunti avatar Oct 03 '22 08:10 ghunti