OOM when working with 1 GB PostgreSQL heap (table) file
What version are you using (fq -v)?
```
$ fq -v
0.0.9 (linux amd64)
```
How was fq installed?
fq was built from source.
My branch: https://github.com/pnsafonov/fq/tree/postgres
Can you reproduce the problem using the latest release or master branch?
I can reproduce the problem on my branch, where the PostgreSQL parsers are implemented.
1 GB file: https://github.com/pnsafonov/fq_testdata_postgres14 (direct download: https://github.com/pnsafonov/fq_testdata_postgres14/raw/master/16397)
What did you do?
I am trying to parse a heap (relation, table) file of maximum size (1 GB for PostgreSQL). fq consumes 90–100 times more memory than the file size. For an 80 MB file, fq requires 7.5 GB of RAM.
```
$ time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0, 1, 2, -1] | tovalue" 16397
Killed

real    0m50.794s
user    1m11.962s
sys     0m8.994s
```
Kernel messages:
```
$ sudo dmesg | tail -2
[193541.830725] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=fq,pid=454783,uid=1000
[193541.830748] Out of memory: Killed process 454783 (fq) total-vm:31508780kB, anon-rss:26629332kB, file-rss:272kB, shmem-rss:0kB, UID:1000 pgtables:58860kB oom_score_adj:0
```
1 GB file:
```
$ ls -alh 16397
-rw-r----- 1 pavel pavel 1.0G Aug 31 08:38 16397
```
Memory profiler results:
```
$ go tool pprof mem.prof
File: postgres.test
Type: alloc_space
Time: Aug 30, 2022 at 6:16pm (MSK)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 40
Showing nodes accounting for 31041.34MB, 97.35% of 31886.71MB total
Dropped 270 nodes (cum...
```

I checked with a disassembler: `TryFieldScalarFn.func1` is an unnamed closure inside `TryFieldScalarFn`.
Hello! At the moment fq has not been optimized to use less memory; the focus has been on making it work at all :) The main reason it uses lots of memory is that it does very little "lazy" decoding: instead it decodes the full file, and every field added has to keep track of lots of metadata: its name, parent, children, bit range, decoded value, optional symbolic value, optional description string, etc.
I'm not familiar with the pgheap format, but for example the mp4 decoder in fq has an option to skip decoding of individual samples if they are not interesting (you can still decode individual samples manually), which speeds up decoding a lot.
But there are some options to improve the situation that I've thought about; sadly most of them are quite complicated and not an easy task:
- Try to compact down `decode.Value` and `scalar.S` somehow. They are the ones using the most memory, I think.
  - Add some kind of slice to keep track of optional things like symbolic value, display format, etc.
- Introduce interfaces for `decode.Value` and/or `scalar.S`:
  - Could have type-specific implementations that store less data.
  - Could have an implementation that only stores the actual value, etc.
  - Could possibly have arrays that know their type and length.
- Do lazy decoding somehow:
  - Might behave strangely with errors or broken files, as they would not be noticed until you run a query, etc.
  - Not sure how it would interact with probing and possibly some other things.
  - Would probably make some queries slower, as you have to decode and possibly do IO at "query time".
- Do full decoding but only store ranges, then decode again later:
  - Would detect errors.
  - Would probably make some queries slower, as you have to decode and possibly do IO at "query time".
Let me know if you have other ideas!