Memory leak during btree(agtype)
Describe the bug A clear and concise description of what the bug is.
I found this bug when I try to build a BTree index on a vertex table and got an OOM. Luckily I'm able to reproduce it and locate the bug.
The memory leak is during BTree building, when calling compare_agtype_containers_orderability function. Inside the function, it calls agtype_iterator_next and thenfill_agtype_value to grab the value of an iterater into an agtype_value variable. And it calls pnstrdup, which pallocs memory.
if (AGTE_IS_NULL(entry))
{
result->type = AGTV_NULL;
}
else if (AGTE_IS_STRING(entry))
{
char *string_val;
int string_len;
result->type = AGTV_STRING;
/* get the position and length of the string */
string_val = base_addr + offset;
string_len = get_agtype_length(container, index);
/* we need to do a deep copy of the string value */
result->val.string.val = pnstrdup(string_val, string_len);
result->val.string.len = string_len;
Assert(result->val.string.len >= 0);
}
else if (AGTE_IS_NUMERIC(entry))
{
Numeric numeric;
Numeric numeric_copy;
result->type = AGTV_NUMERIC;
/* we need to do a deep copy here */
numeric = (Numeric)(base_addr + INTALIGN(offset));
numeric_copy = (Numeric) palloc(VARSIZE(numeric));
memcpy(numeric_copy, numeric, VARSIZE(numeric));
result->val.numeric = numeric_copy;
}
The palloced memory is not released in the compare_agtype_containers_orderability, which result in OOMs.
How are you accessing AGE (Command line, driver, etc.)?
- [e.g. JDBC] psql
What data setup do we need to do? nope
What is the necessary configuration info needed?
- [e.g. Installed PostGIS] nope
What is the command that caused the error?
It is easy to reproduce this problem. Just build a btree on a large agtype column.
CREATE TABLE test(num agtype);
INSERT INTO test SELECT ('{"id": "'||a||'"}')::agtype FROM generate_series(1,100000000) a;
CREATE INDEX ON test USING btree(num);
And it will fastly take ~100G memory.
Expected behavior A clear and concise description of what you expected to happen.
Environment (please complete the following information):
- Version: [e.g. 0.4.0]
Additional context Add any other context about the problem here.
I can fix this problem by modifying agtype_utils.c in the following steps:
- I pfree the agtype_value before exit
switch (va.type)
{
case AGTV_STRING:
case AGTV_NULL:
case AGTV_NUMERIC:
case AGTV_BOOL:
case AGTV_INTEGER:
case AGTV_FLOAT:
case AGTV_EDGE:
case AGTV_VERTEX:
case AGTV_PATH:
res = compare_agtype_scalar_values(&va, &vb);
pfree_agtype_value_content(&va);
pfree_agtype_value_content(&vb);
break;
...
...
- I ask the
pfree_agtype_value_contentto pfree the duplicated string
switch (value->type)
{
case AGTV_NUMERIC:
pfree(value->val.numeric);
break;
case AGTV_STRING:
/*
* The char pointer (val.string.val) is not free'd because
* it is not allocated by an agtype helper function.
*/
pfree(value->val.string.val);
break;
...
...
And then the memory leak is solved. However, It causes other problems. For example, in create_agtype_from_list, it use key_agtype = string_to_agtype_value("__id__");, and the string cannot be pfreed.
According to the code, i think the numeric type also have the same problem. I think maybe we have to create a new function called unfill_agtype_value to pfree the memory allocated for string and numeric type. But I'm not sure whether it have other side effects...
So i publish this bug report and hope somebody with more knowledge on this repo can fix this problem or tell me how to safely fix it.
@lyp-bobi Looking into it.
@lyp-bobi The problem is that there are places where the value isn't copied, for whatever reason. Using pfree ends up freeing stuff it's not supposed to. This is making it difficult to see what actually needs to be done to correct it.
@lyp-bobi Unfortunately, your fix doesn't resolve the memory issue. I've made modifications to fix the code and the regression tests that fail due to the modifications added. However, it still gobbles up memory. There is likely more to this than just that section of code.
@jrgemignani
@lyp-bobi The problem is that there are places where the value isn't copied, for whatever reason. Using pfree ends up freeing stuff it's not supposed to. This is making it difficult to see what actually needs to be done to correct it.
Maybe just create a new function other than pfree_agtype_value_content that is only used in compare_agtype_containers_orderability so it won't pfree unsupposed memories?
@lyp-bobi Unfortunately, your fix doesn't resolve the memory issue. I've made modifications to fix the code and the regression tests that fail due to the modifications added. However, it still gobbles up memory. There is likely more to this than just that section of code.
I have no idea about that. In my case most memory seems to be used by pnstrdup, but I'm not sure if there are other memory leaks or other problems :P
I'm glad to see you can view this problem from a higher level perspective.
@lyp-bobi That was one use of memory. I found the other and now I can build the index. This was from your commands above -
psql-16.2-5432-psql=# drop extension age cascade; create extension age; load 'age'; set search_path TO ag_catalog;
ERROR: extension "age" does not exist
CREATE EXTENSION
LOAD
SET
psql-16.2-5432-psql=# CREATE TABLE test(num agtype);
CREATE TABLE
psql-16.2-5432-psql=# INSERT INTO test SELECT ('{"id": "'||a||'"}')::agtype FROM generate_series(1,100000000) a;
INSERT 0 100000000
psql-16.2-5432-psql=# CREATE INDEX ON test USING btree(num);
CREATE INDEX
psql-16.2-5432-psql=#
Now I need to work on a PR to get this fix in.
edit: It no longer eats up process memory like before.
@lyp-bobi PR #2060 Addresses this issue for the master branch. It is currently in review and will be merged when done.
@lyp-bobi PR #2060 Addresses this issue for the master branch. It is currently in review and will be merged when done.
thanks :D