DeepRec
Memory leak in ParquetDataset
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- DeepRec version or commit id: 1.15.5+deeprec2208
- Python version: python3.6
- Bazel version (if compiling from source): bazel 0.26.1
- GCC/Compiler version (if compiling from source): 5.4.0
- CUDA/cuDNN version: none
Describe the current behavior
There is a memory leak in ParquetDataset: after running the Python code below, the process's memory usage grows to about 3 GB.
Describe the expected behavior
Memory usage should remain stable while repeatedly building and initializing ParquetDataset pipelines.
Code to reproduce the issue
import tensorflow as tf
from tensorflow.python.data.experimental.ops.dataframe import DataFrame
from tensorflow.python.data.experimental.ops.parquet_dataset_ops import ParquetDataset
from tensorflow.python.data.ops import dataset_ops

# User-defined helper (not included here) that returns the list of
# DataFrame.Field objects describing the parquet schema.
fields = get_parquet_fields_type(conf)

filename = './output.parquet'


def make_initializable_iterator(ds):
    # Handle both the module-level and the dataset-method API variants.
    if hasattr(dataset_ops, "make_initializable_iterator"):
        return dataset_ops.make_initializable_iterator(ds)
    return ds.make_initializable_iterator()


def build_input_fn():
    def parse_parquet(record):
        # Split each batch into (features, label); "clk" is the label column.
        label = record.pop("clk")
        return record, label
    return parse_parquet


def build_dataset():
    dataset = ParquetDataset(
        filename,
        batch_size=256,
        fields=fields,
        drop_remainder=True,
    )
    dataset = dataset.map(build_input_fn())
    dataset = dataset.prefetch(2)
    return dataset


# Repeatedly build and initialize the dataset; memory keeps growing.
for i in range(1000):
    ds = build_dataset()
    iterator = make_initializable_iterator(ds)
    with tf.Session() as sess:
        sess.run(iterator.initializer)
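For reference, a minimal sketch of how the growth can be observed, assuming psutil is available (psutil and the printing interval are not part of the original report). It reuses build_dataset and make_initializable_iterator from the snippet above and prints the resident set size every 100 iterations:

import os

import psutil  # assumed available; used only to read the process RSS
import tensorflow as tf

proc = psutil.Process(os.getpid())

for i in range(1000):
    ds = build_dataset()
    iterator = make_initializable_iterator(ds)
    with tf.Session() as sess:
        sess.run(iterator.initializer)
    if i % 100 == 0:
        # RSS keeps increasing across iterations, reaching ~3 GB by the end.
        print("iter %d: rss = %.1f MB" % (i, proc.memory_info().rss / 1e6))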
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.