Reduce ChunkStore memory footprint
Two changes:
- Reduce the memory footprint when reading data
- Handle duplicate columns in the column filter (a sketch of one possible approach follows the list)
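I can't see the exact implementation from this description alone, so here is a minimal, hypothetical sketch of what duplicate-column handling could look like, assuming it amounts to de-duplicating the requested columns while preserving the caller's ordering; the PR's actual change may differ.

```python
# Hypothetical helper: drop repeated names from a column filter while
# keeping first-seen order. Illustrative only; not the PR's actual code.
def dedupe_columns(columns):
    seen = set()
    deduped = []
    for col in columns:
        if col not in seen:
            seen.add(col)
            deduped.append(col)
    return deduped

print(dedupe_columns(['date', 'd', 'date', 'e']))  # ['date', 'd', 'e']
```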
Benchmarked with a 1GB dataframe, comparing this PR against master (the measured figures are not preserved here). Benchmark script:
```python
import numpy as np
import pandas as pd
from datetime import datetime as dt
from datetime import timedelta as td

days = 2000
secs = 15000

# Build six columns of test data: ids, dates, strings, and floats.
a1 = [range(secs) for _ in range(days)]
a2 = [[dt(2000, 1, 1) + td(days=x)] * secs for x in range(days)]
a3 = [['foo'] * secs for _ in range(days)]
a4 = [np.random.rand(secs) for _ in range(days)]
a5 = [np.random.rand(secs) for _ in range(days)]
a6 = [['HOLIDAY INN WORLD CORP'] * secs for _ in range(days)]

now = dt.now()
result = []
for i in range(days):
    result.append(pd.DataFrame({'security_id': a1[i], 'date': a2[i], 'c': a3[i],
                                'd': a4[i], 'e': a5[i], 'f': a6[i]}, copy=True))
df = pd.concat(result)
print(df.shape)
print((dt.now() - now).total_seconds())

df = df.set_index(['date', 'security_id'])
# In-memory size of the frame, in MB.
print(df.memory_usage(index=True).sum() / 1e6)

from arctic import Arctic
import arctic
print(arctic.__file__)

a = Arctic('localhost')
a.initialize_library('test', lib_type='ChunkStoreV1')  # ChunkStore library type
lib = a['test']
lib.write('test', df)
del df
df = lib.read('test')
```
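A quick sanity check on the round trip is to repeat the size measurement on the frame returned by `lib.read` and compare it against the figure printed before the write; this just reuses the same `memory_usage` accounting from the script above.

```python
# Size of the round-tripped frame, in MB; should match the pre-write figure.
print(df.memory_usage(index=True).sum() / 1e6)
```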
What's the memory saving? Have you measured it? Is it 50%, i.e. only one copy of the data instead of two?
It would be great to have a way to demonstrate the saving, plus an automated test to avoid accidental regressions (a sketch of one approach follows).
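As a sketch only (not what's in this PR): one way to automate this is to sample the process memory while reading and assert that the transient overhead stays within a budget tied to the dataframe's own size. This assumes the third-party `memory_profiler` package; the function name and the 1.5x budget are illustrative, not measured figures.

```python
# Sketch of a memory-regression test using the third-party
# memory_profiler package; the 1.5x budget is illustrative.
from memory_profiler import memory_usage

def assert_read_memory_budget(lib, symbol, frame_mb, budget_ratio=1.5):
    baseline_mb = memory_usage(-1)[0]  # current process memory, in MB
    # Sample memory every 0.1s while lib.read(symbol) runs.
    samples = memory_usage((lib.read, (symbol,)), interval=0.1)
    peak_overhead_mb = max(samples) - baseline_mb
    # With one in-memory copy of the data instead of two, the transient
    # peak should stay well under twice the frame's own footprint.
    assert peak_overhead_mb < budget_ratio * frame_mb, peak_overhead_mb
```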
@TomTaylorLondon are you going to have the bandwidth to finish this or would you like me to resolve it?
Hi @TomTaylorLondon any luck with this?
@shashank88 I spoke with @TomTaylorLondon and am going to take this over from him. I'll get it all fixed up later this week(end).
👍