Add atomic keyword
closes https://github.com/innobi/pantab/issues/380
Hi @skyth540 - if you get the chance to test this out I would greatly appreciate it. It should resolve the performance issues you have seen when looping appends if you pass atomic=False as a keyword argument.
The risk with this keyword is that the Hyper file could end up in a corrupt state if any loop iteration fails.
You can install from this branch with:
pip install git+https://github.com/innobi/pantab.git@add-atomic-keyword
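If the corruption risk mentioned above is a concern, one possible workaround is to run the whole loop against a scratch copy and swap it into place only once every iteration has succeeded. The following is a minimal sketch of that idea, not part of pantab: `append_all_or_nothing` and `append_fn` are hypothetical names, and the helper only does generic file handling, so it assumes the file can be safely copied while nothing else has it open.

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def append_all_or_nothing(
    target: str,
    batches: Iterable[T],
    append_fn: Callable[[str, T], None],
) -> None:
    """Run append_fn for every batch against a scratch copy of `target`.

    The scratch copy replaces `target` only if every append succeeds;
    on any failure the original file is left untouched.
    """
    target_dir = os.path.dirname(os.path.abspath(target))
    suffix = os.path.splitext(target)[1]  # keep the original extension
    # Create the scratch file next to the target so os.replace stays on
    # the same filesystem.
    fd, scratch = tempfile.mkstemp(suffix=suffix, dir=target_dir)
    os.close(fd)
    try:
        if os.path.exists(target):
            shutil.copyfile(target, scratch)
        else:
            os.remove(scratch)  # let append_fn create the file itself
        for batch in batches:
            append_fn(scratch, batch)
        if os.path.exists(scratch):
            os.replace(scratch, target)  # swap in the finished copy
    finally:
        # Only still present if something failed before the swap.
        if os.path.exists(scratch):
            os.remove(scratch)
```

With this branch installed, `append_fn` could be something like `lambda path, df: pt.frame_to_hyper(df, path, table="table", table_mode="a", atomic=False)`, which pays for one copy per loop rather than relying on each individual call being safe.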
From what I can tell, it didn't make any change... each iteration still takes longer and longer
Hmm that's unfortunate. Do you have any code I can use to reproduce?
If it's not the file copy that is the problem, then there might be some limitations in the Hyper API around insertion time. We can ask that team, but it would be great to rule out other issues with a self-contained example first!
@skyth540 this might be a better MRE:
import pandas as pd
import numpy as np
import pantab as pt
import time
df = pd.DataFrame(np.random.randn(100_000, 10), columns=list("abcdefghij"))
for i in range(100):
    start = time.time()
    pt.frame_to_hyper(
        df,
        "example.hyper",
        table="table",
        table_mode="a",
    )
    end = time.time()
    print(f"Iteration {i} took {end - start}")
Running that yields the following runtime for me:
Adding atomic=False:
import pandas as pd
import numpy as np
import pantab as pt
import time
df = pd.DataFrame(np.random.randn(100_000, 10), columns=list("abcdefghij"))
for i in range(100):
    start = time.time()
    pt.frame_to_hyper(
        df,
        "example.hyper",
        table="table",
        table_mode="a",
        atomic=False,
    )
    end = time.time()
    print(f"Iteration {i} took {end - start}")
made that appear much closer to constant time
Do you see the same results?
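To compare runs with a number instead of eyeballing the printed timings, something like the sketch below might help. It is only an illustration: `time_iterations` and `growth_ratio` are hypothetical helpers, not part of pantab, and the "last quarter vs. first quarter" ratio is just one rough way to judge whether per-iteration time is growing.

```python
import time
from statistics import mean
from typing import Callable, List


def time_iterations(step: Callable[[int], None], n: int) -> List[float]:
    """Call step(i) for i in range(n) and return each call's duration in seconds."""
    durations = []
    for i in range(n):
        start = time.perf_counter()
        step(i)
        durations.append(time.perf_counter() - start)
    return durations


def growth_ratio(durations: List[float]) -> float:
    """Mean of the last quarter of durations divided by the mean of the first quarter.

    A value near 1.0 suggests roughly constant time per iteration;
    a value well above 1.0 suggests each iteration is getting slower.
    """
    k = max(1, len(durations) // 4)
    return mean(durations[-k:]) / mean(durations[:k])
```

For the loops above, `step` would wrap the `pt.frame_to_hyper(...)` call, and the resulting ratio with and without atomic=False can then be posted alongside the MRE.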
Merging for now as I want to cut a release candidate soon. If you can provide an MRE for whatever issue remains, let's open a new issue and we can take a look.