pantab Add atomic keyword

closes https://github.com/innobi/pantab/issues/380

Oct 28 '24 13:10 WillAyd

Hi @skyth540 - if you get the chance to test this out, I would greatly appreciate it. Adding atomic=False to your keywords should resolve the performance issues you have seen when looping appends.

The risk with this keyword is that the Hyper file could end up in a corrupt state if any loop iteration fails.
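One way to limit that risk is to take a single backup before the loop and restore it if any iteration raises, so the copy happens once instead of on every append. The following is only a minimal sketch using the standard library: `append_all_or_restore` and the `.bak` suffix are hypothetical names, not part of pantab, and `append_fn` stands in for whatever call does the append.

```python
import os
import shutil
from pathlib import Path


def append_all_or_restore(path, chunks, append_fn):
    """Call append_fn(chunk, path) for each chunk.

    If any call raises, put the file back to its pre-loop state and
    re-raise. Hypothetical helper, not part of pantab.
    """
    target = Path(path)
    backup = target.with_name(target.name + ".bak")  # assumed backup name
    existed = target.exists()
    if existed:
        # One copy up front, instead of one copy per append.
        shutil.copy2(target, backup)
    try:
        for chunk in chunks:
            append_fn(chunk, target)
    except Exception:
        if existed:
            # os.replace overwrites the (possibly corrupt) target.
            os.replace(backup, target)
        else:
            target.unlink(missing_ok=True)
        raise
    else:
        if existed:
            backup.unlink()
```

With pantab, `append_fn` could be something like `lambda df, p: pt.frame_to_hyper(df, p, table="table", table_mode="a", atomic=False)`, assuming the branch with the new keyword is installed.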

Oct 28 '24 13:10 WillAyd

You can install from this branch with:

pip install git+https://github.com/innobi/pantab.git@add-atomic-keyword

Oct 28 '24 14:10 WillAyd

From what I can tell, it didn't make any change... each iteration still takes longer and longer

Oct 29 '24 17:10 skyth540

Hmm that's unfortunate. Do you have any code I can use to reproduce?

Oct 29 '24 18:10 WillAyd

If it's not the file copy that is the problem, then there might be some limitations in the Hyper API around its insertion time. We can ask that team, but it would be great to rule out other issues with a self-contained example first!

Oct 29 '24 18:10 WillAyd

@skyth540 this might be a better MRE:

import pandas as pd
import numpy as np
import pantab as pt

import time

df = pd.DataFrame(np.random.randn(100_000, 10), columns=list("abcdefghij"))
for i in range(100):
    start = time.time()
    pt.frame_to_hyper(
        df,
        "example.hyper",
        table="table",
        table_mode="a",
    )
    end = time.time()
    print(f"Iteration {i} took {end - start}")

Running that yields the following runtime for me:

Adding atomic=False:

import pandas as pd
import numpy as np
import pantab as pt

import time

df = pd.DataFrame(np.random.randn(100_000, 10), columns=list("abcdefghij"))
for i in range(100):
    start = time.time()
    pt.frame_to_hyper(
        df,
        "example.hyper",
        table="table",
        table_mode="a",
        atomic=False,
    )
    end = time.time()
    print(f"Iteration {i} took {end - start}")

made that appear much closer to constant time

Do you see the same results?
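To compare runs with a number instead of eyeballing the printed times, one option is to collect each `end - start` into a list and compare the last few iterations against the first few. This is just a sketch: `growth_ratio` and its `window` parameter are hypothetical names, not part of pantab.

```python
from statistics import mean


def growth_ratio(durations, window=10):
    """Mean of the last `window` durations divided by the mean of the first.

    A value near 1.0 suggests roughly constant time per iteration; a value
    well above 1.0 suggests each append is getting slower.
    """
    if window <= 0 or len(durations) < 2 * window:
        raise ValueError("need at least 2 * window durations")
    return mean(durations[-window:]) / mean(durations[:window])
```

In the loops above that would mean appending `end - start` to a list on each iteration and calling `growth_ratio` on it once the loop finishes.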

Oct 30 '24 18:10 WillAyd

Merging for now as I want to cut a release candidate soon. If you can provide an MRE for whatever issue remains, let's open a new issue and I can take a look.

Oct 31 '24 13:10 WillAyd