GRAMEX-144 ⁃ Avoid HDF5 for MLHandler storage
MLHandler is the only Gramex component that internally requires HDF5. (UploadHandler used to do this, but we migrated away from that.)
So PyTables is necessary for Gramex. But since we can't pip install pytables, I'd like to make this optional.
Can we use Excel storage for MLHandler Frames?
┆Issue is synchronized with this Jira Bug
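One way to treat PyTables as optional is to check whether it can be imported before choosing a storage format. The snippet below is only a sketch of that idea: `module_available` and `default_store_suffix` are hypothetical helpers, and the `.h5` / `.xlsx` choice is an assumption for illustration, not how MLHandler actually behaves.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def default_store_suffix() -> str:
    # "tables" is the import name of the PyTables package.
    # Falling back to ".xlsx" is an assumption made for this sketch.
    return ".h5" if module_available("tables") else ".xlsx"
```

With a check like this, HDF5 would be used only where PyTables is already installed, and nothing would fail at import time elsewhere.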
@sanand0, we have three options:
- Move h5py / pytables to conda - this is already happening, almost no change required
- Remove data storage in MLHandler completely - Might need some refactoring, but will greatly simplify the API. Users will have to POST data on every train / retrain.
- Use Excel - this will make MLHandler slow for larger datasets
I would pick option 2. Keeps things simple and clean. What would you pick?
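For reference, option 2 would mean every train / retrain call carries the full dataset in the request. A rough sketch of what a client might build is below. The URL, the `_action=train` query parameter, and the column names are assumptions for illustration and may not match the final API; the request is built but not sent.

```python
import json
import urllib.request


def build_train_request(url: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the full training data."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and columns, for illustration only.
req = build_train_request(
    "http://localhost:9988/mlhandler?_action=train",
    [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}],
)
# Sending it would be: urllib.request.urlopen(req)
```

The trade-off is that the server stays stateless with respect to data, while clients pay the upload cost on every retrain.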
Option 2: remove data storage in MLHandler completely.
(Sent by email in reply to Jaidev Deshpande's comment of Monday, January 24, 2022, 10:49 AM, on gramener/gramex issue #491.)
I'd like to try out Excel first, please. It's hopefully low effort. We can then evaluate the impact of removing data storage completely.
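A low-effort version of the Excel route could pick the writer from the file suffix, so existing HDF5 stores keep working where PyTables is available. This is a minimal sketch, not the actual MLHandler code: `save_frame` is a hypothetical helper, and the `"data"` key is an assumption. It relies only on pandas' `DataFrame.to_excel` and `DataFrame.to_hdf`.

```python
from pathlib import Path


def save_frame(df, path: str) -> str:
    """Write a DataFrame to `path`, choosing the writer from the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        # Needs an Excel writer such as openpyxl, but not PyTables.
        df.to_excel(path, index=False)
        return "excel"
    if suffix in (".h5", ".hdf5"):
        # Needs PyTables; the "data" key is an assumption for this sketch.
        df.to_hdf(path, key="data")
        return "hdf"
    raise ValueError(f"Unsupported store format: {suffix!r}")
```

Usage would be along the lines of `save_frame(df, "store.xlsx")`. Note that Excel worksheets are capped at 1,048,576 rows, which is one more reason this route suits smaller datasets.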
Noted, I'll send a PR today.
Meanwhile, FWIW, if we pick option 1, MLHandler works fine on Python 3.7, 3.8 and 3.9.