
Float32 RawArray?

[Open] robintibor opened this issue 3 years ago · 5 comments

Describe the new feature or enhancement

A float32 RawArray.

Describe your proposed implementation

A dtype argument when creating a RawArray.

Additional comments

At the moment we are looking into potential memory savings when using MNE for braindecode. We noticed that the underlying _data of a RawArray is forced to be float64, and were wondering whether it is realistic for MNE to also offer float32. Or is float64 too fundamental to MNE, with too many interactions with processing functions? Concretely, we would prefer float32 in two scenarios:

  1. Data is held in main memory; float32 would halve its footprint.
  2. Data is stored on disk and loaded on the fly; float32 means faster storing and loading, and saves both disk space and main memory after loading.

Other ideas for what to do in those scenarios would also be appreciated.
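For concreteness, here is a minimal sketch of the proposal. The first part uses the real MNE API and shows the current upcasting behavior; the `dtype` argument at the end is hypothetical and does not exist in MNE today:

```python
import numpy as np
import mne

# Real API: build a RawArray from an in-memory NumPy array.
info = mne.create_info(ch_names=["ch1", "ch2"], sfreq=100.0, ch_types="eeg")
data = np.random.randn(2, 1000).astype(np.float32)
raw = mne.io.RawArray(data, info)
print(raw._data.dtype)  # float64 -- the float32 input is upcast internally

# Proposed (hypothetical, not in MNE): keep the buffer in single precision.
# raw = mne.io.RawArray(data, info, dtype=np.float32)
```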

robintibor avatar Jan 31 '22 13:01 robintibor

float64 was enforced historically to keep things simple: we wanted higher numerical precision for things like temporal filtering. I fully understand your request, but I cannot easily quantify how much pain native, tested support for float32 would be.
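As a quick illustration of the precision point (using plain SciPy rather than MNE internals), filtering the same signal in single vs. double precision shows the kind of error float64 was meant to avoid:

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Filter the same white-noise signal in float64 and float32 and compare.
rng = np.random.default_rng(0)
x64 = rng.standard_normal(100_000)
x32 = x64.astype(np.float32)

sos = butter(4, 40.0, fs=1000.0, output="sos")
y64 = sosfilt(sos, x64)
y32 = sosfilt(sos.astype(np.float32), x32)

# The difference is small but nonzero (on the order of float32 precision),
# which is why double precision was preferred for filtering.
print(np.max(np.abs(y64 - y32)))
```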

My approach would be to give it a try with a real use case in mind and send us a PR, so we can see the diff once there is something that seems to work.

Then we can reassess.

WDYT?


agramfort avatar Jan 31 '22 13:01 agramfort

Have you considered passing a filename string as preload to use memory-mapping and save memory?
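This is existing MNE behavior: when `preload` is a string, the loaded data are stored in a memory-mapped file at that path rather than held entirely in RAM. A minimal sketch (the file names here are placeholders):

```python
import mne

# `preload` as a string memory-maps the data to that file on disk,
# trading some speed for a much smaller main-memory footprint.
raw = mne.io.read_raw_fif("sample_raw.fif", preload="./raw_memmap.dat")
```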

larsoner avatar Jan 31 '22 15:01 larsoner

So @agramfort, I have slightly modified an existing benchmarking example from Hubert/braindecode to use dummy data; see here: https://gist.github.com/robintibor/2ca756e832367a3759a08418d5e649c5

Is this going in the direction you were thinking of? Regarding memmapping, I am not sure it would solve our scenario 2, where we may be loading very large datasets from disk and would not want to duplicate them.

robintibor avatar Feb 14 '22 15:02 robintibor

> where we may be loading very large datasets from disk, and where we would not want to duplicate them.

Why is duplicating them (on disk as a memmap) a problem in practice? Limitations on storage space or something?

larsoner avatar Feb 14 '22 18:02 larsoner

Yes, we may be looking at very large datasets, e.g. >1 TB.

robintibor avatar Feb 28 '22 14:02 robintibor