
Chunksize should be derived from MEMORY env variable

Open gessulat opened this issue 6 months ago • 0 comments

This issue came out of a discussion about MSAID's streaming branch.

The streaming code defines several chunk sizes, all of which are currently hard-coded. They should be configurable. Ideally, the chunk sizes would be derived automatically from the user's memory constraints.

TODO:

  • [ ] Implement a function that, given a small sample of the PSM data, estimates the memory required for a given chunk size. This probably has to be estimated dynamically, because the number of feature columns in the Percolator format is not fixed.
  • [ ] Implement a mechanism that reads the user's memory limit from an environment variable (e.g. MEMORY).
  • [ ] At runtime, check the user's memory limit or fall back to a default, then estimate the maximum chunk sizes and set them.
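
To make the three TODO items concrete, here is a minimal sketch of how they could fit together. It is not taken from the streaming branch: the function names, the 2 GiB default, the 0.5 safety factor, and the accepted unit suffixes are all assumptions made for illustration.

```python
import os

# Hypothetical defaults, not values from mokapot or the streaming branch.
DEFAULT_MEMORY_BYTES = 2 * 1024**3  # fall back to 2 GiB if MEMORY is unset
SAFETY_FACTOR = 0.5  # assumed headroom for intermediate copies

_UNITS = {
    "": 1, "B": 1,
    "K": 1024, "KB": 1024,
    "M": 1024**2, "MB": 1024**2,
    "G": 1024**3, "GB": 1024**3,
}


def parse_memory(value):
    """Parse strings such as '4G', '512MB' or '1048576' into bytes."""
    text = value.strip().upper()
    number = text.rstrip("BKMG")  # drop the trailing unit letters
    unit = text[len(number):]
    if unit not in _UNITS:
        raise ValueError(f"unknown memory unit: {unit!r}")
    return int(float(number) * _UNITS[unit])


def memory_budget(env=None, var="MEMORY"):
    """Read the memory limit from the environment, or use the default."""
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None:
        return DEFAULT_MEMORY_BYTES
    try:
        return parse_memory(raw)
    except ValueError:
        # Unparseable value: fall back rather than fail at startup.
        return DEFAULT_MEMORY_BYTES


def estimate_chunk_size(sample_bytes, sample_rows, budget_bytes,
                        safety=SAFETY_FACTOR):
    """Scale the per-row cost measured on a small sample up to the budget."""
    if sample_rows <= 0 or sample_bytes <= 0:
        raise ValueError("sample must contain at least one row and one byte")
    bytes_per_row = sample_bytes / sample_rows
    return max(1, int(budget_bytes * safety / bytes_per_row))
```

If the sample is loaded as a pandas DataFrame, `sample_bytes` could be measured with `df.memory_usage(deep=True).sum()` and `sample_rows` with `len(df)`. Because the per-row cost is measured on the sample itself, the estimate adapts to however many feature columns the file has.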

gessulat, Aug 01 '24 11:08