mokapot
Chunksize should be derived from MEMORY env variable
This issue grew out of this discussion regarding MSAID's streaming branch.
For streaming, several chunk sizes are defined, and they are currently hard-coded. It would be desirable to make them configurable. Ideally, the chunk sizes would be derived automatically and optimally from the user's memory constraints.
TODO:
- [ ] implement a function that, given a small sample of the PSM data, estimates the memory required for a given chunk size. This probably has to be estimated dynamically, because the number of feature columns in the Percolator format is not fixed.
- [ ] implement a mechanism that gets user requirements regarding memory usage from an environment variable (e.g. `MEMORY`)
- [ ] At runtime check user memory requirements or fall back to some default, estimate maximum chunk sizes and set them.
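
To make the three TODO items concrete, here is a minimal sketch of how they could fit together. It is not mokapot's implementation. Every name in it (`parse_memory`, `memory_budget`, `estimate_chunk_size`, `DEFAULT_MEMORY_BYTES`, `SAFETY_FACTOR`) is hypothetical, and so is the assumed format of `MEMORY` (a plain byte count or a number with a `K`/`M`/`G` suffix) and the 2 GiB default. The in-memory size of the sample is passed in as a number, so the sketch stays independent of how the PSM data is loaded; with pandas it could come from `sample_df.memory_usage(deep=True).sum()`.

```python
import os

# Hypothetical defaults; the issue does not specify any values.
DEFAULT_MEMORY_BYTES = 2 * 1024**3  # fall back to 2 GiB if MEMORY is unset
SAFETY_FACTOR = 0.5                 # spend at most half the budget on one chunk

_UNITS = {
    "": 1, "B": 1,
    "K": 1024, "KB": 1024,
    "M": 1024**2, "MB": 1024**2,
    "G": 1024**3, "GB": 1024**3,
}


def parse_memory(value):
    """Parse strings such as '4G', '512MB', or '1048576' into bytes.

    The accepted format is an assumption made for this sketch.
    """
    text = value.strip().upper()
    number = text.rstrip("BKMG")   # strip the unit suffix, if any
    unit = text[len(number):]
    if unit not in _UNITS:
        raise ValueError(f"unknown memory unit in {value!r}")
    return int(float(number) * _UNITS[unit])


def memory_budget(env_var="MEMORY", default=DEFAULT_MEMORY_BYTES):
    """Read the user's memory budget from the environment, or use a default."""
    raw = os.environ.get(env_var)
    if not raw:
        return default
    return parse_memory(raw)


def estimate_chunk_size(sample_bytes, sample_rows, budget,
                        safety=SAFETY_FACTOR, minimum=1):
    """Estimate how many rows fit in `budget` bytes.

    `sample_bytes` is the measured in-memory size of a small sample with
    `sample_rows` rows. Measuring a sample accounts for the variable number
    of feature columns without having to know the column layout up front.
    """
    if sample_rows <= 0 or sample_bytes <= 0:
        raise ValueError("sample must be non-empty")
    bytes_per_row = sample_bytes / sample_rows
    return max(minimum, int(budget * safety / bytes_per_row))
```

At runtime, the chunk size would then be computed once per input file, for example `estimate_chunk_size(sample_bytes, len(sample), memory_budget())`. The safety factor leaves headroom for copies and intermediate arrays, which a per-row estimate of the raw data does not capture.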