leofs
[leo_object_storage][optimize] Calc MD5 during waiting for disk I/O
Problem
See the discussion: https://groups.google.com/d/msg/leoproject_leofs/QsxY2wNWhYg/SnnyPZAbCQAJ
TL;DR
- Current: MD5 after reading the whole block in one process
  - pros
    - code readability
    - less GC
  - cons
    - waits for disk I/O
- Small reads: divide the whole read operation into small reads (expecting OS/device-level prefetch)
  - pros
    - less GC
  - cons
    - more interaction between the Erlang process and the runtime
- Reader/calculator: implement a reader (does disk I/O) and a calculator (does MD5) on concurrency primitives
  - pros
    - good CPU utilization
  - cons
    - makes the code a little more complex
    - more GC
- NIF: implement a disk I/O controller highly optimized for AVS using a NIF
  - pros
    - good CPU utilization
    - avoids GC
    - any syscall can be used, e.g. readahead
  - cons
    - makes the code much more complex
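
The first three options can be sketched roughly as follows. This is a minimal illustration in Python rather than Erlang, and it is not LeoFS code: `md5_whole`, `md5_chunked`, `md5_pipelined`, and `CHUNK_SIZE` are hypothetical names, and a thread with a bounded queue stands in for Erlang processes and message passing.

```python
import hashlib
import os
import queue
import tempfile
import threading

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not a LeoFS setting


def md5_whole(path):
    """Current approach: read the whole block, then hash it."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.md5(data).hexdigest()


def md5_chunked(path, chunk_size=CHUNK_SIZE):
    """Small reads: hash each chunk right after it is read."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def md5_pipelined(path, chunk_size=CHUNK_SIZE, depth=4):
    """Reader/calculator: a reader thread does disk I/O while the caller hashes."""
    chunks = queue.Queue(maxsize=depth)  # bounded so the reader cannot run far ahead
    errors = []

    def reader():
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    chunks.put(chunk)
        except OSError as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more chunks

    thread = threading.Thread(target=reader)
    thread.start()
    digest = hashlib.md5()
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        digest.update(chunk)
    thread.join()
    if errors:
        raise errors[0]
    return digest.hexdigest()


if __name__ == "__main__":
    # Write a throwaway file and check that all three strategies agree.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1_000_000))
    try:
        expected = md5_whole(tmp.name)
        print(md5_chunked(tmp.name) == expected, md5_pipelined(tmp.name) == expected)
    finally:
        os.remove(tmp.name)
```

All three return the same digest; they differ only in when hashing can overlap with disk I/O. In the pipelined version, every chunk handed from the reader to the calculator is an extra allocation and hand-off, which corresponds to the "more GC" cost listed above.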
TODO
- [ ] Implement a prototype of the reader/calculator approach
- [ ] Benchmark under several scenarios
- [ ] Try to come up with another approach
Throughout the project there are many opportunities to exploit parallelism: interleaving read, send, calculation, etc. It would be great if we could come up with a framework to achieve this.
Right now we mostly parallelize at the object level, which we could push down to the task level.
@windkit Yes, generalizing parallelism into a framework would be best if possible. Ironically, though, getting parallelism right (pulling out the maximum performance) may be harder in high-level languages with built-in GC and concurrency support, such as Erlang, Golang, and Elixir, than in low-level ones like C, C++, and Rust, because we must pay a cost for transferring data between lightweight processes (copying the memory, or a pointer to it, and GC later).