[IOTDB-5792] Parallel encoding in MemTable flush

Open jt2594838 opened this issue 2 years ago • 5 comments

For the issue, please refer to: https://issues.apache.org/jira/browse/IOTDB-5792

For the design and evaluation, please refer to: https://apache-iotdb.feishu.cn/docx/TAu9dKrFioYQxmxhgUVcVrmmnWf

jt2594838 avatar Apr 21 '23 06:04 jt2594838

Fantastic job. One minor question: why not just use a thread pool such as ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) rather than creating a new DynamicThread?

neuyilan avatar May 12 '23 06:05 neuyilan

Fantastic job. One minor question: why not just use a thread pool such as ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) rather than creating a new DynamicThread?

The keep-alive mechanism in ThreadPoolExecutor is very primitive. It:

  1. destroys threads that have not been active within the keepAliveTime, regardless of the idle ratio.
  2. adds new threads whenever there is a pending task and the thread count has not reached the maximum size, regardless of the idle ratio.
  3. does not apply to our situation, where each thread runs an infinite loop. As a result, the thread is always alive and will never be destroyed.

Because of these characteristics, ThreadPoolExecutor cannot control thread utilization at a finer level, which is why we created DynamicThread.
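Point 3 can be illustrated with a small sketch. The class name KeepAliveDemo and the 50 ms keep-alive are assumptions made only for this example and are not part of the PR. Each worker spends almost all of its time sleeping inside its own loop, yet the executor still counts it as busy, so the keep-alive timer never starts until the loop ends.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not taken from the IoTDB code base.
class KeepAliveDemo {

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  /** Returns {pool size while the workers loop, pool size after they stop}. */
  static int[] observePoolSizes() {
    AtomicBoolean running = new AtomicBoolean(true);
    // corePoolSize = 0, so every thread is subject to the 50 ms keep-alive.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(0, 2, 50, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    for (int i = 0; i < 2; i++) {
      pool.execute(() -> {
        // Long-running loop: mostly waiting, but never returning to the pool.
        while (running.get()) {
          sleepQuietly(5);
        }
      });
    }
    // Wait several keep-alive periods; the looping workers are not reclaimed.
    sleepQuietly(300);
    int whileLooping = pool.getPoolSize();

    // Only after the loops end do the workers become idle and time out.
    running.set(false);
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
    while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
      sleepQuietly(10);
    }
    int afterStop = pool.getPoolSize();
    pool.shutdown();
    return new int[] {whileLooping, afterStop};
  }
}
```

Calling KeepAliveDemo.observePoolSizes() should report two live threads while the loops run and none once they have stopped and the keep-alive has elapsed.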

To put it simply, DynamicThread provides two parameters, private double maximumIdleRatio and private double minimumIdleRatio, which ThreadPoolExecutor does not offer. Please have a look at them.
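As a rough illustration of how two such thresholds could drive scaling, here is a minimal sketch. Only the names maximumIdleRatio and minimumIdleRatio come from the comment above. The class IdleRatioTracker, its methods, and the exact decision rule are hypothetical and are not the DynamicThread implementation in this PR.

```java
// Hypothetical sketch of idle-ratio-based scaling; not the PR's DynamicThread.
class IdleRatioTracker {
  enum Decision { EXIT, SPAWN, KEEP }

  private final double minimumIdleRatio; // below this, the group looks overloaded
  private final double maximumIdleRatio; // above this, the thread looks redundant
  private long idleNanos;
  private long runningNanos;

  IdleRatioTracker(double minimumIdleRatio, double maximumIdleRatio) {
    if (minimumIdleRatio < 0 || maximumIdleRatio > 1 || minimumIdleRatio > maximumIdleRatio) {
      throw new IllegalArgumentException("need 0 <= min <= max <= 1");
    }
    this.minimumIdleRatio = minimumIdleRatio;
    this.maximumIdleRatio = maximumIdleRatio;
  }

  // The worker loop would call these around waiting for and processing a task.
  void recordIdle(long nanos) { idleNanos += nanos; }
  void recordRunning(long nanos) { runningNanos += nanos; }

  /** Fraction of observed time spent idle, or 0 if nothing was observed. */
  double idleRatio() {
    long total = idleNanos + runningNanos;
    return total == 0 ? 0.0 : (double) idleNanos / total;
  }

  /** Decides whether this thread should exit, a new one should start, or neither. */
  Decision decide(int threadCount, int minThreads, int maxThreads) {
    if (idleNanos + runningNanos == 0) {
      return Decision.KEEP; // no observations yet
    }
    double ratio = idleRatio();
    if (ratio > maximumIdleRatio && threadCount > minThreads) {
      return Decision.EXIT;
    }
    if (ratio < minimumIdleRatio && threadCount < maxThreads) {
      return Decision.SPAWN;
    }
    return Decision.KEEP;
  }
}
```

With thresholds of 0.1 and 0.5, for example, a thread that was idle for 80% of its observed time would be asked to exit, while one that was idle for only 5% would ask for another thread, subject to the group's size limits.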

jt2594838 avatar May 15 '23 06:05 jt2594838

Excellent job. I noticed that your design document and code both mention:

"The previous version flushes devices and timeseries on each device in the lex order. To guarantee the same flushing, the task should be properly divided so that the final l0 order can be preserved, while sort tasks and encoding tasks can be fully parallel"

I'm not sure what design the lex order is based on when flushed data is written to disk, but if we don't follow this order, there may be performance improvements in DeviceIOTask.

Agreed. But I am not sure what side effects it may cause, so we can discuss it in another issue.

jt2594838 avatar May 22 '23 01:05 jt2594838

Agreed. But I am not sure what side effects it may cause, so we can discuss it in another issue.

OK~

neuyilan avatar May 22 '23 02:05 neuyilan