Count/unload a range of tokens: start token, end token

Open smatvienko-tb opened this issue 4 years ago • 1 comments

Hi guys! For big workloads that take hours (or days), it would be good to be able to count/unload a range of tokens. The aim is to split a huge task into smaller pieces, fit it into a maintenance window, etc. Another benefit is being able to resume a failed (for any reason) upload from some point (a token found in upload-errors.log).

SELECT * FROM thingsboard.ts_kv_cf
WHERE token(entity_type, entity_id, key, partition) > :start
  AND token(entity_type, entity_id, key, partition) <= :end

start: 2725990092663290223
end: 2748111336664437991

The parameters could look similar to the nodetool repair syntax: [(-st start_token | --start-token start_token)] [(-et end_token | --end-token end_token)]

smatvienko-tb, Nov 04 '21 10:11

I agree that this would be very useful, but unfortunately it is not implemented yet.

However, there is a simple workaround: use the -query parameter and provide the entire query, including a WHERE clause that restricts the token range, e.g.:

dsbulk unload -query 'SELECT ... FROM ... WHERE token(...) > 2725990092663290223 AND token(...) <= 2748111336664437991'

This is usually easily scriptable.

adutra, Jan 17 '22 21:01
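
To illustrate how the -query workaround above might be scripted, here is a minimal Python sketch. It only splits a token range into contiguous (lo, hi] pieces and prints one dsbulk unload command per piece; it does not run anything. The helper names split_token_range and build_unload_command are hypothetical, the table and partition key columns are taken from the example query in the issue, and SELECT * is an assumption about which columns you want. Only the -query option mentioned in the reply is used; connection and output options are left out and would need to be added for a real run.

```python
import shlex


def split_token_range(start, end, pieces):
    """Split the token range (start, end] into contiguous (lo, hi] sub-ranges."""
    span = end - start
    if pieces < 1 or span < pieces:
        raise ValueError("need 1 <= pieces <= (end - start)")
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + span * i // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def build_unload_command(table, partition_key, lo, hi):
    """Build a 'dsbulk unload -query ...' command line for one sub-range."""
    token = "token({})".format(", ".join(partition_key))
    query = "SELECT * FROM {} WHERE {} > {} AND {} <= {}".format(
        table, token, lo, token, hi
    )
    # shlex.join quotes the query so it can be pasted into a POSIX shell.
    return shlex.join(["dsbulk", "unload", "-query", query])


if __name__ == "__main__":
    # Values taken from the issue; the number of pieces is arbitrary.
    table = "thingsboard.ts_kv_cf"
    partition_key = ["entity_type", "entity_id", "key", "partition"]
    for lo, hi in split_token_range(2725990092663290223, 2748111336664437991, 4):
        print(build_unload_command(table, partition_key, lo, hi))
```

Because each sub-range uses > on the lower bound and <= on the upper bound, adjacent pieces neither overlap nor leave gaps, so a failed piece can be rerun on its own.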