Count/unload a range of tokens: start token, end token
Hi guys! For big workloads that take hours (or days), it would be good to be able to count/unload a range of tokens. The aim is to split a huge task into smaller pieces, fit it within a maintenance time window, etc. Another benefit is being able to resume a failed (for any reason) upload from some point (a token found in upload-errors.log).
SELECT * FROM thingsboard.ts_kv_cf WHERE token(entity_type, entity_id, key, partition) > :start AND token(entity_type, entity_id, key, partition) <= :end
start: 2725990092663290223
end: 2748111336664437991
The parameters could look similar to the nodetool repair syntax:
[(-st start_token | --start-token start_token)]
[(-et end_token | --end-token end_token)]
I agree that this would be very useful, but unfortunately it is not implemented yet.
However, there is a simple workaround: use the -query parameter and provide the entire query, including a WHERE clause that restricts the token range, e.g.:
dsbulk unload -query 'SELECT ... FROM ... WHERE token(...) > 2725990092663290223 AND token(...) <= 2748111336664437991'
This is usually easily scriptable.
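For example, here is a minimal sketch of such a script in Python. It splits a token range into contiguous (lo, hi] slices and prints one dsbulk command per slice. The helper names (split_token_range, build_query, dsbulk_command) are made up for illustration, the table and token expression are taken from the query in the question, and only the -query option is emitted; connection and output options are left for you to add.

```python
import shlex

# Table and partition key columns taken from the query in the question.
TABLE = "thingsboard.ts_kv_cf"
TOKEN_EXPR = "token(entity_type, entity_id, key, partition)"


def split_token_range(start, end, pieces):
    """Split the token range (start, end] into contiguous (lo, hi] slices."""
    if pieces < 1:
        raise ValueError("pieces must be >= 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    span = end - start
    # Integer arithmetic only, so 64-bit tokens are never rounded.
    bounds = [start + span * i // pieces for i in range(pieces + 1)]
    # Drop empty slices, which appear when the range is narrower than `pieces`.
    return [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if lo < hi]


def build_query(lo, hi):
    """Build the token-restricted SELECT for one slice."""
    return (
        f"SELECT * FROM {TABLE} "
        f"WHERE {TOKEN_EXPR} > {lo} AND {TOKEN_EXPR} <= {hi}"
    )


def dsbulk_command(lo, hi):
    """Render a shell-safe dsbulk invocation for one slice."""
    return "dsbulk unload -query " + shlex.quote(build_query(lo, hi))


if __name__ == "__main__":
    # The start/end values from the question, split into 4 slices.
    for lo, hi in split_token_range(2725990092663290223, 2748111336664437991, 4):
        print(dsbulk_command(lo, hi))
```

Because each slice is exclusive at the lower bound and inclusive at the upper bound, the slices do not overlap and together cover the original range. To resume after a failure, rerun the script with start set to the last token that was processed successfully.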