
[Question] Why d2t = [target_token_ids] - torch.arange(len)?

Open Tomorrowdawn opened this issue 2 months ago • 3 comments

https://github.com/sgl-project/SpecForge/blob/d3472dde5d6828e60e7ee766ded74754e5dc6778/specforge/data/preprocessing.py#L588

I find it extremely strange that d2t doesn't store the direct mapping [target_token_ids]; instead, it stores [target_token_ids] - torch.arange(len). What's the purpose of this offset?

Tomorrowdawn, Oct 31 '25

Not storing the mapping directly requires less space: only O(target_vocab_size) + O(draft_vocab_size) in total. Furthermore, these operations can be performed directly on tensors, which makes them computationally efficient.

SpecForge/specforge/data/preprocessing.py, line 588 in d3472dd:

d2t = [used_tokens[i] - i for i in range(len(used_tokens))]
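
A minimal sketch of what this line produces and how the offset is consumed at lookup time, using toy values and assuming used_tokens is the sorted list of target-vocabulary ids kept in the draft vocabulary (variable names other than d2t and used_tokens are illustrative, not SpecForge's):

```python
import torch

# Toy example: draft id i corresponds to target id used_tokens[i].
used_tokens = torch.tensor([1, 4, 5, 9, 20, 31])

# What preprocessing.py stores: the offset between a draft id and its target id.
d2t = used_tokens - torch.arange(len(used_tokens))

# Recovering the direct mapping at use time is a single tensor add:
draft_ids = torch.tensor([0, 3, 5])        # e.g. top-k indices over draft logits
target_ids = draft_ids + d2t[draft_ids]    # == used_tokens[draft_ids]
assert torch.equal(target_ids, used_tokens[draft_ids])
```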

jiapingW, Nov 02 '25

Sorry, I don't understand what you mean. Storing the mapping directly is equivalent to removing the i for i in range(len(used_tokens)) term, so it uses the same amount of memory. However, storing the mapping directly is more readable and less prone to misunderstanding. I found this quirk when I tried to build a visualization tool to check the distribution shifts; d2t is not actually a direct draft-to-target mapping, which cost me about 3 hours of debugging.
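
For illustration, a toy comparison of the two encodings (d2t_direct is hypothetical, not what SpecForge stores):

```python
import torch

# Both encodings are a 1-D tensor of length draft_vocab_size,
# so the memory cost is identical (toy values).
used_tokens = torch.tensor([1, 4, 5, 9, 20, 31])

d2t_offset = used_tokens - torch.arange(len(used_tokens))  # what SpecForge stores
d2t_direct = used_tokens.clone()                           # hypothetical direct mapping

assert d2t_offset.shape == d2t_direct.shape  # same storage footprint

# With the direct form, the lookup would simply be d2t_direct[draft_ids],
# instead of draft_ids + d2t_offset[draft_ids].
```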


Tomorrowdawn, Nov 02 '25

I understand what you mean. I don't think the -i operation here has any real impact, but if you change it, you'll also need to update the other places that consume d2t, for example https://github.com/SafeAILab/EAGLE/blob/main/eagle/model/cnets.py#L712.
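
A hypothetical consumer along the lines of the linked EAGLE code (not the actual cnets.py source), showing the kind of call site that would have to change if the encoding were switched to a direct mapping:

```python
import torch

# Hypothetical consumer of d2t: with the current offset encoding, draft-side
# top-k indices are lifted back to the target vocabulary by an add. If d2t
# became a direct mapping, this line (and every similar call site) would have
# to change to d2t[topk_index].
def lift_to_target_vocab(topk_index: torch.Tensor, d2t: torch.Tensor) -> torch.Tensor:
    return topk_index + d2t[topk_index]
```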

jiapingW, Nov 02 '25