sglang
[Fixed] Fix tensors created on the wrong device inside the torch.device() scope
Motivation
These tensors are created inside a `with torch.device(self.device):` block, so every factory call without an explicit `device=` argument places its tensor on that device. As a result, tensors that must live on the CPU, such as `seq_lens_cpu`, `extend_seq_lens_cpu`, and `extend_logprob_start_lens_cpu`, end up on the GPU instead, causing a device mismatch.
Modifications
For the tensors that must be placed on the CPU, I explicitly pass `device='cpu'`:
```python
with torch.device(self.device):
    forward_batch = ForwardBatch(
        forward_mode=ForwardMode.EXTEND,
        batch_size=bs,
        input_ids=input_ids,
        input_embeds=input_embeds,
        req_pool_indices=torch.arange(bs, device=self.device),
        seq_lens=torch.tensor([num_tokens], device=self.device),
        next_token_logits_buffer=None,
        orig_seq_lens=torch.tensor([num_tokens], device=self.device),
        seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        req_to_token_pool=self.model_runner.req_to_token_pool,
        token_to_kv_pool=self.model_runner.token_to_kv_pool,
        attn_backend=self.model_runner.attn_backend,
        out_cache_loc=out_cache_loc,
        out_cache_loc_swa=out_cache_loc_swa,
        seq_lens_sum=num_tokens,
        encoder_lens=None,
        return_logprob=False,
        extend_num_tokens=num_tokens,
        extend_seq_lens=torch.tensor([num_tokens], device=self.device),
        extend_prefix_lens=torch.tensor([num_tokens], device=self.device),
        extend_start_loc=torch.tensor([0], device=self.device),
        extend_prefix_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        extend_seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        extend_logprob_start_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        positions=positions,
        global_num_tokens_gpu=None,
        global_num_tokens_for_logprob_gpu=None,
        dp_padding_mode=DpPaddingMode.get_default_mode_in_cuda_graph(),
        global_dp_buffer_len=None,
        mrope_positions=mrope_positions,
        spec_algorithm=None,
        spec_info=None,
        capture_hidden_mode=CaptureHiddenMode.NULL,
        num_token_non_padded=None,
        global_forward_mode=ForwardMode.EXTEND,
        lora_ids=None,
    )
```
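The device-placement behavior behind this fix can be sketched independently of sglang: inside a `torch.device(...)` context manager (available since PyTorch 2.0), factory functions inherit the context device unless `device=` is passed explicitly, in which case the explicit argument wins. A minimal sketch, using the `meta` device as a stand-in for a GPU so no accelerator is needed:

```python
import torch

# Inside a torch.device context, factory calls without an explicit
# `device=` inherit the context device ('meta' stands in for a GPU here).
with torch.device("meta"):
    default_t = torch.empty(3)            # lands on the context device
    cpu_t = torch.empty(3, device="cpu")  # explicit device= overrides the context

print(default_t.device.type)  # meta
print(cpu_t.device.type)      # cpu
```

This is why only the `*_cpu` tensors need the explicit `device='cpu'` argument while the rest can keep relying on the surrounding context.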
Good job. Who is working on the piece-wise CUDA graph?
Please format the code with `pre-commit run --all-files`.
Done
@DavidChan0519 Curious, when was this bug introduced and in what cases we will find errors?
I'm not sure. In my case, it crashed when I used the `seq_lens_cpu` tensor.