sglang

[Fixed] Fix tensors created on the wrong device within the torch.device() scope

Open · DavidChan0519 opened this issue 1 week ago · 6 comments

Motivation

The tensors are created inside a `with torch.device(self.device):` block, so they are all placed on that device. However, tensors that must reside on the CPU, such as `seq_lens_cpu`, `extend_seq_lens_cpu`, and `extend_logprob_start_lens_cpu`, also end up on the GPU, causing a device mismatch.
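A minimal standalone sketch of the underlying behavior (not sglang code): inside a `torch.device` context manager, tensor factory calls default to that device, and only an explicit `device=` argument overrides it.

```python
# Sketch of the issue: tensors created inside a torch.device context
# default to that device unless device= is passed explicitly.
import torch

# Fall back to CPU so the sketch also runs on machines without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

with torch.device(device):
    defaulted = torch.tensor([4])                    # placed on `device`
    pinned_to_cpu = torch.tensor([4], device="cpu")  # stays on the CPU

assert defaulted.device.type == device
assert pinned_to_cpu.device.type == "cpu"
```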

Modifications

For the tensors that must reside on the CPU, I explicitly pass `device='cpu'`:

        with torch.device(self.device):
            forward_batch = ForwardBatch(
                forward_mode=ForwardMode.EXTEND,
                batch_size=bs,
                input_ids=input_ids,
                input_embeds=input_embeds,
                req_pool_indices=torch.arange(bs, device=self.device),
                seq_lens=torch.tensor([num_tokens], device=self.device),
                next_token_logits_buffer=None,
                orig_seq_lens=torch.tensor([num_tokens], device=self.device),
                seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
                req_to_token_pool=self.model_runner.req_to_token_pool,
                token_to_kv_pool=self.model_runner.token_to_kv_pool,
                attn_backend=self.model_runner.attn_backend,
                out_cache_loc=out_cache_loc,
                out_cache_loc_swa=out_cache_loc_swa,
                seq_lens_sum=num_tokens,
                encoder_lens=None,
                return_logprob=False,
                extend_num_tokens=num_tokens,
                extend_seq_lens=torch.tensor([num_tokens], device=self.device),
                extend_prefix_lens=torch.tensor([num_tokens], device=self.device),
                extend_start_loc=torch.tensor([0], device=self.device),
                extend_prefix_lens_cpu=torch.tensor(
                    [num_tokens], device='cpu'),
                extend_seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
                extend_logprob_start_lens_cpu=torch.tensor(
                    [num_tokens], device='cpu'),
                positions=positions,
                global_num_tokens_gpu=None,
                global_num_tokens_for_logprob_gpu=None,
                dp_padding_mode=DpPaddingMode.get_default_mode_in_cuda_graph(),
                global_dp_buffer_len=None,
                mrope_positions=mrope_positions,
                spec_algorithm=None,
                spec_info=None,
                capture_hidden_mode=CaptureHiddenMode.NULL,
                num_token_non_padded=None,
                global_forward_mode=ForwardMode.EXTEND,
                lora_ids=None,
            )

DavidChan0519 avatar Nov 20 '25 06:11 DavidChan0519


Good job. Who is working on piece-wise CUDA graph?

zhaochenyang20 avatar Nov 21 '25 02:11 zhaochenyang20

Please format the code with `pre-commit run --all-files`.

BBuf avatar Nov 21 '25 02:11 BBuf

> Please format the code with `pre-commit run --all-files`.

Done

DavidChan0519 avatar Nov 22 '25 16:11 DavidChan0519

@DavidChan0519 Curious, when was this bug introduced and in what cases we will find errors?

hnyls2002 avatar Nov 22 '25 17:11 hnyls2002

> @DavidChan0519 Curious, when was this bug introduced and in what cases we will find errors?

I'm not sure. In my case, it crashed when I accessed the `seq_lens_cpu` tensor.

DavidChan0519 avatar Nov 23 '25 15:11 DavidChan0519