sglang
[Fixed] Fix tensors created on the wrong device inside the torch.device() scope
Motivation
These tensors are created inside a `with torch.device(self.device):` block, so every factory call without an explicit `device=` argument places its tensor on that device. As a result, tensors that must live on the CPU, such as `seq_lens_cpu`, `extend_seq_lens_cpu`, and `extend_logprob_start_lens_cpu`, end up on the GPU instead, causing a device mismatch.
Modifications
For the tensors that must be placed on the CPU, I explicitly pass `device='cpu'`:
```python
with torch.device(self.device):
    forward_batch = ForwardBatch(
        forward_mode=ForwardMode.EXTEND,
        batch_size=bs,
        input_ids=input_ids,
        input_embeds=input_embeds,
        req_pool_indices=torch.arange(bs, device=self.device),
        seq_lens=torch.tensor([num_tokens], device=self.device),
        next_token_logits_buffer=None,
        orig_seq_lens=torch.tensor([num_tokens], device=self.device),
        seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        req_to_token_pool=self.model_runner.req_to_token_pool,
        token_to_kv_pool=self.model_runner.token_to_kv_pool,
        attn_backend=self.model_runner.attn_backend,
        out_cache_loc=out_cache_loc,
        out_cache_loc_swa=out_cache_loc_swa,
        seq_lens_sum=num_tokens,
        encoder_lens=None,
        return_logprob=False,
        extend_num_tokens=num_tokens,
        extend_seq_lens=torch.tensor([num_tokens], device=self.device),
        extend_prefix_lens=torch.tensor([num_tokens], device=self.device),
        extend_start_loc=torch.tensor([0], device=self.device),
        extend_prefix_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        extend_seq_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        extend_logprob_start_lens_cpu=torch.tensor([num_tokens], device='cpu'),
        positions=positions,
        global_num_tokens_gpu=None,
        global_num_tokens_for_logprob_gpu=None,
        dp_padding_mode=DpPaddingMode.get_default_mode_in_cuda_graph(),
        global_dp_buffer_len=None,
        mrope_positions=mrope_positions,
        spec_algorithm=None,
        spec_info=None,
        capture_hidden_mode=CaptureHiddenMode.NULL,
        num_token_non_padded=None,
        global_forward_mode=ForwardMode.EXTEND,
        lora_ids=None,
    )
```
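The device-placement behavior behind this fix can be sketched independently of sglang: inside a `torch.device(...)` context manager (available since PyTorch 2.0), factory functions inherit the context device unless `device=` is passed explicitly, in which case the explicit argument wins. A minimal sketch, using the `meta` device as a stand-in for a GPU so no accelerator is needed:

```python
import torch

# Inside a torch.device context, factory calls without an explicit
# `device=` inherit the context device ('meta' stands in for a GPU here).
with torch.device("meta"):
    default_t = torch.empty(3)            # lands on the context device
    cpu_t = torch.empty(3, device="cpu")  # explicit device= overrides the context

print(default_t.device.type)  # meta
print(cpu_t.device.type)      # cpu
```

This is why only the `*_cpu` tensors need the explicit `device='cpu'` argument while the rest can keep relying on the surrounding context.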
Good job. Who is working on the piece-wise CUDA graph?
Please format the code with `pre-commit run --all-files`.
Done
@DavidChan0519 Curious, when was this bug introduced and in what cases we will find errors?
I'm not sure. In my case, it crashed when I used the `seq_lens_cpu` tensor.