ooe1123 comments

Results: 118 comments by ooe1123

ADD BARK

[fine.onnx]

○ bark/generation.py
```
def generate_fine(
    ...
):
    ...
    with _inference_mode():
        ...
        for n in tqdm.tqdm(range(n_loops), disable=silent):
            ...
            for nn in range(n_coarse, N_FINE_CODEBOOKS):
                logits = model(nn, in_buffer)
```
↓
```
...
```

ADD kotoba-whisper-v1.0

〇 transformers/models/whisper/modeling_whisper.py
```
class WhisperSdpaAttention(WhisperAttention):
    ...
    def forward(
        self,
        ...
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
        ...
        if (
            is_cross_attention
            and past_key_value is not None
            and past_key_value[0].shape[2] == key_value_states.shape[1]
        ):
            ......
```

ADD kotoba-whisper-v1.0

〇 transformers/generation/utils.py
```
class GenerationMixin:
    ...
    def _greedy_search(
        ...
    ) -> Union[GenerateNonBeamOutput, torch.LongTensor]:
        ...
        while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
            # prepare model inputs
            model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)

            # forward pass to...
```

ADD kotoba-whisper-v1.0

Exporting with opset=17 raises the following error, so this change works around it.
```
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from kotoba-whisper-v1.0_decoder.onnx failed:Type Error: Type parameter (T) of Optype (LayerNormalization) bound to different types (tensor(float) and tensor(float16) in node...
```

ADD Detect-Utility-Poles

This model is based on TensorFlow 1. Recent Python versions do not support TensorFlow 1, so setting up a TensorFlow 1 environment is difficult. With TensorFlow 2, the following error cannot be resolved.
```
Traceback (most recent call last):
  File "/workspaces/dev/Detect-Utility-Poles/cli.py", line 44, in <module>
    model = modellib.MaskRCNN(mode="inference", config=config, model_dir=LOGS_DIR)
  File "/workspaces/dev/Detect-Utility-Poles/Mask_RCNN/mrcnn/model.py", line 1844, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File...
```

Implement ReazonSpeech2

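@kyakuno With Whisper, I believe the audio data was split into fixed-size chunks and inference was run on each chunk, but ReazonSpeech appears to process the whole input at once. As a result, the larger the audio, the higher the inference cost. Emitting results incrementally, segment by segment, will likely require some extra handling on the input side.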
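One way to picture the "handling on the input side" is a plain fixed-size chunker placed in front of the model. The following is only a rough sketch under stated assumptions, not how ReazonSpeech or Whisper actually segment audio: `iter_chunks`, `transcribe_incrementally`, and the `transcribe_chunk` callable are hypothetical names introduced here, and the 30-second default is an arbitrary choice.

```python
from typing import Callable, Iterator, Sequence, Tuple


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 30.0,  # arbitrary default, not taken from any model
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_in_seconds, chunk) pairs that cover `samples` in order."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk_len):
        yield start / sample_rate, samples[start:start + chunk_len]


def transcribe_incrementally(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float]], object],  # hypothetical model call
    chunk_seconds: float = 30.0,
) -> Iterator[Tuple[float, object]]:
    """Run `transcribe_chunk` on each chunk and yield each result as soon as it is ready."""
    for start_time, chunk in iter_chunks(samples, sample_rate, chunk_seconds):
        yield start_time, transcribe_chunk(chunk)
```

Because both functions are generators, a caller can print each segment as it arrives instead of waiting for the full audio. Cutting at fixed positions can split a word across two chunks, so a real implementation would probably need overlap or silence-based boundaries.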

ADD sdxl-turbo

ONNX export:

optimum-cli export onnx --model stabilityai/sdxl-turbo --task stable-diffusion-xl onnx/

ADD japanese-reranker-cross-encoder-large-v1

```
class Exp(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.activation = Sigmoid()

    def forward(self, input_ids, attention_mask, token_type_ids):
        inputs = {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "token_type_ids": token_type_ids,
        }
        logits =...
```
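The wrapper above pairs the model with a `Sigmoid` activation; the rest of its `forward` is cut off, so what follows is only an assumption about how such scores are typically consumed. As a minimal sketch, assuming each query/passage pair yields one raw logit, the logits can be squashed to the 0..1 range and used to sort passages. `sigmoid` and `rank_by_score` are hypothetical helpers written for illustration and are not part of the exported model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_by_score(
    passages: Sequence[str],
    logits: Sequence[float],  # assumed: one raw logit per query/passage pair
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(logits):
        raise ValueError("passages and logits must have the same length")
    scored = [(p, sigmoid(l)) for p, l in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Since the sigmoid is monotonic, sorting by raw logits gives the same order; applying it mainly makes the scores easier to threshold or compare across queries.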

ADD llava

llava-v1.5-7b.onnx export

〇 transformers/models/llama/modeling_llama.py
```
class LlamaModel(LlamaPreTrainedModel):
    ...
    def forward(
        ...
    ) -> Union[Tuple, BaseModelOutputWithPast]:
        ...
        # retrieve input_ids and inputs_embeds
        if input_ids is not None and inputs_embeds is not None:...
```

ADD llava

encode_images export

〇 LLaVA/llava/model/llava_arch.py
```
class LlavaMetaForCausalLM(ABC):
    ...
    def prepare_inputs_labels_for_multimodal(
        ...
    ):
        ...
        if type(images) is list or images.ndim == 5:
            ...
        else:
            image_features = self.encode_images(images)
```
↓
```
class Exp(nn.Module):...
```