peggyxpxu issues

Results: 9 issues of peggyxpxu

BUG IN data filter

In the old tools/remove_longshortdata.py, the feature length is in seconds, and min_output_input_ratio is 0.05 and max_output_input_ratio is 10 by default. In the new uio code, we have 100 frames every second, so min_output_input_ratio...
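The unit change described above can be illustrated with a small sketch. It assumes the ratio is computed as output length divided by input length, which the snippet does not confirm; `rescale_ratio_threshold` and `keep_utterance` are hypothetical helpers, not functions from the repository.

```python
FRAMES_PER_SECOND = 100  # stated in the issue: 100 frames every second


def rescale_ratio_threshold(ratio_per_second, frames_per_second=FRAMES_PER_SECOND):
    """Convert a threshold defined per second of input into one per input frame.

    Assumption: ratio = output_length / input_length, so measuring the input
    in frames instead of seconds shrinks the ratio by frames_per_second.
    """
    return ratio_per_second / frames_per_second


def keep_utterance(n_output_units, n_input_frames, min_ratio, max_ratio):
    """Hypothetical filter: keep the utterance if its ratio is within bounds."""
    ratio = n_output_units / n_input_frames
    return min_ratio <= ratio <= max_ratio


# Defaults quoted in the issue, defined for input lengths in seconds.
old_min, old_max = 0.05, 10

# The same bounds expressed for input lengths in frames.
new_min = rescale_ratio_threshold(old_min)
new_max = rescale_ratio_threshold(old_max)

# A 5-second utterance (500 frames) with 20 output units.
kept_with_rescaled = keep_utterance(20, 500, new_min, new_max)
kept_with_unscaled = keep_utterance(20, 500, old_min, old_max)
```

Under this assumption, leaving the thresholds at their per-second defaults while measuring input in frames would discard an utterance that the old tool accepted.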

A bug in experiments/models/singlelayer.py

When I run evaluation.py using singlelayer, a bug occurs in experiments/models/singlelayer.py at this line: "x = merge(layers, mode='concat', concat_axis=channel_axis)". x is 2D now, so how can it be concatenated using concat_axis=1...

Question: Why are the prompts for training and inference for audio event classification different?

Hi, sir: I find that the prompts for training and testing for audio event classification are different in the code. In the training task "cla_label", one example of the question is "Identify...

question

Eval code error

When I use the eval code 'eval_esc.py' (https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/eval/eval_esc.py), the following error occurs:

```
from stats import calculate_stats
ImportError: cannot import name 'calculate_stats' from 'stats' (/home/aipf/work/miniconda3/envs/venv_ltu_as/lib/python3.10/site-packages/stats.py)
```

When I use eval code...
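The path in the traceback shows that `stats` resolved to a file under site-packages, which suggests that a different module with the same name is being picked up. A minimal sketch for checking where a module name resolves, using only the standard library; `resolve_module_origin` is a hypothetical helper name:

```python
import importlib.util


def resolve_module_origin(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the file that `import stats` would load in the current environment,
# or None if no module named `stats` is importable.
print(resolve_module_origin("stats"))
```

If the printed path is not the intended file, the import is being shadowed; which `stats` module the script expects is not stated in the snippet.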

bug

reproduction

Why does pad_or_trim use 1000 rather than 3000 in transcribe_audio?

Why does pad_or_trim use 1000 rather than 3000 in transcribe_audio? `mel = pad_or_trim(mel, 1000).to(model.device).to(dtype)`
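For context on what the two numbers mean in time, here is a small sketch. It assumes Whisper's standard front end (16 kHz audio, hop length of 160 samples, so 100 mel frames per second); `pad_or_trim_frames` is a simplified list-based stand-in written for illustration, not Whisper's own `pad_or_trim`. It does not explain why the repository chose 1000.

```python
SAMPLE_RATE = 16000  # Hz, Whisper's input sample rate
HOP_LENGTH = 160     # samples between successive mel frames
FRAMES_PER_SECOND = SAMPLE_RATE // HOP_LENGTH  # 100 frames per second


def frames_to_seconds(n_frames):
    """Duration in seconds covered by n_frames mel frames."""
    return n_frames / FRAMES_PER_SECOND


def pad_or_trim_frames(frames, length, pad_value=0.0):
    """Illustrative stand-in: cut or zero-pad a list of frames to `length`."""
    if len(frames) >= length:
        return frames[:length]
    return frames + [pad_value] * (length - len(frames))


print(frames_to_seconds(3000))  # 30.0
print(frames_to_seconds(1000))  # 10.0
```

Under these assumptions, 3000 frames correspond to a 30-second window and 1000 frames to a 10-second window.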

question

Where is the Whisper decoder used?

Hi, sir: The 'Whisper Decoder' was mentioned in Fig. 1 of the ltu-as paper, but I don't see the Whisper decoder being used anywhere. Could you please explain why? Thank you!

Modifications to the llama model

Hi, sir: You mentioned "only output text related states" on line 734 in [modeling_llama.py](https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/hf-dev/transformers-main/src/transformers/models/llama/modeling_llama.py), and only the text states are used in the next processing step. On line 733 you did the same thing, only...

Question: LLaMA-7B LLM

Is the LLaMA-7B LLM the original model or the Llama-2-Chat model optimized for dialogue use cases?

question

Question about Q-Former

Hi, sir: If I want to use Q-Former as the projector in acc audiocaps, should the length of the audio encoder placeholder be set to 64?