Artem Kozhevnikov issues

6 issues


Artem Kozhevnikov

Dynamic bucketing

**Is your feature request related to a problem? Please describe:** Samples in a data pipeline can have different characteristics (such as length, number of tokens, ...). Currently we can only bucket...

enhancement
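The request above is truncated, so its exact semantics are unknown. As one possible interpretation, here is a minimal pure-Python sketch of cost-based dynamic bucketing: consecutive samples are grouped until a per-bucket budget (for example, a total token count) would be exceeded. `dynamic_bucket`, `cost_fn` and `max_cost` are hypothetical names for illustration, not part of fairseq2.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def dynamic_bucket(
    samples: Iterable[T],
    cost_fn: Callable[[T], int],
    max_cost: int,
) -> Iterator[List[T]]:
    """Hypothetical helper: group consecutive samples so that each bucket's
    total cost (e.g. number of tokens) stays within max_cost.

    A single sample whose own cost exceeds max_cost is emitted on its own.
    """
    bucket: List[T] = []
    total = 0
    for sample in samples:
        cost = cost_fn(sample)
        # Flush the current bucket if adding this sample would exceed the budget.
        if bucket and total + cost > max_cost:
            yield bucket
            bucket, total = [], 0
        bucket.append(sample)
        total += cost
    # Emit whatever is left over at the end of the stream.
    if bucket:
        yield bucket


# Usage: bucket sentences by whitespace token count, at most 5 tokens per bucket.
sentences = ["a b c", "d e", "f g h i", "j"]
buckets = list(dynamic_bucket(sentences, lambda s: len(s.split()), max_cost=5))
print(buckets)
```

Summing per-sample costs is only one choice; a bucketer that pads to the longest sample would instead budget on `max_len * bucket_size`.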

using a `pipeline_builder` shared pointer multiple times leads to segfaults

**Describe the bug:** Segfault during pipeline creation.

**Describe how to reproduce:**

```python
from fairseq2.data import read_sequence
from fairseq2.data.data_pipeline import DataPipeline, DataPipelineBuilder

pipeline_build = read_sequence(list(range(100)))  # this is shared for two...
```

bug

DataPipeline execution profiling

How can one see the overall iteration performance of a complex DataPipeline and identify potential bottlenecks? Ideally we would like to have several time-series metrics (like CPU/Memory...

question
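Pending built-in support, a rough way to locate slow Python-level steps is to wrap each stage and accumulate the wall-clock time spent inside it. The sketch below is plain Python over generators, not a fairseq2 feature; `timed_map` and `stats` are hypothetical names, and it only covers Python callables (such as those one might pass to a `map` step), not work done natively in C++. CPU/memory time series would still require an external profiler.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def timed_map(
    name: str,
    fn: Callable[[T], U],
    items: Iterable[T],
    stats: Dict[str, float],
) -> Iterator[U]:
    """Hypothetical wrapper: apply fn lazily to each item and add the seconds
    spent inside fn to stats[name].

    Pulling the next item from upstream happens outside the timed region,
    so each stage records only its own time, not that of earlier stages.
    """
    for item in items:
        start = time.perf_counter()
        result = fn(item)
        stats[name] += time.perf_counter() - start
        yield result


stats: Dict[str, float] = defaultdict(float)
stage1 = timed_map("square", lambda x: x * x, range(10_000), stats)
stage2 = timed_map("to_str", str, stage1, stats)
outputs = list(stage2)

# The stage with the largest accumulated time is the likely bottleneck.
for name, seconds in sorted(stats.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.4f}s")
```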

operation priority in DataPipeline

**Is your feature request related to a problem? Please describe:** When there are several steps in a complex data pipeline, we would like to be able to manually assign more priority...

enhancement

`from_generator` function for BuilderDataPipeline

**Is your feature request related to a problem? Please describe:** Currently the only option is the `read_sequence` method, which requires knowing the full list of samples in advance; `from_generator` would offer more flexible...

enhancement
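To make the request concrete, here is a minimal pure-Python sketch of what a generator-backed source could look like. Everything here (`from_generator`, `GeneratorSource`, `reset`) is hypothetical and not fairseq2's API. One design point it illustrates: a Python generator cannot be rewound, so the sketch takes a zero-argument *factory* and calls it again on `reset()`.

```python
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class GeneratorSource(Generic[T]):
    """Hypothetical lazy data source built from a generator factory.

    Samples are produced on demand, so the full list never has to be
    known (or held in memory) up front, unlike a list-based source.
    """

    def __init__(self, factory: Callable[[], Iterator[T]]) -> None:
        # Keep the factory, not just one generator, so reset() can start over.
        self._factory = factory
        self._it = factory()

    def __iter__(self) -> "GeneratorSource[T]":
        return self

    def __next__(self) -> T:
        return next(self._it)

    def reset(self) -> None:
        # A fresh generator replaces the partially consumed one.
        self._it = self._factory()


def from_generator(factory: Callable[[], Iterator[T]]) -> GeneratorSource[T]:
    """Hypothetical constructor mirroring the requested `from_generator`."""
    return GeneratorSource(factory)


# Usage: samples are generated lazily and the source can be rewound.
def read_lines() -> Iterator[str]:
    for i in range(3):
        yield f"line {i}"


src = from_generator(read_lines)
first_two = [next(src), next(src)]
src.reset()
all_lines = list(src)
print(first_two, all_lines)
```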

controllable `WaveformToFbankConverter` multithreading

**Describe the bug:** `WaveformToFbankConverter` runs in parallel on multiple threads. This method (and possibly some others) uses a [`parallel_for`](https://github.com/facebookresearch/fairseq2/blob/main/fairseq2n/src/fairseq2n/data/audio/detail/kaldi_fbank.cc#L94) statement for execution. Currently, there is no obvious way to control the number...

bug
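The linked `parallel_for` lives in fairseq2n's C++ code; the Python sketch below only illustrates the kind of knob being asked for: a `parallel_for` whose thread count is capped by a caller-supplied `num_threads`. The function and parameter names are hypothetical, and because of Python's GIL this pure-Python version will not speed up CPU-bound work; the point is the bounded worker pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def parallel_for(
    fn: Callable[[int, int], None],
    n: int,
    num_threads: Optional[int] = None,
) -> None:
    """Hypothetical sketch: call fn(begin, end) over contiguous chunks of
    range(n) using at most num_threads worker threads.

    num_threads=None falls back to os.cpu_count(); num_threads=1 runs serially.
    """
    workers = num_threads or os.cpu_count() or 1
    if workers <= 1 or n <= 1:
        fn(0, n)
        return
    chunk = (n + workers - 1) // workers  # ceiling division
    ranges = [(b, min(b + chunk, n)) for b in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk and re-raises any worker exception.
        list(pool.map(lambda r: fn(*r), ranges))


# Usage: fill an output buffer chunk by chunk with at most 3 threads.
data = list(range(10))
out = [0] * len(data)


def square_range(begin: int, end: int) -> None:
    for i in range(begin, end):
        out[i] = data[i] * data[i]


parallel_for(square_range, len(data), num_threads=3)
print(out)
```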