Alex comments

Results: 31 comments of Alex

Slow iteration over Torch tensors

perhaps related to https://github.com/huggingface/datasets/issues/6833

Validate with async response loop

@ArtsiomWB I'd also like to request this feature, I assume it isn't yet implemented?

Validate with async response loop

well more or less exactly as above:

```
wandb.define_metric("accuracy", summary="best", goal="maximize")
wandb.define_metric("fraction_of_dogs_found", summary="value_when_accuracy_best")
wandb.define_metric("fraction_of_cats_found", summary="value_when_accuracy_best")
```

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

I think the problem is that the underlying ex_iterable will not use iter_arrow unless the formatting type is arrow, which leads to conversion from arrow -> python -> numpy in...
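To make the ordering problem above concrete, here is a toy sketch using only the standard library. It is not the datasets implementation: `ToyBatches`, `format_row`, `iterate_before` and `iterate_after` are hypothetical names, `array.array` rows stand in for Arrow data, and a tuple stands in for the final numpy output. It only illustrates, under those assumptions, why applying formatting after batch iteration avoids the intermediate per-row Python hop.

```python
from array import array


class ToyBatches:
    """Hypothetical stand-in for an Arrow-backed examples iterable."""

    def __init__(self, rows, batch_size):
        self.rows = rows                # list of array('d') rows
        self.batch_size = batch_size
        self.python_conversions = 0     # counts per-row "arrow -> python" hops

    def iter_arrow(self):
        # Fast path: hand out whole batches without touching single values.
        for i in range(0, len(self.rows), self.batch_size):
            yield self.rows[i:i + self.batch_size]

    def __iter__(self):
        # Slow path: every row is first turned into a plain Python list.
        for batch in self.iter_arrow():
            for row in batch:
                self.python_conversions += 1
                yield row.tolist()


def format_row(row):
    # Stand-in for the final formatting step (e.g. "numpy").
    return tuple(row)


def iterate_before(source, formatting):
    # Behaviour described in the comment: batches only for "arrow" formatting.
    if formatting == "arrow":
        for batch in source.iter_arrow():
            yield from batch
    else:
        for row in source:              # goes through the Python hop
            yield format_row(row)


def iterate_after(source, formatting):
    # Proposed ordering: always read batches, then apply formatting.
    for batch in source.iter_arrow():
        for row in batch:
            yield row if formatting == "arrow" else format_row(row)
```

Running both functions over the same rows with a non-arrow format should give identical results, while `python_conversions` stays at zero only for `iterate_after`.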

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

also now working for filter with similar performance improvements:

```python
filtered_examples = []
ds = dataset.to_iterable_dataset()
ds = ds.with_format("numpy").filter(lambda x: [arr.shape[0]==2000 for arr in x["array0"]], batch_size=10, batched=True)
t0 = time.time()
...
```

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

There also appears to be a separate(?) issue with chaining filter and map, because filter iter_arrow only returns _iter_arrow if arrow formatting is applied (and vice versa, presumably). I don't have...

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

> I feel like we could get rid of TypedExampleIterable altogether and apply formatting with feature conversion with formatted_python_examples_iterator and formatted_arrow_examples_iterator

Oh nice didn't know about the feature support in...

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

ok i've fixed the chaining issue with my last two commits. Will see if I can refactor into a FormattedExampleIterable. The other issue you posted seems to be unrelated (maybe...

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

updated with FormattedExamplesIterable. there might be a few unnecessary format calls once the data is already formatted - doesn't seem like a big performance bottleneck but could maybe be fixed...

apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets

Thinking about this in the context of #7210 - am wondering if it would make sense for Features to define their own extraction arrow->object logic? e.g. Arrays should *always* be...