Alex
perhaps related to https://github.com/huggingface/datasets/issues/6833
@ArtsiomWB I'd also like to request this feature. I assume it isn't implemented yet?
well, more or less exactly as above:

```
wandb.define_metric("accuracy", summary="best", goal="maximize")
wandb.define_metric("fraction_of_dogs_found", summary="value_when_accuracy_best")
wandb.define_metric("fraction_of_cats_found", summary="value_when_accuracy_best")
```
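Until something like that exists, the same bookkeeping can be done by hand. This is only a rough sketch in plain Python: `BestValueTracker` and the sample `history` values are hypothetical names and made-up numbers for illustration, not part of wandb.

```python
from typing import Dict, Optional


class BestValueTracker:
    """Hypothetical helper: remembers companion metrics from the step
    at which a primary metric reached its best value."""

    def __init__(self, primary: str, maximize: bool = True) -> None:
        self.primary = primary
        self.maximize = maximize
        self.best: Optional[float] = None
        self.snapshot: Dict[str, float] = {}

    def update(self, metrics: Dict[str, float]) -> bool:
        """Record one step of metrics; return True if the primary metric improved."""
        value = metrics[self.primary]
        improved = self.best is None or (
            value > self.best if self.maximize else value < self.best
        )
        if improved:
            self.best = value
            # Keep every companion metric as it was at the best step.
            self.snapshot = {k: v for k, v in metrics.items() if k != self.primary}
        return improved


# Made-up metric history, purely for illustration.
history = [
    {"accuracy": 0.70, "fraction_of_dogs_found": 0.60, "fraction_of_cats_found": 0.80},
    {"accuracy": 0.85, "fraction_of_dogs_found": 0.90, "fraction_of_cats_found": 0.75},
    {"accuracy": 0.80, "fraction_of_dogs_found": 0.95, "fraction_of_cats_found": 0.70},
]

tracker = BestValueTracker("accuracy")
for step_metrics in history:
    tracker.update(step_metrics)

print(tracker.best, tracker.snapshot)
```

Whenever `update` returns True, the snapshot could be copied into the run summary by hand (e.g. by assigning keys on `wandb.run.summary`), which is presumably roughly what a built-in `value_when_accuracy_best` option would do.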
I think the problem is that the underlying ex_iterable will not use iter_arrow unless the formatting type is arrow, which leads to conversion from arrow -> python -> numpy in...
also now working for filter with similar performance improvements:

```python
filtered_examples = []
ds = dataset.to_iterable_dataset()
ds = ds.with_format("numpy").filter(
    lambda x: [arr.shape[0] == 2000 for arr in x["array0"]],
    batch_size=10,
    batched=True,
)
t0 = time.time()...
```
There also appears to be a separate issue with chaining filter and map, because filter's iter_arrow only returns _iter_arrow if arrow formatting is applied (and vice versa, presumably). I don't have...
> I feel like we could get rid of TypedExampleIterable altogether and apply formatting with feature conversion with formatted_python_examples_iterator and formatted_arrow_examples_iterator

Oh nice, didn't know about the feature support in...
OK, I've fixed the chaining issue with my last two commits. Will see if I can refactor into a FormattedExamplesIterable. The other issue you posted seems to be unrelated (maybe...
updated with FormattedExamplesIterable. there might be a few unnecessary format calls once the data is already formatted - doesn't seem like a big performance bottleneck but could maybe be fixed...
Thinking about this in the context of #7210 - am wondering if it would make sense for Features to define their own extraction arrow->object logic? e.g. Arrays should *always* be...