srobertjames comments

Results: 19 comments of srobertjames

scanf on windows is not hooked because vscanf is not implemented

Is there a workaround for this? MSVC, instead of linking to `scanf`, seems to emit a simple wrapper around `vfscanf` and include that wrapper in the binary (as if it was...

Windows vfscanf not implemented: api __stdio_common_vfscanf is not implemented

See https://docs.microsoft.com/en-us/previous-versions/dn727675(v=vs.140), which states that `__stdio_common_vfscanf` is "used to implement the CRT".

[Feature Request] Running a Linux binary without supplying a new VM or OS

I'll add as an alternative: `firejail` works this way, but has much less security, and I believe worse performance, than firecracker.

[Feature Request] Running a Linux binary without supplying a new VM or OS

This all makes sense. Would it be possible to include a sample script to do that? This would be very useful for many, and would help those new to firecracker...

Dataset.map gets stuck on _cast_to_python_objects

Are you able to reproduce this? My example is small enough that it should be easy to try.

Dataset.map gets stuck on _cast_to_python_objects

Wow, adding `return_tensors="np"` sped up my example by a **factor of 17x** and completely eliminated the casting! I'd recommend not only documenting it, but making it the default....

Dataset.map gets stuck on _cast_to_python_objects

@lhoestq I just benchmarked the two edits to `features.py` above, and they appear to solve the problem, bringing my original example to within 20% of the speed of the output `"np"`...

Improve torch formatting performance

Is time spent casting an issue here? See https://github.com/huggingface/datasets/issues/4676, which shows that Datasets can spend huge amounts of time repeatedly casting to Python objects.

Datasets.map causes incorrect overflow_to_sample_mapping when used with tokenizers and small batch size

I've built a minimal example that shows this bug without `n_proc`. It seems to be a problem with any way of using **tokenizers, `overflow_to_sample_mapping`, and Dataset.map, with a small batch size**:...

Datasets.map causes incorrect overflow_to_sample_mapping when used with tokenizers and small batch size

A larger batch size does _not_ have this behavior:

```
def tok2(d):
    return tok(d['question'], d['context'])

ds = datasets.Dataset.from_dict({'question': questions, 'context': contexts})
tokens = ds.map(tok2, batched=True, batch_size=2)
print(tokens['overflow_to_sample_mapping'])
assert tokens['overflow_to_sample_mapping'] ==...
```
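One possible explanation for the batch-size dependence, sketched below in plain Python: if the tokenizer reports `overflow_to_sample_mapping` as indices relative to the batch it was given, then concatenating the per-batch results restarts the numbering at 0 for every batch. This is an assumption about the cause, not something confirmed in the thread. `fake_tokenize` and `map_in_batches` are hypothetical stand-ins written for illustration; they are not part of `datasets` or `tokenizers`.

```python
from typing import Dict, List


def fake_tokenize(batch: List[str], max_len: int = 4) -> Dict[str, List]:
    """Hypothetical stand-in for a tokenizer that returns overflowing chunks.

    Each text is split into chunks of at most `max_len` words. For every chunk
    we record the index of its source text *within this batch* (assumption).
    """
    chunks, mapping = [], []
    for local_idx, text in enumerate(batch):
        words = text.split()
        for start in range(0, len(words), max_len):
            chunks.append(words[start:start + max_len])
            mapping.append(local_idx)
    return {"chunks": chunks, "overflow_to_sample_mapping": mapping}


def map_in_batches(texts: List[str], batch_size: int, globalize: bool = False) -> List[int]:
    """Mimic a batched map: tokenize each batch, then concatenate the mappings.

    With `globalize=True`, each batch-local index is shifted by the batch's
    starting offset so that it refers to a row of the whole dataset.
    """
    combined: List[int] = []
    for offset in range(0, len(texts), batch_size):
        out = fake_tokenize(texts[offset:offset + batch_size])
        local = out["overflow_to_sample_mapping"]
        combined.extend(i + offset if globalize else i for i in local)
    return combined


texts = ["a b c d e f", "g h", "i j k l m", "n o p q r s t u v"]

one_batch = map_in_batches(texts, batch_size=len(texts))
small_batches = map_in_batches(texts, batch_size=2)
fixed = map_in_batches(texts, batch_size=2, globalize=True)

print(one_batch)      # indices refer to rows of the full dataset
print(small_batches)  # indices restart at 0 in the second batch
print(fixed)          # offsets restore dataset-wide indices
```

If that assumption holds for the real tokenizer, a workaround along the same lines would be to add the batch's starting row to each index inside the mapped function, for example by passing `with_indices=True` to `Dataset.map` and offsetting by the first index of the batch.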