
R package for dataset generation and benchmarking

Results 12 synthetic issues

One of the biggest selling points for me of [fst](https://github.com/fstpackage/fst) is the ability to randomly access (and append in [v0.9.2](https://github.com/fstpackage/fst/milestone/23)!) disk stored data. That is, load specific data into an...

enhancement

A numeric vector can have a limited number of _levels_ that are replicated: ```r # 10 'levels' vec_levels

enhancement
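The limited-levels idea above can be sketched in base R. This is only an illustration under stated assumptions: `sample_levels()` is a hypothetical helper, not part of the `synthetic` API, and drawing the levels from a uniform distribution is an arbitrary choice.

```r
# Hypothetical helper (not part of any package): build a numeric vector of
# length `n` that contains at most `n_levels` distinct values.
sample_levels <- function(n, n_levels, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Draw the distinct values ("levels") once ...
  lvls <- runif(n_levels)
  # ... then replicate them by sampling level indices with replacement.
  lvls[sample.int(n_levels, n, replace = TRUE)]
}

vec <- sample_levels(1000, n_levels = 10, seed = 42)
length(unique(vec))  # number of distinct values actually present
```

Fewer distinct values usually make a column more compressible, which is presumably why such a generator is useful for serialization benchmarks.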

Advanced feature to generate dataset samples from a source dataset with the correlations between column vectors retained: ``` r dt

enhancement
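One possible way to retain correlations when sampling from a source dataset is a Gaussian approximation: estimate the column means and covariance matrix, then transform independent normal draws with a Cholesky factor. The sketch below assumes all columns are numeric and roughly normal; `sample_correlated()` is a hypothetical name, and this is not necessarily how the package would implement the feature.

```r
# Hypothetical helper: draw `n` new rows whose column means and covariances
# approximate those of the numeric data frame `df`.
sample_correlated <- function(df, n) {
  x <- as.matrix(df)
  mu <- colMeans(x)
  sigma <- cov(x)
  # chol() returns an upper-triangular R with t(R) %*% R == sigma
  r <- chol(sigma)
  # Independent standard-normal draws, one column per source column
  z <- matrix(rnorm(n * ncol(x)), nrow = n)
  # z %*% r has covariance sigma; add the column means back
  out <- sweep(z %*% r, 2, mu, "+")
  colnames(out) <- colnames(x)
  as.data.frame(out)
}

set.seed(1)
src <- data.frame(a = rnorm(500))
src$b <- 0.8 * src$a + rnorm(500, sd = 0.3)

new <- sample_correlated(src, 10000)
cor(src$a, src$b)  # correlation in the source data
cor(new$a, new$b)  # correlation in the generated sample
```

This approach only preserves linear (second-order) structure. Marginal distributions that are far from normal, or categorical columns, would need a different technique.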

And generate new sample data from that. Correlations between columns can be retained: ``` r dt

enhancement

Using appropriate generators and the `synthetic` infrastructure

enhancement

For example:

``` r
synthetic_bench() %>%
  bench_tables(generator, column_mode = "single column") %>%
  bench_streamers(rds_streamer, fst_streamer, parquet_streamer, feather_streamer) %>%
  bench_rows(1e7, 5e7) %>%
  bench_compression(50, 80) %>%
  compute()
```

Parameter _column\_mode_ could specify the...

enhancement

Although the arrow package doesn't directly support selection of the number of threads (I think)

enhancement

See [here](https://arrow.apache.org/blog/)

enhancement

As it also tracks memory allocations. It would be nice to benchmark memory usage across packages as well.

enhancement

For long benchmarks, the user should not lose data when an error is encountered (or the system goes down). Instead, we can use a file to save temp results and...

enhancement
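The idea of saving temporary results to a file could look roughly like the following base R sketch. The names `run_with_checkpoint()`, `tasks`, and `run_task` are hypothetical and only illustrate the pattern; the package may choose a different file format or granularity.

```r
# Hypothetical helper: run each named task once, writing all results gathered
# so far to `path` after every task, so an interrupted run can resume.
run_with_checkpoint <- function(tasks, run_task, path) {
  # Resume from an earlier (possibly interrupted) run if a checkpoint exists
  results <- if (file.exists(path)) readRDS(path) else list()
  # Only run the tasks that have no stored result yet
  for (name in setdiff(names(tasks), names(results))) {
    results[[name]] <- run_task(tasks[[name]])
    saveRDS(results, path)  # persist progress after each completed task
  }
  results
}

tasks <- list(small = 1e4, large = 1e5)
path <- tempfile(fileext = ".rds")

# Example task: time sorting a random vector of the given length
time_sort <- function(n) system.time(sort(runif(n)))[["elapsed"]]

res <- run_with_checkpoint(tasks, time_sort, path)
```

Calling `run_with_checkpoint()` again with the same `path` skips every task that already has a stored result, so only the work lost to the failure is repeated.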