Mauricio Caceres Bravo comments

Results: 205 comments of Mauricio Caceres Bravo

Performance improvement for transposing data

I think it can be multi-threaded, but I don't think it actually improved performance when I tested it (though I might not have done it right). I think what might...

Performance improvement for transposing data

I think there is a way to structure this that might be faster: 1. Read the parquet file into an Arrow table. Multi-threaded; should be fast. 2. Arrow table into Stata memory, looping...
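
A minimal sketch of the two-step structure described above, written in plain Python with no Arrow dependency. The `table` dict stands in for an Arrow table that has already been read into memory (step 1), and `StataLikeStore` is a hypothetical stand-in for Stata's memory; neither is the plugin's actual code, and the comment is cut off before it says how the loop is organized.

```python
from typing import Dict, List, Optional


class StataLikeStore:
    """Hypothetical stand-in for Stata memory: cells addressed by (var, obs)."""

    def __init__(self, n_vars: int, n_obs: int) -> None:
        self.cells: List[List[Optional[float]]] = [
            [None] * n_obs for _ in range(n_vars)
        ]

    def store(self, var: int, obs: int, value: float) -> None:
        self.cells[var][obs] = value


def copy_table(columns: Dict[str, List[float]], dest: StataLikeStore) -> int:
    """Step 2: walk the in-memory table column by column, row by row."""
    written = 0
    for var, values in enumerate(columns.values()):
        for obs, value in enumerate(values):
            dest.store(var, obs, value)
            written += 1
    return written


# Step 1 (reading the file into memory) is assumed to have happened already.
table = {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}
dest = StataLikeStore(n_vars=len(table), n_obs=3)
n_written = copy_table(table, dest)
```

Separating the two steps this way means the file read can be parallelized on its own, while the copy into the destination stays a simple loop.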

Performance improvement for transposing data

I'm doing some benchmarking. Writing 10M rows and 3 variables once the data is in an Arrow table takes 0.5s. Looping over Stata as it is atm also takes 0.5s....

Performance improvement for transposing data

I wonder if it's not multi-threaded. I would like to cut processing time in half, ideally. I think that's plausible, but I doubt it can ever be faster than...

Performance improvement for transposing data

Nicely enough, it takes literally a third of the time (one col vs 3)

Performance improvement for transposing data

It sounds like individual columns can be chunked. I think I can only implement the solution suggested in the Apache docs if the number of chunks and each chunk size...
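
To illustrate why chunk layout matters when iterating, here is a rough sketch in plain Python. Each column is modeled as a list of chunks (lists of values); the helper names `chunk_offsets`, `locate`, and `same_layout` are hypothetical and not part of Arrow. The comment is cut off, so the exact condition is not recoverable; the sketch assumes it concerns all columns sharing the same number of chunks and the same chunk sizes.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def chunk_offsets(chunks: Sequence[Sequence[float]]) -> List[int]:
    """Cumulative end position of each chunk, e.g. sizes [2, 3] -> [2, 5]."""
    return list(accumulate(len(c) for c in chunks))


def locate(offsets: List[int], row: int) -> Tuple[int, int]:
    """Map a global row index to (chunk index, position within that chunk)."""
    if row < 0 or not offsets or row >= offsets[-1]:
        raise IndexError(row)
    k = bisect_right(offsets, row)
    start = offsets[k - 1] if k > 0 else 0
    return k, row - start


def same_layout(columns: Sequence[Sequence[Sequence[float]]]) -> bool:
    """True when every column has the same number of chunks and chunk sizes."""
    layouts = {tuple(len(c) for c in col) for col in columns}
    return len(layouts) <= 1


col_a = [[1.0, 2.0], [3.0, 4.0, 5.0]]  # chunk sizes 2 and 3
col_b = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # chunk sizes 3 and 2
offsets_a = chunk_offsets(col_a)
```

When `same_layout` holds, one pair of nested loops over (chunk, position) can serve every column; when it does not, each column needs its own lookup such as `locate`.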

Performance improvement for transposing data

I think a row group is only relevant when reading the file from disk, not when iterating over the table already in memory.

Performance improvement for transposing data

I've been trying this out on the server on modestly large data that I've been using for a project (a few GiB) and compression is amazing! Performance for traversing several variables...

Performance improvement for transposing data

Yup. Had this in the back of my head. Don't think it'd take too long. Format ideas? ``` Reading [### ] X% (obs i / N; group r / R)...

Performance improvement for transposing data

linesize is a problem ):
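
One way to deal with the line-size concern is sketched below in Python: build the progress line in the format floated above and fall back to a shorter form when it would not fit. The function name `progress_line`, the `bar_width` parameter, and the fallback text are assumptions for illustration; `linesize` is meant to mirror Stata's line-size setting, and how the plugin would query it is not shown here.

```python
def progress_line(obs: int, n_obs: int, group: int, n_groups: int,
                  linesize: int = 80, bar_width: int = 20) -> str:
    """Format one progress line, dropping detail if it exceeds linesize."""
    frac = obs / n_obs if n_obs else 1.0
    filled = int(frac * bar_width)
    bar = "#" * filled + " " * (bar_width - filled)
    full = (f"Reading [{bar}] {frac:.0%} "
            f"(obs {obs} / {n_obs}; group {group} / {n_groups})")
    if len(full) <= linesize:
        return full
    # Hypothetical fallback: keep only the percentage on narrow displays.
    short = f"Reading {frac:.0%}"
    return short[:linesize]


wide = progress_line(50, 100, 1, 4, linesize=80, bar_width=10)
narrow = progress_line(50, 100, 1, 4, linesize=20, bar_width=10)
```

Shrinking `bar_width` before dropping the bar entirely would be another option; the sketch keeps a single fallback for brevity.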