Mauricio Caceres Bravo

Results 205 comments of Mauricio Caceres Bravo

I think it can be multi-threaded, but I don't think it actually improved performance when I tested it (though I might not have done it right). I think what might...

I think there is a way to structure this that might be faster: 1. Read parquet file into Arrow table. Multi-threaded; should be fast. 2. Arrow table into Stata memory, looping...
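A rough sketch of that two-step structure, in Python rather than the plugin's own language. It assumes `pyarrow` is available for step 1. The names `read_parquet_table`, `copy_table`, and `store` are hypothetical, and `store(i, j, value)` is only a stand-in for whatever call writes one cell into Stata memory.

```python
def read_parquet_table(path):
    """Step 1: read the whole file into an in-memory Arrow table."""
    # Imported lazily so that step 2 can be exercised without pyarrow.
    import pyarrow.parquet as pq
    return pq.read_table(path, use_threads=True)


def copy_table(table, store):
    """Step 2: loop over the in-memory table, one column at a time.

    `store(i, j, value)` is a hypothetical callback that writes `value`
    to observation i of variable j.
    """
    for j, name in enumerate(table.column_names):
        # to_pylist() flattens the column into plain Python values.
        for i, value in enumerate(table.column(name).to_pylist()):
            store(i, j, value)
```

Going column by column keeps each inner loop on a single type, which is one reason a per-column timing might scale with the number of columns.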

I'm doing some benchmarking. Writing 10M rows and 3 variables once the data is in an arrow table takes 0.5s. Looping over Stata as it is atm also takes 0.5s....

I wonder if it's not multi-threaded. I would like to cut down processing time in half, ideally. I think that's plausible, but I doubt it can ever be faster than...

Nicely enough, it takes literally a third of the time (one col vs 3)

It sounds like individual columns can be chunked. I think I can only implement the solution suggested in the apache docs if the number of chunks and each chunk size...
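To illustrate why chunk layout matters for row-wise iteration, here is a minimal sketch in which plain Python lists stand in for Arrow chunks. It is not the approach from the Apache docs; `chunk_ends` and `value_at` are hypothetical helpers that map a global row index to a (chunk, offset) pair, so two columns can be read row by row even if their chunk sizes differ.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_ends(chunks):
    """Cumulative end positions, e.g. chunk sizes [3, 2, 4] -> [3, 5, 9]."""
    return list(accumulate(len(c) for c in chunks))


def value_at(chunks, ends, row):
    """Return the value at global row index `row` of a chunked column."""
    k = bisect_right(ends, row)            # index of the chunk holding `row`
    start = ends[k - 1] if k > 0 else 0    # global index of that chunk's first row
    return chunks[k][row - start]


# Two columns with the same 9 rows but different chunk layouts.
col_a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
col_b = [[10, 20], [30, 40, 50, 60, 70, 80, 90]]
ends_a, ends_b = chunk_ends(col_a), chunk_ends(col_b)

rows = [(value_at(col_a, ends_a, i), value_at(col_b, ends_b, i)) for i in range(9)]
```

The lookup costs a binary search per cell, so when every column shares the same chunk boundaries a plain nested loop over chunks would be cheaper.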

I think a row group is only relevant when reading the file from disk, not when iterating over the table already in memory.

I've been trying this out on the server on modestly large data that I've been using for a project (few GiB) and compression is amazing! Performance for traversing several variables...

Yup. Had this in the back of my head. Don't think it'd take too long. Format ideas? ``` Reading [### ] X% (obs i / N; group r / R)...