Mark Litwintschik
This is the code I ran. ```python from itertools import chain, islice from multiprocessing import Pool import socket import struct import sys import uuid from tldextract import extract as tld_ex ip2int...
That's a good question. I haven't run it through a flame graph or anything; I guess that would be a good place to start. If it were the regex causing...
I've managed to produce a [flame graph](https://github.com/john-kurkowski/tldextract/files/3679710/perf.zip). GitHub requires the SVG to be ZIP-compressed before it can be uploaded. I'll leave the setup notes I used to produce this graph here...
@kokes I ran the following which only imports Pandas and dbfread once and doesn't re-execute Python during each iteration. This wasn't run on Python 2.7.12 due to ``yield from`` being...
I suspect multiple files per table aren't supported because that feature has yet to arrive in DataFusion (https://github.com/apache/arrow-datafusion/issues/133). If this support does arrive, do keep in mind...
Ubuntu for Windows uses Ubuntu 20.04, which only supports GLIBCXX up to 3.4.28 with its packaging system. `sudo apt-get upgrade libstdc++6` won't push that version up any further. ```python llm...
Copying the metadata wholesale wouldn't be accurate. Each image is offset from the original file's offset.
Is this issue still present in 0.7? I produced a PQ file in https://github.com/duckdb/duckdb/discussions/6478 with both DuckDB 0.7 and ClickHouse and they were within 10% of one another in size....
tqdm's API looks much like rich's. The issue is how to tie this into MP's aggregation calls. There is no iterator exposed that I could use to keep track of...
Is there anywhere deeper where records are iterated over one at a time? That could be a place to add a hook to a progress counter.
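
To illustrate the kind of hook I mean, here is a minimal sketch. It assumes the records are available as a plain Python iterable at some point in the pipeline, which may not be the case in this codebase. The names `with_progress`, `on_progress` and `every` are hypothetical and not part of any library's API.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_progress(
    records: Iterable[T],
    on_progress: Callable[[int], None],
    every: int = 1000,
) -> Iterator[T]:
    """Yield records unchanged, reporting the running count periodically.

    `on_progress` is a hypothetical callback that receives the total
    number of records seen so far.
    """
    count = 0
    for count, record in enumerate(records, start=1):
        if count % every == 0:
            on_progress(count)
        yield record
    # Report the final total when the last batch was incomplete.
    # Nothing is reported for an empty input.
    if count % every != 0:
        on_progress(count)


if __name__ == "__main__":
    # Stand-in for the real record source.
    reported = []
    for _ in with_progress(range(2500), reported.append, every=1000):
        pass
    print(reported)
```

Because the wrapper is a generator, it doesn't buffer anything and can sit between the reader and whatever consumes the records. The callback could forward the count to a progress bar or a log line.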