Mark Litwintschik comments

Results: 14 comments of Mark Litwintschik

How can I do 1.2 billion extractions faster?

This is the code I ran.

```python
from itertools import chain, islice
from multiprocessing import Pool
import socket
import struct
import sys
import uuid

from tldextract import extract as tld_ex

ip2int...
```

How can I do 1.2 billion extractions faster?

That's a good question. I haven't run it through a flame graph or anything; I guess that would be a good place to start. If it were the regex causing...

How can I do 1.2 billion extractions faster?

I've managed to produce a [flame graph](https://github.com/john-kurkowski/tldextract/files/3679710/perf.zip). GitHub requires the SVG to be ZIP-compressed before it can be uploaded. I'll leave the setup notes I used to produce this graph here...

How can I dump DBFs to CSVs faster?

@kokes I ran the following, which imports Pandas and dbfread only once and doesn't re-execute Python on each iteration. This wasn't run on Python 2.7.12 due to ``yield from`` being...

Selecting files with Glob pattern / regexp when registering a table

I suspect the reason multiple files per table aren't supported is that the feature has yet to arrive in DataFusion (https://github.com/apache/arrow-datafusion/issues/133). If this support does arrive, do keep in mind...

Issue: Could you please package the GPT4All using a lower version of GLIBCXX?

Ubuntu for Windows uses Ubuntu 20.04, which only supports GLIBCXX up to 3.4.28 through its packaging system. `sudo apt-get upgrade libstdc++6` won't push that version up any further. ```python llm...

GeoTIFF metadata lost when splitting a GeoTIFF up into tiles

Copying the metadata wholesale wouldn't be accurate. Each tile starts at a different offset from the original file's origin.
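To illustrate why a wholesale copy is wrong, here is a minimal sketch of how a tile's origin could be derived from the source raster's. It assumes a GDAL-style six-element geotransform; the thread does not say which tool did the splitting. The function name `tile_geotransform` and the sample values are hypothetical.

```python
def tile_geotransform(gt, col_off, row_off):
    """Return the geotransform for a tile whose top-left pixel sits at
    (col_off, row_off) in the source raster.

    gt is assumed to follow the GDAL ordering:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    """
    x0, px_w, rot_x, y0, rot_y, px_h = gt
    # Shift the origin by the tile's pixel offset; pixel size and rotation
    # terms are unchanged because the tile shares the source grid.
    new_x0 = x0 + col_off * px_w + row_off * rot_x
    new_y0 = y0 + col_off * rot_y + row_off * px_h
    return (new_x0, px_w, rot_x, new_y0, rot_y, px_h)


# Hypothetical north-up source raster with 10-unit pixels.
source_gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
tile_gt = tile_geotransform(source_gt, col_off=256, row_off=512)
print(tile_gt)
```

Only the two origin terms change per tile; the projection and pixel size can be carried over unchanged.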

Too large parquet files via "COPY TO"

Is this issue still present in 0.7? I produced a Parquet file in https://github.com/duckdb/duckdb/discussions/6478 with both DuckDB 0.7 and ClickHouse, and they were within 10% of one another in size....

Reporting progress of get_significant_points_gdf

tqdm's API looks much like rich's. The issue is how to tie this into MP's aggregation calls. There is no iterator exposed that I could use to keep track of...

Reporting progress of get_significant_points_gdf

Is there anywhere deeper where records are iterated over one at a time? That could be a place to add a hook for a progress counter.
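As a rough sketch of the kind of hook meant here: if some inner loop does walk records one at a time, it could be wrapped in a generator that reports a running count. The name `with_progress` and its parameters are hypothetical and are not part of any library mentioned in this thread.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_progress(
    records: Iterable[T],
    on_progress: Callable[[int], None],
    every: int = 1,
) -> Iterator[T]:
    """Yield records unchanged, calling on_progress with the running
    count after every `every` records have been consumed."""
    count = 0
    for record in records:
        yield record
        count += 1
        if count % every == 0:
            on_progress(count)


# Stand-in for a per-record loop: sum ten numbers, reporting every five.
seen = []
total = sum(with_progress(range(10), seen.append, every=5))
print(total, seen)
```

The callback receives the cumulative count, so it can drive a progress bar or a plain log line without the inner loop knowing which one is in use.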