Thijs

416 comments by Thijs

I would also vote for removing `duckdb.default_connection` entirely in favor of `duckdb.connect(':default:')`. Perhaps `set_default_connection` could be `DuckDBPyConnection::set_as_default` instead?

As mentioned [here](https://duckdblabs.com/news/2023/10/02/support-policy.html), the Julia client does not fall under our support policy, so this issue will likely not be picked up by the team. You're welcome to provide a pull...

Looking at the regression tests, the old version seems to be faster (by a very thin margin)? It might just be noise, but *all* of them are very slightly...

I can't reproduce the test failures. I'm aware that pandas causes a RuntimeWarning here, but I've suppressed those in the test (and opened an issue upstream). I don't know why...

I think the regression is not just spurious this time: extra conversion has to be done, and I don't think there's a way around that. Because we create numpy arrays...

@Mytherin do we perhaps want to make this an opt-in setting, so that it doesn't affect the base cost of conversion from duckdb->pandas?

```py
import numpy as np
import numpy.ma
import pandas as pd

# Takes 2 seconds for the script to run
#columns = {
#    'a': [29348234234] * 10_000_000
#}
#...
```

Thanks 👍 That helped. Creating a Series is *very* expensive, even when this is the data:

```py
# SIZE = 100_000_000
data = [29348234234] * SIZE
mask = [True] *...
```

Measuring the time it takes to create `pd.DataFrame.from_dict()` for 100_000_000 tuples:

- from regular np.array: `0.09`
- from masked np.ma.masked_array (all true): `0.275`
- from masked np.ma.masked_array (all false): `0.20`
- from np.ma.masked_array (all...

That is indeed much faster:

all true:

```
➜ duckdb git:(pandas_dtype_correction) ✗ python3 tmp/int64_test.py
Series 0.33167219161987305
DataFrame 0.11038994789123535
```

all false:

```
➜ duckdb git:(pandas_dtype_correction) ✗ python3 tmp/int64_test.py
Series 0.28539395332336426...
```
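Timings like the ones quoted in these comments could be collected with a small helper along the following lines. This is only a sketch, not the script used in the thread: `best_time`, `build_plain`, and `build_masked` are hypothetical names, and the workload is a plain-list stand-in (with a much smaller `SIZE`) so that the example runs without numpy or pandas.

```python
import time


def best_time(fn, repeat=3):
    """Return the fastest wall-clock time, in seconds, over `repeat` calls of fn()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in data: far smaller than the 100_000_000 rows mentioned in the comments.
SIZE = 100_000
data = [29348234234] * SIZE
mask = [True] * SIZE  # True marks a value as masked, as in numpy.ma


def build_plain():
    # Baseline: copy the values without looking at the mask.
    return list(data)


def build_masked():
    # Extra per-element work: replace every masked value with None.
    return [None if m else v for v, m in zip(data, mask)]


results = {
    "plain": best_time(build_plain),
    "masked": best_time(build_masked),
}
for name, seconds in results.items():
    print(name, seconds)
```

To time the real constructors instead, the same helper can be given a callable such as `lambda: pd.DataFrame.from_dict(columns)`, where `columns` is whatever dict of arrays is being tested. Taking the minimum over several runs is one common way to reduce noise in comparisons this close.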