arche
Replace DataFrame's default `_repr_html_` (closes #76)
~I added a subclass of DataFrame in order to override its to_html(), allowing us to define some default styling, like the clickable URLs from #76.~
I changed my approach to this feature. The approach I tried previously doesn't work on new DataFrames created by common pandas functions (e.g. df.head(), df[df['url'].notna()]).
Now I'm replacing the default _repr_html_() method of DataFrame.
Let me know your thoughts on this one.
Codecov Report
Merging #175 into master will increase coverage by 0.2%. The diff coverage is 92.85%.
@@ Coverage Diff @@
## master #175 +/- ##
=========================================
+ Coverage 81% 81.21% +0.2%
=========================================
Files 24 25 +1
Lines 1606 1634 +28
Branches 279 281 +2
=========================================
+ Hits 1301 1327 +26
- Misses 251 252 +1
- Partials 54 55 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/arche/__init__.py | 100% <100%> (ø) | :arrow_up: |
| src/arche/tools/dataframe.py | 91.66% <91.66%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 476a9dd...0655121.
Here is a simple benchmark I ran to measure the runtime. Below is the script I used. It loads a dataset of 100K+ items and renders its HTML representation. I forced it to display up to 100_000 rows instead of truncating them.
# render_links_benchmark.py
import time

import arche  # imported only for its side effects (the _repr_html_ override on clickable_urls)
import pandas as pd

df = pd.read_json("./327565_39_252_items.jl", lines=True)

# Raise the display limits so pandas renders the rows instead of truncating them.
with pd.option_context("display.min_rows", 100_000, "display.max_rows", 100_000):
    t = time.process_time()
    out = df._repr_html_()
    print(f"Time expended on `_repr_html_`: {time.process_time() - t}")

print(f"Len: {len(out)}")
Executing it:
$ for branch in master clickable_urls; do git checkout $branch; ./render_links_benchmark.py; done
Already on 'master'
Time expended on `_repr_html_`: 164.999099792
Len: 276061183
Switched to branch 'clickable_urls'
Time expended on `_repr_html_`: 208.39284144
Len: 322665630
It turns out that, for that dataset, rendering links generates about 17% more data (322,665,630 vs. 276,061,183 characters) and takes about 26% longer to complete (208.4 s vs. 165.0 s).
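The overhead figures can be recomputed from the two runs above. The snippet below is only a convenience check: the four numbers are copied from the benchmark output, and `pct_increase` is a helper name introduced here.

```python
# Figures copied from the benchmark output above.
master_seconds = 164.999099792
branch_seconds = 208.39284144
master_len = 276061183
branch_len = 322665630


def pct_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


time_overhead = pct_increase(master_seconds, branch_seconds)
size_overhead = pct_increase(master_len, branch_len)

print(f"Runtime overhead: {time_overhead:.1f}%")
print(f"Output size overhead: {size_overhead:.1f}%")
```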