polars
feat(python): Add `read_clipboard` and `DataFrame.write_clipboard`
Closes #9902
Similar to pandas.DataFrame.to_clipboard and pandas.read_clipboard: read_clipboard reads CSV-formatted text from the clipboard and converts it into a DataFrame, whereas DataFrame.write_clipboard writes a DataFrame to the clipboard in CSV format.
These two methods are useful when exploring data from Excel or similar software in an interactive environment such as a Jupyter notebook.
Implementation
- Reading and writing the clipboard uses arboard on the Rust side. arboard supports Windows, macOS and Linux and is maintained by 1Password.
- read_clipboard just reads the clipboard, then passes the result to read_csv with default separator '\t'.
- write_clipboard just gets a CSV string by calling DataFrame.write_csv (but with default separator '\t') and writes it to the clipboard.
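The flow described above can be sketched without touching a real clipboard. This is a minimal illustration, not the code in this PR: a plain string stands in for the clipboard contents, the standard library's csv module stands in for read_csv and write_csv, and both helper names are hypothetical.

```python
import csv
import io


def parse_clipboard_text(text, separator="\t"):
    """Hypothetical helper: parse clipboard-style text into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    rows = list(reader)
    return rows[0], rows[1:]


def to_clipboard_text(header, rows, separator="\t"):
    """Hypothetical helper: serialize (header, rows) back to delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=separator, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for what a spreadsheet might place on the clipboard (assumed sample data).
copied = "name\tqty\napple\t3\npear\t5\n"

header, rows = parse_clipboard_text(copied)
round_tripped = to_clipboard_text(header, rows)
```

With a tab as the default separator on both sides, text copied from a spreadsheet survives a read followed by a write unchanged.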
Difference from pandas method
pandas.read_clipboard uses the regex '\\s+' as its separator, but polars does not allow a regex as the separator in read_csv.
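The practical effect of the separator choice shows up on a copied row whose cell contains a space. The sketch below uses Python's re module and str.split rather than either library's parser, so it only illustrates the splitting behaviour; the sample row is an assumption.

```python
import re

# One copied row: two cells separated by a tab, the first cell containing a space.
line = "New York\t10"

# Splitting on any run of whitespace breaks the first cell apart.
by_whitespace = re.split(r"\s+", line)

# Splitting on a literal tab keeps each cell intact.
by_tab = line.split("\t")
```

Here by_whitespace has three fields while by_tab has the expected two.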
Compatibility
Tested and compatible:
- Microsoft Excel (Mac)
- Google Sheets (Online)
- WPS (Online)
Others
Both methods are only tested on M1 macOS. It would be much appreciated if someone could test on Linux and Windows.
Codecov Report
Attention: Patch coverage is 42.50000% with 23 lines in your changes missing coverage. Please review.
Project coverage is 81.34%. Comparing base (252702a) to head (56002cc). Report is 14 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| py-polars/src/functions/io.rs | 0.00% | 18 Missing :warning: |
| py-polars/polars/io/clipboard.py | 72.72% | 3 Missing :warning: |
| py-polars/polars/dataframe/frame.py | 50.00% | 2 Missing :warning: |
Additional details and impacted files
Coverage Diff

| | main | #15272 | +/- |
|---|---|---|---|
| Coverage | 81.32% | 81.34% | +0.01% |
| Files | 1359 | 1365 | +6 |
| Lines | 176076 | 176655 | +579 |
| Branches | 2526 | 2526 | |
| Hits | 143191 | 143694 | +503 |
| Misses | 32402 | 32478 | +76 |
| Partials | 483 | 483 | |
I am not convinced by this one. Two things I am concerned about.
- New dependencies and binary size. (Can you check how much this increments by the new dependencies)
- Why default to tab separated data? What do we think about limited dtype support?
I'm not very familiar with this, so I hope I did it right. On my Mac, I ran make build-release in ./py-polars and then checked the binary size of py-polars/polars/polars.abi3.so:
- Before (252702a): 69.3 MB (69,303,536 Bytes)
- After (56002cc): 69.3 MB (69,306,240 Bytes)
- Difference: +2.704 KB
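The difference can be re-derived from the two byte counts reported above. This is only a quick arithmetic check, assuming 1 KB = 1000 bytes as the quoted figure implies.

```python
# Byte counts as reported for polars.abi3.so before and after the change.
before_bytes = 69_303_536
after_bytes = 69_306_240

diff_bytes = after_bytes - before_bytes
diff_kb = diff_bytes / 1000  # decimal kilobytes (assumption: 1 KB = 1000 bytes)
relative_increase_pct = diff_bytes / before_bytes * 100

print(f"+{diff_bytes} bytes (+{diff_kb} KB), {relative_increase_pct:.4f}% larger")
```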
As for the default separator: when copying data from Excel, Google Sheets or WPS, the data is placed on the clipboard as CSV with \t as the separator. Since these two methods are mainly used for exploring data from such software, \t is used as the default. Similarly, pandas does the same thing.
- Why default to tab separated data?
Adding to @CanglongCl's point: rendered HTML tables are another source of clipboard I/O that defaults to a tab separator. "I/O of TSV strings via clipboard" is a force of nature, I think.
Alright. Let's give this a try. I understand why this can be nice. Thank you @CanglongCl