feat(python): Add `read_clipboard` and `DataFrame.write_clipboard`

Open CanglongCl opened this issue 1 year ago • 1 comments

Closes #9902

Similar to pandas.DataFrame.to_clipboard and pandas.read_clipboard: read_clipboard reads CSV-formatted text from the clipboard and converts it into a DataFrame, whereas DataFrame.write_clipboard writes a DataFrame to the clipboard in CSV format.

These two methods are useful when exploring data from Excel or similar software in an interactive environment such as a Jupyter notebook.

Implementation

  • Clipboard reads and writes use arboard on the Rust side. arboard supports Windows, macOS and Linux and is maintained by 1Password.
  • read_clipboard simply reads the clipboard and passes the result to read_csv, with '\t' as the default separator.
  • write_clipboard simply obtains a CSV string by calling DataFrame.write_csv (with '\t' as the default separator) and writes it to the clipboard.

Difference from pandas method

pandas.read_clipboard uses regex '\\s+' as separator, but polars does not allow regex as separator in read_csv.
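A small standard-library sketch shows how the two separators can disagree. The sample text is made up for illustration, and plain string splitting stands in for the real parsers: a cell that contains a space stays intact when splitting on '\t' but is broken apart by the whitespace regex.

```python
import re

# Made-up clipboard text: two tab-separated columns, one cell containing a space.
clipboard_text = "name\tnote\nalice\thello world\n"

lines = clipboard_text.splitlines()

# Splitting on a literal tab keeps "hello world" as one cell.
rows_tab = [line.split("\t") for line in lines]

# Splitting on the regex \s+ also splits on the space inside the cell.
rows_ws = [re.split(r"\s+", line) for line in lines]

print(rows_tab[1])
print(rows_ws[1])
```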

Compatibility

Tested Compatible:

  • Microsoft Excel (Mac)
  • Google Sheets (Online)
  • WPS (Online)

Others

Both methods have only been tested on M1 macOS. It would be much appreciated if someone could test them on Linux and Windows.

CanglongCl avatar Mar 25 '24 07:03 CanglongCl

Codecov Report

Attention: Patch coverage is 42.50%, with 23 lines in your changes missing coverage. Please review.

Project coverage is 81.34%. Comparing base (252702a) to head (56002cc). Report is 14 commits behind head on main.

Files                                  Patch %   Lines
py-polars/src/functions/io.rs          0.00%     18 Missing :warning:
py-polars/polars/io/clipboard.py       72.72%    3 Missing :warning:
py-polars/polars/dataframe/frame.py    50.00%    2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15272      +/-   ##
==========================================
+ Coverage   81.32%   81.34%   +0.01%     
==========================================
  Files        1359     1365       +6     
  Lines      176076   176655     +579     
  Branches     2526     2526              
==========================================
+ Hits       143191   143694     +503     
- Misses      32402    32478      +76     
  Partials      483      483              

codecov[bot] avatar Mar 25 '24 08:03 codecov[bot]

I am not convinced by this one. Two things I am concerned about.

  • New dependencies and binary size. (Can you check how much this increments by the new dependencies)
  • Why default to tab separated data? What do we think about limited dtype support?

ritchie46 avatar Mar 26 '24 14:03 ritchie46

> I am not convinced by this one. Two things I am concerned about.
>
>   • New dependencies and binary size. (Can you check how much this increments by the new dependencies)
>   • Why default to tab separated data? What do we think about limited dtype support?

I'm not very familiar with this, so I hope I did it right. On my Mac, I ran make build-release in ./py-polars and then checked the binary size of py-polars/polars/polars.abi3.so.

  • Before (252702a): 69.3 MB (69,303,536 Bytes)
  • After (56002cc): 69.3 MB (69,306,240 Bytes)
  • Difference: +2.704 KB
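The difference can be re-derived from the two byte counts above. The snippet assumes decimal kilobytes (1 KB = 1000 bytes), which is what the +2.704 KB figure implies.

```python
# Sizes of py-polars/polars/polars.abi3.so as reported above, in bytes.
before_bytes = 69_303_536  # at 252702a
after_bytes = 69_306_240   # at 56002cc

diff_bytes = after_bytes - before_bytes
diff_kb = diff_bytes / 1000  # decimal kilobytes (assumption)
relative_increase = diff_bytes / before_bytes

print(f"+{diff_bytes} bytes, +{diff_kb:.3f} KB, {relative_increase:.6%} of the original size")
```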

About the default separator: when you copy data in Excel, Google Sheets or WPS, it is placed on the clipboard as CSV text with \t as the separator. Since these two methods are mainly used for exploring data from such software, \t is used as the default. pandas does the same.

CanglongCl avatar Mar 27 '24 02:03 CanglongCl

>   • Why default to tab separated data?

Adding to @CanglongCl's point: a rendered HTML table is another source of clipboard data that defaults to a tab separator. "I/O of TSV strings via the clipboard" is a force of nature, I think.

cjackal avatar Mar 27 '24 14:03 cjackal

Alright. Let's give this a try. I understand why this can be nice. Thank you @CanglongCl

ritchie46 avatar Apr 01 '24 12:04 ritchie46