htmltab
Request: compare lines within two CSV files, difference in new CSV
Hello,
First of all, great work!
I have a feature request (maybe not really in the scope of your project?): compare two outputs (CSV files) and write the lines that differ to a new CSV file. The reason is that I pull a table of approximately 2000 lines from a URL. Each day a line can be added, but right now I have to check the complete file again to find the differences.
Hope this is clear, otherwise please ask.
Many thanks and keep up the good work.
This is definitely outside the scope of this project, but fortunately this is something you can already do by combining HTMLTab with existing tools. Let's say that yesterday you requested data from https://example.com/data.csv and saved it to a local file named yesterday.csv. Today you can see the differences using:
sdiff --suppress-common-lines yesterday.csv <(htmltab https://example.com/data.csv)
That will use HTMLTab to get the latest version of the CSV and pass it in as the second argument to sdiff (side-by-side diff), comparing it to yesterday.csv and only outputting the lines that have changed. The <(...) syntax is process substitution, which bash and zsh support but a plain POSIX sh does not. I know that sdiff is available on macOS and Linux by default but I'm not sure about Windows.
This is actually a really good use-case for HTMLTab, I'll add this to the documentation as an example at some point.
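sdiff prints a side-by-side view rather than a CSV file. If you want only the new rows written to a new CSV, as originally asked, a minimal sketch with awk could look like the following. It assumes a POSIX awk and that every CSV record sits on one physical line (a quoted field containing a newline would break it). new_rows is a hypothetical helper name, not part of HTMLTab.

```shell
# new_rows OLD NEW (hypothetical helper, not part of HTMLTab)
# Prints the header of NEW, then every line of NEW that does not
# appear verbatim anywhere in OLD, keeping NEW's original order.
# Assumes one CSV record per physical line.
new_rows() {
  awk '
    FILENAME == ARGV[1] { seen[$0] = 1; next }  # first file: remember each line
    FNR == 1 || !($0 in seen)                   # second file: header or unseen line
  ' "$1" "$2"
}
```

For the daily workflow from the question (bash or zsh, because of the process substitution):

new_rows yesterday.csv <(htmltab https://example.com/data.csv) > new.csv

Rows removed since yesterday are not reported, and a changed row simply shows up as a new line.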
Thanks! Checked it, works like a charm.
Is there also a simple way to filter lines within the file which match a specific string?
For example, with the table at the bottom of your page https://flother.github.io/htmltab/, how would I filter column "W" for the lines that match the string "12"?
Original table:
P  Team       GP  W   D  L  F   A   GD  Pts
1  Man City   21  17  2  2  53  13  40  53
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42
4  Arsenal    20  11  2  7  33  25  8   35
5  West Ham   20  10  4  6  37  27  10  34
6  Spurs      18  10  3  5  23  20  3   33
The result should be:
P  Team       GP  W   D  L  F   A   GD  Pts
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42
Thanks in advance!
Yep, that's possible too. The simplest way is to pipe the output through the standard Unix tool grep:
$ htmltab https://www.theguardian.com/football/premierleague/table | grep Leicester
1,Man City,22,18,2,2,54,13,41,56,Won against Newcastle Won against Leicester Won against Brentford Won against Arsenal Won against Chelsea
2,Liverpool,21,13,6,2,55,18,37,45,Won against Newcastle Drew with Spurs Lost to Leicester Drew with Chelsea Won against Brentford
5,Spurs,19,11,3,5,26,22,4,36,Drew with Liverpool Won against C Palace Drew with Southampton Won against Watford Won against Leicester
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs
19,Newcastle,20,1,9,10,20,43,-23,12,Lost to Leicester Lost to Liverpool Lost to Man City Drew with Man Utd Drew with Watford
That matches the string anywhere in each line, though, not within individual columns: four of the five rows above only match because Leicester appears in their Form column. If you're feeling adventurous you should try the excellent xsv. That will allow you to search in particular columns:
$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester
P,Team,GP,W,D,L,F,A,GD,Pts,Form
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs
You can also use xsv to drop columns ('!Form' selects every column except Form) and to format the result as an aligned table:
$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester | xsv select '!Form' | xsv table
P Team GP W D L F A GD Pts
10 Leicester 19 7 4 8 33 36 -3 25
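One caveat for the W = 12 example: xsv search treats its pattern as a regular expression, so 12 would also match a value such as 112, while '^12$' anchors it to an exact match. If xsv isn't installed, awk can do a similar column filter. The sketch below is a minimal alternative, assuming a plain comma separator and no quoted field that itself contains a comma. filter_col is a hypothetical helper name. It keeps the header row and prints only the rows whose named column equals the given value exactly.

```shell
# filter_col COLUMN VALUE (hypothetical helper, not part of HTMLTab or xsv)
# Reads CSV on stdin, prints the header row, then only the rows whose
# COLUMN field equals VALUE as a string. Splits naively on commas, so it
# assumes no quoted field contains a comma.
filter_col() {
  awk -F, -v col="$1" -v val="$2" '
    NR == 1 {
      # Find the index of the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) n = i
      print
      next
    }
    # Appending "" forces a string comparison, so 12 does not match 112 or 12.0.
    n && ($n "") == (val "")
  '
}
```

For example: htmltab https://www.theguardian.com/football/premierleague/table | filter_col Team Leicester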
Works great, many thanks for your support, really helps a lot!
You can close this request for now.
Have a nice weekend!
Glad I could help.
I'll keep this issue open as a reminder to add the example to the documentation. Once I've done that, I'll close it.