htmltab Request: compare lines within two CSV files, difference in new CSV

Hello,

First of all, great work!

I have a feature request (though it may be outside the scope of your project): compare two outputs (CSV files) and write the lines that differ to a new CSV file. The reason is that I pull a table of approximately 2000 lines from a URL. A line can be added each day, but at the moment I have to go through the complete file again to find the differences.

Hope this is clear, otherwise please ask.

Many thanks and keep up the good work.

Jan 19 '22 21:01 hspjanssen

This is definitely outside the scope of this project, but fortunately this is something you can already do by combining HTMLTab with existing tools. Let's say that yesterday you requested data from https://example.com/data.csv and saved it to a local file named yesterday.csv. Today you can see the differences using:

sdiff --suppress-common-lines yesterday.csv <(htmltab https://example.com/data.csv)

That will use HTMLTab to get the latest version of the CSV and pass it in as the second argument to sdiff (side-by-side diff), comparing it to yesterday.csv and only outputting the lines that have changed. I know that sdiff is available on macOS and Linux by default, but I'm not sure about Windows.
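If the goal is to end up with the differences in a new CSV file, as in the original request, one possible approach is a whole-line comparison with grep. This is only a sketch: the file names yesterday.csv, today.csv and changes.csv are made up, and the sample rows stand in for real HTMLTab output.

```shell
# Sample data standing in for two daily snapshots (hypothetical file names).
printf 'P,Team,Pts\n1,Man City,53\n2,Chelsea,43\n' > yesterday.csv
printf 'P,Team,Pts\n1,Man City,56\n2,Chelsea,43\n3,Liverpool,42\n' > today.csv

# Keep the header row, then append every line of today.csv that does not
# appear verbatim in yesterday.csv:
#   -F  treat patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -v  invert the match (print lines that do NOT match)
#   -f  read the patterns from yesterday.csv
{
  head -n 1 today.csv
  grep -Fxvf yesterday.csv today.csv
} > changes.csv

cat changes.csv
```

In real use, today.csv would come from something like `htmltab https://example.com/data.csv > today.csv`. Note that this reports new and changed lines only; lines that were removed since yesterday do not show up.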

This is actually a really good use case for HTMLTab. I'll add it to the documentation as an example at some point.

Jan 20 '22 09:01 flother

Thanks! Checked it, works like a charm.

Is there also a simple way to filter lines within the file which match a specific string?

For example, take the table at the bottom of your page https://flother.github.io/htmltab/ and filter on column "W" for the lines that match the string "12".

Original table:

P  Team       GP  W   D  L  F   A   GD  Pts
1  Man City   21  17  2  2  53  13  40  53
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42
4  Arsenal    20  11  2  7  33  25  8   35
5  West Ham   20  10  4  6  37  27  10  34
6  Spurs      18  10  3  5  23  20  3   33

The result should be:

P  Team       GP  W   D  L  F   A   GD  Pts
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42

Thanks in advance!

Jan 20 '22 16:01 hspjanssen

Yep, that's possible too. The simplest way is to pipe the output through the standard Unix tool grep:

$ htmltab https://www.theguardian.com/football/premierleague/table | grep Leicester
1,Man City,22,18,2,2,54,13,41,56,Won against Newcastle Won against Leicester Won against Brentford Won against Arsenal Won against Chelsea
2,Liverpool,21,13,6,2,55,18,37,45,Won against Newcastle Drew with Spurs Lost to Leicester Drew with Chelsea Won against Brentford
5,Spurs,19,11,3,5,26,22,4,36,Drew with Liverpool Won against C Palace Drew with Southampton Won against Watford Won against Leicester
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs
19,Newcastle,20,1,9,10,20,43,-23,12,Lost to Leicester Lost to Liverpool Lost to Man City Drew with Man Utd Drew with Watford

That's string matching within the whole file though, not individual columns. If you're feeling adventurous you should try the excellent xsv. That will allow you to search in particular columns:

$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester
P,Team,GP,W,D,L,F,A,GD,Pts,Form
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs

You can use xsv to remove columns and format the result:

$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester | xsv select '!Form' | xsv table
P   Team       GP  W   D   L   F   A   GD  Pts
10  Leicester  19  7   4   8   33  36  -3  25
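For the exact case in the question (column "W" equal to "12"), a rough alternative that needs nothing beyond standard tools is awk. This is a sketch under assumptions: table.csv and filtered.csv are made-up file names, the rows are copied from the table in the question, and W is assumed to be the fourth column.

```shell
# Sample rows copied from the table in the question, saved as CSV
# (table.csv is a hypothetical file name).
cat > table.csv <<'EOF'
P,Team,GP,W,D,L,F,A,GD,Pts
1,Man City,21,17,2,2,53,13,40,53
2,Chelsea,21,12,7,2,45,16,29,43
3,Liverpool,20,12,6,2,52,18,34,42
4,Arsenal,20,11,2,7,33,25,8,35
EOF

# Split on commas, then print the header (line 1) plus every row whose
# 4th field (the W column) is exactly "12".
awk -F, 'NR == 1 || $4 == "12"' table.csv > filtered.csv

cat filtered.csv
```

Splitting on commas like this is naive: it breaks if a field contains a quoted comma, which is where a CSV-aware tool such as xsv is the safer choice.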

Jan 21 '22 17:01 flother

Works great, many thanks for your support, really helps a lot!

You can close this request for now.

Have a nice weekend!

Jan 21 '22 19:01 hspjanssen

Glad I could help.

I'll keep this issue open as a reminder to add the example to the documentation. Once I've done that, I'll close it.

Jan 21 '22 21:01 flother
