How to skip n lines from a CSV/TSV file?

Open eabase opened this issue 4 months ago • 4 comments

I am often looking at Geo data files before processing. They usually have a (sometimes large) number of description lines before the data starts.

How can I skip n lines, or alternatively skip lines that start with either a space (" ") or a hash (#)?

Here is an example:

  • https://hpiers.obspm.fr/eoppc/series/opa/eopc04R_IAU2000_daily
               EARTH ORIENTATION PARAMETER (EOP) PRODUCT CENTER CENTER (PARIS OBSERVATORY)
                      INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE
                                    EOP (IERS) 20 C04 TIME SERIES  (old format)   
               Description: https://hpiers.obspm.fr/eoppc/eop/eopc04/eopc04.txt
                            contact: [email protected]

  
             FORMAT(3(I4),I7,2(F11.6),2(F12.7),2(F11.6),2(F11.6),2(F11.7),2(F12.6))
##################################################################################
  
      Date      MJD      x          y        UT1R-UTC      LODR        dX        dY        x Err     y Err  UT1R-UTC Err  LODR Err    dY Err       dY Err  
                         "          "           s           s          "         "           "          "          s         s            "           "
     (0h UTC)

1962   1   1  37665  -0.012700   0.213000   0.1349702   0.0016599   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   2  37666  -0.015900   0.214100   0.1344413   0.0016366   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   3  37667  -0.019000   0.215200   0.1339451   0.0016062   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   4  37668  -0.021999   0.216301   0.1334778   0.0015882   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   5  37669  -0.024799   0.217301   0.1330285   0.0015611   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   6  37670  -0.027599   0.218301   0.1325949   0.0015342   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   7  37671  -0.030199   0.219301   0.1321995   0.0015057   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   8  37672  -0.032798   0.220202   0.1318236   0.0014748   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1   9  37673  -0.035198   0.221102   0.1314904   0.0014401   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  10  37674  -0.037498   0.222002   0.1311949   0.0014019   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  11  37675  -0.039697   0.222803   0.1309382   0.0013605   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  12  37676  -0.041797   0.223703   0.1307235   0.0013180   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  13  37677  -0.043797   0.224503   0.1305521   0.0012656   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  14  37678  -0.045697   0.225203   0.1304328   0.0012239   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  15  37679  -0.047496   0.226004   0.1303625   0.0011757   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  16  37680  -0.049196   0.226704   0.1303383   0.0011312   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  17  37681  -0.050796   0.227404   0.1303477   0.0010907   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  18  37682  -0.052295   0.228005   0.1303965   0.0010632   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  19  37683  -0.053595   0.228705   0.1304717   0.0010362   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  20  37684  -0.054895   0.229305   0.1305833   0.0009893   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  21  37685  -0.055995   0.229905   0.1307322   0.0009517   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  22  37686  -0.057094   0.230506   0.1309217   0.0009299   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  23  37687  -0.057994   0.231006   0.1311236   0.0009149   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000
1962   1  24  37688  -0.058794   0.231606   0.1313429   0.0008965   0.000000   0.000000   0.030000   0.030000  0.0020000  0.0014000    0.004774    0.002000

eabase avatar Aug 06 '25 23:08 eabase

BUMP

eabase avatar Aug 17 '25 23:08 eabase

There is currently no way to do this. I recently added the comment_regex feature to RBQL, but it could take a while to integrate it into the extension because I am not yet sure how to do it nicely from the UI/UX perspective.
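Until that lands, one possible workaround is to pre-filter the file outside the editor. The sketch below is plain Python and does not use RBQL or the extension; the names `COMMENT_RE` and `data_rows` are hypothetical, and the pattern is an assumption based on the request above (skip blank lines and lines starting with a space or `#`).

```python
import re
from typing import Iterable, Iterator, List

# Assumed comment rule: a line is skipped if it starts with a space or '#',
# or if it contains only whitespace.
COMMENT_RE = re.compile(r"^(?:[ #]|\s*$)")


def data_rows(lines: Iterable[str], comment_re: "re.Pattern[str]" = COMMENT_RE) -> Iterator[List[str]]:
    """Yield whitespace-separated fields for every line that is not a comment."""
    for line in lines:
        if comment_re.match(line):
            continue  # header, separator, or blank line
        yield line.split()


# A few lines in the shape of the example file from this issue.
sample = [
    "               EARTH ORIENTATION PARAMETER (EOP) PRODUCT CENTER CENTER (PARIS OBSERVATORY)\n",
    "##################################################################################\n",
    "  \n",
    "\n",
    "1962   1   1  37665  -0.012700   0.213000\n",
    "1962   1   2  37666  -0.015900   0.214100\n",
]
rows = list(data_rows(sample))
```

The filtered rows could then be written out as a TSV file and opened in the editor as usual.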

mechatroner avatar Sep 28 '25 04:09 mechatroner

@mechatroner To keep it simple and foolproof, I think it might be better not to let users enter their own regex via the UI, and to keep that in the settings JSON (if really needed). Instead, provide a very short drop-down with a selection of comment patterns (matched from the start of the line).

For example (in my case above) I would select:

  • [x] ^[ ]+ (for lines starting with spaces.)
  • [x] ^#[ #]+ (for lines starting with #. Like in Bash, Python files.)
  • [ ] ^//[ ]+ (for lines starting with //. Like in JS, NodeJS, C/C++ files.)
  • [ ] ^'[ ]+ (for lines starting with '. Like in VBScript/VBA files.)

Then add a field for skipping a number of lines at the start of the file, perhaps in the bottom status bar next to the character set and spacing indicators.

  • Skip [auto | NN] lines from the beginning of the file, where auto would skip comment lines according to the patterns selected above, and NN would skip the first NN lines of the file, regardless of content.
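To make the proposal concrete, here is a rough Python sketch of the `auto | NN` behaviour. It is only an illustration, not extension code: `COMMENT_PATTERNS` and `skip_header` are hypothetical names, the patterns are copied from the checkbox list above, and two details are my assumptions: `auto` only strips the leading block of comment lines, and blank lines inside that block are skipped too.

```python
import re
from typing import List, Sequence, Union

# Patterns from the checkbox list above; the dictionary keys are made up.
COMMENT_PATTERNS = {
    "spaces": r"^[ ]+",
    "hash": r"^#[ #]+",
    "slashes": r"^//[ ]+",
    "apostrophe": r"^'[ ]+",
}


def skip_header(
    lines: Sequence[str],
    skip: Union[str, int],
    enabled: Sequence[str] = ("spaces", "hash"),
) -> List[str]:
    """Return the lines that remain after skipping the header.

    skip="auto": drop leading lines that are blank or match an enabled pattern.
    skip=NN:     drop the first NN lines regardless of content.
    """
    if skip == "auto":
        regexes = [re.compile(COMMENT_PATTERNS[name]) for name in enabled]
        start = 0
        while start < len(lines) and (
            not lines[start].strip()
            or any(rx.match(lines[start]) for rx in regexes)
        ):
            start += 1
        return list(lines[start:])
    return list(lines[int(skip):])


example = ["   TITLE", "#### ", "", "  Date MJD", "1962 1 1", " indented later", "1962 1 2"]
auto_result = skip_header(example, "auto")
fixed_result = skip_header(example, 2)
```

With this reading, a line further down that happens to start with a space (like " indented later") is kept, because `auto` stops at the first data line.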

eabase avatar Sep 29 '25 12:09 eabase

I agree that such a drop-down would be convenient for some use cases, but IMO not generic enough for many other formats.

mechatroner avatar Sep 30 '25 00:09 mechatroner