How to skip n lines in a CSV/TSV file?
I often look at geo data files before processing them. They usually have a (sometimes large) number of description lines before the data starts.
How can I skip n lines, or alternatively skip lines that start with either a space (" ") or a hash (#)?
Here is an example:
- https://hpiers.obspm.fr/eoppc/series/opa/eopc04R_IAU2000_daily
EARTH ORIENTATION PARAMETER (EOP) PRODUCT CENTER CENTER (PARIS OBSERVATORY)
INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE
EOP (IERS) 20 C04 TIME SERIES (old format)
Description: https://hpiers.obspm.fr/eoppc/eop/eopc04/eopc04.txt
contact: [email protected]
FORMAT(3(I4),I7,2(F11.6),2(F12.7),2(F11.6),2(F11.6),2(F11.7),2(F12.6))
##################################################################################
Date MJD x y UT1R-UTC LODR dX dY x Err y Err UT1R-UTC Err LODR Err dY Err dY Err
" " s s " " " " s s " "
(0h UTC)
1962 1 1 37665 -0.012700 0.213000 0.1349702 0.0016599 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 2 37666 -0.015900 0.214100 0.1344413 0.0016366 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 3 37667 -0.019000 0.215200 0.1339451 0.0016062 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 4 37668 -0.021999 0.216301 0.1334778 0.0015882 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 5 37669 -0.024799 0.217301 0.1330285 0.0015611 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 6 37670 -0.027599 0.218301 0.1325949 0.0015342 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 7 37671 -0.030199 0.219301 0.1321995 0.0015057 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 8 37672 -0.032798 0.220202 0.1318236 0.0014748 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 9 37673 -0.035198 0.221102 0.1314904 0.0014401 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 10 37674 -0.037498 0.222002 0.1311949 0.0014019 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 11 37675 -0.039697 0.222803 0.1309382 0.0013605 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 12 37676 -0.041797 0.223703 0.1307235 0.0013180 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 13 37677 -0.043797 0.224503 0.1305521 0.0012656 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 14 37678 -0.045697 0.225203 0.1304328 0.0012239 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 15 37679 -0.047496 0.226004 0.1303625 0.0011757 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 16 37680 -0.049196 0.226704 0.1303383 0.0011312 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 17 37681 -0.050796 0.227404 0.1303477 0.0010907 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 18 37682 -0.052295 0.228005 0.1303965 0.0010632 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 19 37683 -0.053595 0.228705 0.1304717 0.0010362 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 20 37684 -0.054895 0.229305 0.1305833 0.0009893 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 21 37685 -0.055995 0.229905 0.1307322 0.0009517 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 22 37686 -0.057094 0.230506 0.1309217 0.0009299 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 23 37687 -0.057994 0.231006 0.1311236 0.0009149 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
1962 1 24 37688 -0.058794 0.231606 0.1313429 0.0008965 0.000000 0.000000 0.030000 0.030000 0.0020000 0.0014000 0.004774 0.002000
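Outside the extension, both options are easy to express in a script. Below is a minimal Python sketch. It assumes, as described above, that every description line starts with a space or `#`, and that data rows are whitespace-separated like the EOP sample. The helper names are made up for illustration:

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def skip_first_n(lines: Iterable[str], n: int) -> Iterator[str]:
    """Drop the first n lines unconditionally, whatever they contain."""
    return islice(lines, n, None)


def skip_comment_lines(
    lines: Iterable[str], prefixes: Tuple[str, ...] = (" ", "#")
) -> Iterator[str]:
    """Drop blank lines and lines that start with any of the given prefixes."""
    for line in lines:
        if line.strip() and not line.startswith(prefixes):
            yield line


def parse_whitespace_rows(lines: Iterable[str]) -> List[List[str]]:
    """Split each remaining line on runs of whitespace (the EOP layout)."""
    return [line.split() for line in lines]


sample = (
    "  EARTH ORIENTATION PARAMETER (EOP) PRODUCT CENTER\n"
    "##################################\n"
    "   (0h UTC)\n"
    "1962   1   1  37665  -0.012700   0.213000\n"
    "1962   1   2  37666  -0.015900   0.214100\n"
)

# Option 1: skip lines that start with a space or '#'.
rows = parse_whitespace_rows(skip_comment_lines(io.StringIO(sample)))
print(rows[0])  # ['1962', '1', '1', '37665', '-0.012700', '0.213000']

# Option 2: skip a fixed number of lines, then hand the rest to csv.reader.
tsv = "meta line 1\nmeta line 2\na\tb\n1\t2\n"
table = list(csv.reader(skip_first_n(io.StringIO(tsv), 2), delimiter="\t"))
print(table)  # [['a', 'b'], ['1', '2']]
```

Since `csv.reader` accepts any iterable of strings, either helper can sit in front of a real CSV/TSV reader. One caveat: if a file's data rows are right-aligned with leading spaces, the space rule would drop them too, so a fixed count is the safer choice there.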
There is no way to do this currently. I recently added the comment_regex feature to RBQL, but it could take a while to integrate it into the extension because I am not yet sure how to do it nicely from a UI/UX perspective.
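Conceptually, a comment regex decides line by line whether a line is dropped before parsing. Here is a minimal Python sketch of that idea. `drop_comment_lines` is a hypothetical helper, not RBQL's actual `comment_regex` implementation or API:

```python
import re
from typing import Iterable, Iterator


def drop_comment_lines(lines: Iterable[str], comment_regex: str) -> Iterator[str]:
    """Yield only the lines that do NOT match comment_regex at their start."""
    pattern = re.compile(comment_regex)
    for line in lines:
        # re.match only tries the beginning of the string, so the whole
        # pattern is effectively anchored at the start of the line.
        if not pattern.match(line):
            yield line


lines = ["# header", "  (0h UTC)", "1962   1   1  37665"]
print(list(drop_comment_lines(lines, r"\s|#")))  # ['1962   1   1  37665']
```

A single pattern such as `\s|#` covers both cases from the question.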
@mechatroner For simplicity, and to make it more foolproof, I think it might be better not to let users enter their own regex via the UI, and to keep that in the settings JSON (if really needed). Instead, provide a very short drop-down with a selection of comment characters (matched at the start of a line).
For example (in my case above) I would select:
- [x] `^[ ]+` (for lines starting with spaces)
- [x] `^#[ #]+` (for lines starting with `#`, as in Bash and Python files)
- [ ] `^//[ ]+` (for lines starting with `//`, as in JS/Node.js and C/C++ files)
- [ ] `^'[ ]+` (for lines starting with `'`, as in VBScript files)
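The checkbox idea maps naturally onto a fixed table of patterns that are OR-ed together. Below is a minimal Python sketch, assuming the four patterns from the list above. The option keys and function names are hypothetical and not part of the extension or RBQL:

```python
import re
from typing import Dict, Iterable, List

# Hypothetical drop-down entries, using the patterns proposed above.
COMMENT_OPTIONS: Dict[str, str] = {
    "spaces": r"^[ ]+",          # lines starting with spaces
    "hash": r"^#[ #]+",          # Bash / Python style
    "double_slash": r"^//[ ]+",  # JS / C / C++ style
    "apostrophe": r"^'[ ]+",     # apostrophe comments
}


def build_comment_pattern(selected: Iterable[str]) -> re.Pattern:
    """OR together the patterns of the ticked options into one regex."""
    parts = [f"(?:{COMMENT_OPTIONS[name]})" for name in selected]
    if not parts:
        raise ValueError("select at least one comment option")
    return re.compile("|".join(parts))


def keep_data_lines(lines: Iterable[str], pattern: re.Pattern) -> List[str]:
    """Return the lines that none of the selected comment patterns match."""
    return [line for line in lines if not pattern.match(line)]


# The selection from the example above: spaces and '#'.
pattern = build_comment_pattern(["spaces", "hash"])
lines = ["   EARTH ORIENTATION PARAMETER", "######", "1962   1   1  37665"]
print(keep_data_lines(lines, pattern))  # ['1962   1   1  37665']
```

One caveat of these particular patterns: `^#[ #]+` requires a space or another `#` after the first `#`, so a line like `#comment` would still be kept as data.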
Then add a field for skipping a number of lines at the start of the file, perhaps in the bottom bar next to the character set and spacing settings.
- Skip `[auto | NN]` lines from the beginning of the file, where `auto` would skip comment lines automatically according to the rules above, and `NN` would skip exactly NN lines, regardless of their content.
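One way to read the `auto` / `NN` proposal: `NN` drops a fixed count, while `auto` drops leading lines only as long as they match the comment rules and stops at the first data line. Below is a minimal Python sketch under that interpretation. `apply_skip_setting` is a hypothetical name:

```python
import re
from itertools import dropwhile, islice
from typing import Iterable, List, Optional, Union


def apply_skip_setting(
    lines: Iterable[str],
    setting: Union[str, int],
    comment_pattern: Optional[re.Pattern] = None,
) -> List[str]:
    """Apply a "Skip [auto | NN] lines" setting to the start of a file."""
    if setting == "auto":
        if comment_pattern is None:
            raise ValueError("'auto' needs at least one comment rule")
        # Drop leading comment lines; everything from the first data line on is kept.
        return list(dropwhile(lambda line: bool(comment_pattern.match(line)), lines))
    count = int(setting)  # accepts 12 or "12" from a text field
    if count < 0:
        raise ValueError("line count must be non-negative")
    return list(islice(lines, count, None))


lines = ["  EOP header", "# format line", "1962 1 1 37665", "1962 1 2 37666"]
rule = re.compile(r"^[ ]+|^#")
print(apply_skip_setting(lines, "auto", rule))  # ['1962 1 1 37665', '1962 1 2 37666']
print(apply_skip_setting(lines, 3))             # ['1962 1 2 37666']
```

Because `auto` stops at the first data line, a stray `#` line further down the file would still reach the parser. That is arguably what "from the beginning of the file" implies.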
I agree that such a drop-down would be convenient for some use cases, but IMO not generic enough for many other formats.