
/etc/cron.daily/dcache fails with regexp error

Open calestyo opened this issue 2 years ago • 12 comments

Hey.

In 7.2.10 /usr/sbin/dcache-billing-indexer as invoked by /etc/cron.daily/dcache fails like that:

capturing group name does not start with a Latin letter near index 15
(?<date>.*?)(?<\t>.*?)\QsMsg:\E(?<\t>.*?)\Q[\E(?<cellType>pool)\Q:\E(?<cellNameXcell>.*?)\Q@\E(?<cellNameXdomain>.*?)\Q:\E(?<type>(?:re)?store)\Q]\E(?<\t>.*?)\Q[\E(?<session>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<pnfsid>[0-9A-F]{24}(?:[0-9A-F]{12})?)\Q:\E(?<path>.*?)\Q]\E(?<\t>.*?)(?<filesize>-?\d+)\QB\E(?<\t>.*?)\Q[\E(?:(?<storageXstorageClass>.*?)\Q@\E(?<storageXhsm>.*?)|\Q<unknown>\E)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXloginName>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXdn>.*?)\Q]\E(?<\t>.*?)\Q[[\E(?<subjectXprimaryFqan>.*?)\Q]:[\E(?<subjectXfqans>.*?)\Q]]\E(?<\t>.*?)\Q[\E(?<subjectXuserName>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXuid>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXprimaryGid>.*?)\Q:\E(?<subjectXgids>.*?)\Q]\E(?<\t>.*?)(?<queuingTime>-?\d+)\Qms\E(?<\t>.*?)(?<transferTime>-?\d+)\Qms\E(?<\t>.*?)\Q[\E(?<rc>-?\d+)\Q:"\E(?<message>.*?)\Q"]\E
               ^

COMMANDS:
   -all [-fpp=PROP] [-dir=BASE]
          (Re)index all billing files.
   -compress FILE...
          Compress FILE.
   -decompress FILE...
          Decompress FILE.
   -find [-files|-json|-yaml] [-dir=BASE] [-since=DATE] [-until=DATE] [-f=FILE] [SEARCHTERM]...
          Output billing entries that contain SEARCHTERM. Valid search terms are
          path, pnfsid, dn and path prefixes of those. Optionally output names
          of billing files that might contain the search term. If no search term
          is provided, all entries are output.
   -index [-fpp=PROP] FILE...
          Create index for FILE.
   -yesterday [-compress] [-fpp=PROP] [-dir=BASE] [-flat=BOOL]
          Index yesterday's billing file. Optionally compresses the billing file
          after indexing it.

OPTIONS:
   -dir=BASE
          Base directory for billing files. Default is taken from dCache
          configuration.
   -flat=BOOLEAN
          Chooses between flat or hierarchical directory layout. Default is
          taken from dCache configuration.
   -fpp=PROP
          The false positive probability expressed as a value in (0;1]. The
          default is 0.01.

Cheers, Chris

calestyo avatar Feb 25 '22 16:02 calestyo

Hi Chris,

Have you changed the billing format?

$ ./dcache/sbin/dcache-billing-indexer -all 
Indexing /home/tigran/eProjects/dcache/packages/system-test/target/dcache/var/billing/2022/02/billing-2022.02.16
Indexing /home/tigran/eProjects/dcache/packages/system-test/target/dcache/var/billing/2022/02/billing-2022.02.17
$     

kofemann avatar Mar 01 '22 11:03 kofemann

Hey.

Yes I did,... but shouldn't it pick that up automatically?

Cheers, Chris

calestyo avatar Mar 01 '22 15:03 calestyo

There are two places: billing format and indexer format:

billing.text.format.xxx=
billing.parser.format

Did you update both?

kofemann avatar Mar 02 '22 08:03 kofemann

I have set the latter to the former via:

billing.parser.format!door-request-info-message=${billing.text.format.door-request-info-message}
billing.parser.format!pool-hit-info-message=${billing.text.format.pool-hit-info-message}
billing.parser.format!storage-info-message=${billing.text.format.storage-info-message}
billing.parser.format!mover-info-message=${billing.text.format.mover-info-message}
billing.parser.format!remove-file-info-message=${billing.text.format.remove-file-info-message}
billing.parser.format!warning-pnfs-file-info-message=${billing.text.format.warning-pnfs-file-info-message}

calestyo avatar Mar 03 '22 03:03 calestyo

I guess it doesn't work that way :)

kofemann avatar Mar 03 '22 18:03 kofemann

Uhm? So it's a bug, or do I misuse it? :D

calestyo avatar Mar 05 '22 04:03 calestyo

I don't remember the details; however, if it were that simple, we wouldn't have defined it twice...

kofemann avatar Mar 06 '22 15:03 kofemann

Just to clarify -- in case there's some confusion.

The configuration properties that start with billing.text.format. describe how new records are to be written. You should be able to update these configuration property values to customise what information is recorded, and how that information is represented.

FWIW, I think it's very unlikely we (dCache.org) will ever modify the default values. The risk is too great that we inadvertently break a parser written by some third-party.

Since dCache v3.0.0 (released ~November 2016), the billing files contain special comment lines that define the format used for the different record types. These are lines that start with a double-comment sequence (##). Parsers can use these lines to learn which format billing was configured to use when the records were written.

The dcache-billing-indexer command should use these comments when parsing the file.

However, one might want to parse pre-3.0.0 billing files: those written before support for the double-comment sequence was introduced. The billing.parser.format family of configuration properties allows you to configure how the parser should understand the records if there are no lines starting ##.

If you're parsing lines written by dCache v3.0.0 or later, the billing.parser.format family of configuration properties should have no effect, as the billing files should be self-describing.
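
To make the distinction concrete, here is a minimal sketch of the decision described above: a file that contains lines starting with ## is treated as self-describing, and only otherwise would the billing.parser.format properties matter. The class and method names are hypothetical, the sample lines are placeholders, and this is not the indexer's actual code; it only checks for the ## prefix and does not parse the header contents.

```java
import java.util.List;

public class BillingHeaderCheck {

    /** Returns true if any line starts with the double-comment sequence "##". */
    public static boolean isSelfDescribing(List<String> lines) {
        return lines.stream().anyMatch(line -> line.startsWith("##"));
    }

    /** Names the source of format information a parser would be expected to rely on. */
    public static String formatSource(List<String> lines) {
        return isSelfDescribing(lines)
                ? "format header lines in the billing file"
                : "billing.parser.format configuration properties";
    }

    public static void main(String[] args) {
        // Placeholder content: the real header syntax is not shown in this thread.
        List<String> postV3 = List.of("## <format definition>", "<billing record>");
        List<String> preV3 = List.of("<billing record>", "<billing record>");

        System.out.println("post-3.0.0 file: " + formatSource(postV3));
        System.out.println("pre-3.0.0 file:  " + formatSource(preV3));
    }
}
```

If a file passes this check and changing billing.parser.format still alters the indexer's behaviour, that would point to a bug rather than a configuration mistake.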

paulmillar avatar Mar 07 '22 11:03 paulmillar

Hey.

> Since dCache v3.0.0 (released ~November 2016)

All our currently available billing files range back to 2021-01-01, where we already ran something way beyond 3.x ... and the header lines you mention are in place.

So what you describe shouldn't apply to us anyway, and thus the bug is likely somewhere else?

Cheers, Chris.

calestyo avatar Mar 09 '22 10:03 calestyo

Hi Chris,

My take: if changing the billing.parser.format family of configuration properties fixes a problem parsing billing files written post v3.0.0 then there's a bug somewhere.

So, did modifying billing.parser.format "fix" the problem?

It wasn't clear to me from your description.

Also, just to eliminate something: when you mentioned that you changed the format, you did this in dcache.conf or the layout file, right? You didn't edit the files in the /usr/share/dcache/defaults directory.

Cheers, Paul.

paulmillar avatar Mar 09 '22 10:03 paulmillar

Uhm... I'm a bit confused now ^^

So what I did was the following: I changed our previously set, already customised value:

billing.text.format.mover-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$mMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$protocol$]$\\t$[$initiator$]$\\t$$$$if(p2p)$p2p$else$no-p2p$endif$$$$\\t$$$$if(created)$upload$else$download$endif$$$$\\t$$$$transferred$B$\\t$$$$meanReadBandwidth$MiB/s$\\t$$$$meanWriteBandwidth$MiB/s$\\t$$$$connectionTime$ms$\\t$$$$readActive$ms$\\t$$$$readIdle$ms$\\t$$$$writeActive$ms$\\t$$$$writeIdle$ms$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]

to:

billing.text.format.mover-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$mMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$protocol$]$\\t$[$initiator$]$\\t$$$$if(p2p)$p2p$else$no-p2p$endif$$$$\\t$$$$if(created)$upload$else$download$endif$$$$\\t$$$$transferred$B$\\t$$$$meanReadBandwidth$B/s$\\t$$$$meanWriteBandwidth$B/s$\\t$$$$connectionTime$ms$\\t$$$$readActive$ms$\\t$$$$readIdle$ms$\\t$$$$writeActive$ms$\\t$$$$writeIdle$ms$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]

The only difference is that the string literal MiB/s changed to B/s in the two bandwidth fields. There was a documentation error in dCache that was fixed a while ago, so I adapted our format accordingly.

Admittedly, I only started checking the cron mails recently, so it might very well be that even the old setting already gave errors.

> So, did modifying billing.parser.format "fix" the problem?

What exactly do you mean by "modifying"? Literally setting the value of billing.parser.format!* instead of assigning it via a variable reference? As in:

billing.parser.format!door-request-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$drMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$clientChain$]$\\t$$$$transactionTime$ms$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]
billing.parser.format!pool-hit-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$phMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$protocol$]$\\t$$$$if(cached)$cached$else$not-cached$endif$$$$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]
billing.parser.format!storage-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$sMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$$$$transferTime$ms$\\t$[$rc$:"$message$"]
billing.parser.format!mover-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$mMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$protocol$]$\\t$[$initiator$]$\\t$$$$if(p2p)$p2p$else$no-p2p$endif$$$$\\t$$$$if(created)$upload$else$download$endif$$$$\\t$$$$transferred$B$\\t$$$$meanReadBandwidth$B/s$\\t$$$$meanWriteBandwidth$B/s$\\t$$$$connectionTime$ms$\\t$$$$readActive$ms$\\t$$$$readIdle$ms$\\t$$$$writeActive$ms$\\t$$$$writeIdle$ms$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]
billing.parser.format!remove-file-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$rfMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$rc$:"$message$"]
billing.parser.format!warning-pnfs-file-info-message=$date; format="${lmu.miscellaneous.date-time-format}"$$$$\\t$wpfMsg:$\\t$[$cellType$:$cellName.cell$@$cellName.domain$:$type$]$\\t$[$session$]$\\t$[$pnfsid$:$path$]$\\t$$$$filesize$B$\\t$[$if(storage)$$$$storage.storageClass$@$storage.hsm$$$$else$<unknown>$endif$]$\\t$[$subject.loginName$]$\\t$[$subject.dn$]$\\t$[[$subject.primaryFqan$]:[$subject.fqans; separator="|"$]]$\\t$[$subject.userName$]$\\t$[$subject.uid$]$\\t$[$subject.primaryGid$:$subject.gids; separator="|"$]$\\t$$$$queuingTime$ms$\\t$[$transferPath$]$\\t$[$rc$:"$message$"]

?

I just tried that, and it still leads to:

# /etc/cron.daily/dcache
capturing group name does not start with a Latin letter near index 15
(?<date>.*?)(?<\t>.*?)\QsMsg:\E(?<\t>.*?)\Q[\E(?<cellType>pool)\Q:\E(?<cellNameXcell>.*?)\Q@\E(?<cellNameXdomain>.*?)\Q:\E(?<type>(?:re)?store)\Q]\E(?<\t>.*?)\Q[\E(?<session>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<pnfsid>[0-9A-F]{24}(?:[0-9A-F]{12})?)\Q:\E(?<path>.*?)\Q]\E(?<\t>.*?)(?<filesize>-?\d+)\QB\E(?<\t>.*?)\Q[\E(?:(?<storageXstorageClass>.*?)\Q@\E(?<storageXhsm>.*?)|\Q<unknown>\E)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXloginName>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXdn>.*?)\Q]\E(?<\t>.*?)\Q[[\E(?<subjectXprimaryFqan>.*?)\Q]:[\E(?<subjectXfqans>.*?)\Q]]\E(?<\t>.*?)\Q[\E(?<subjectXuserName>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXuid>.*?)\Q]\E(?<\t>.*?)\Q[\E(?<subjectXprimaryGid>.*?)\Q:\E(?<subjectXgids>.*?)\Q]\E(?<\t>.*?)(?<queuingTime>-?\d+)\Qms\E(?<\t>.*?)(?<transferTime>-?\d+)\Qms\E(?<\t>.*?)\Q[\E(?<rc>-?\d+)\Q:"\E(?<message>.*?)\Q"]\E
               ^

COMMANDS:
   -all [-fpp=PROP] [-dir=BASE]
          (Re)index all billing files.
   -compress FILE...
          Compress FILE.
   -decompress FILE...
          Decompress FILE.
   -find [-files|-json|-yaml] [-dir=BASE] [-since=DATE] [-until=DATE] [-f=FILE] [SEARCHTERM]...
          Output billing entries that contain SEARCHTERM. Valid search terms are
          path, pnfsid, dn and path prefixes of those. Optionally output names
          of billing files that might contain the search term. If no search term
          is provided, all entries are output.
   -index [-fpp=PROP] FILE...
          Create index for FILE.
   -yesterday [-compress] [-fpp=PROP] [-dir=BASE] [-flat=BOOL]
          Index yesterday's billing file. Optionally compresses the billing file
          after indexing it.

OPTIONS:
   -dir=BASE
          Base directory for billing files. Default is taken from dCache
          configuration.
   -flat=BOOLEAN
          Chooses between flat or hierarchical directory layout. Default is
          taken from dCache configuration.
   -fpp=PROP
          The false positive probability expressed as a value in (0;1]. The
          default is 0.01.

And I did that in dcache.conf.

I do in fact modify some of the defaults files too (because of the long-standing issue #3309), but that should be completely unrelated.

Cheers, Chris.

calestyo avatar Mar 10 '22 16:03 calestyo

Anything new on this? It still fails in 9.1.2, and it's quite clearly an issue in how the regexp is generated: ?<\t> is not a valid capture group name.
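
For reference, a small standalone sketch that reproduces the problem with plain java.util.regex, independent of dCache. The class and method names are made up for illustration. The second pattern, which emits the tab as a literal \t escape instead of a named group, is only an assumption about what the generated regexp could look like; it is not taken from the dCache sources.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupNameDemo {

    /** Returns true if the regex compiles, false if Pattern rejects its syntax. */
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Start of the generated pattern from the error output: the group name is "\t".
        String generated = "(?<date>.*?)(?<\\t>.*?)\\QsMsg:\\E";

        // Assumed alternative: match the tab directly rather than capturing it by name.
        String literalTab = "(?<date>.*?)\\t\\QsMsg:\\E";

        System.out.println("generated pattern compiles:   " + compiles(generated));
        System.out.println("literal-tab pattern compiles: " + compiles(literalTab));
    }
}
```

Java requires a named group to start with a letter and to contain only letters and digits, so any template element that is not a plain attribute name would need to be handled before it is turned into a (?<name>...) group.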

calestyo avatar Nov 10 '23 15:11 calestyo