bibtool

Confusing example of rules in key generation

Open kirk86 opened this issue 5 years ago • 19 comments

Hi, I was wondering if you could elaborate a bit on the key generation rules example shown on pages 59-60 of the manual.

I tried it on one of my bib files. What confuses me is that each key is prefixed with the entry type (Article, Misc, Book, etc.), followed by the author and then everything else.

How can I remove the entry type prefix (Article, Book, etc.) from each key in that example?

My second question is about deleting fields whose names follow a certain pattern. Suppose I have fields named X- followed by arbitrary text, and I want to delete all of them.

So far I've tried

delete.field {X-"*"}
rewrite.rule {"^X-*$"}

But neither of these worked. Any ideas?

kirk86, Jan 17 '20 22:01

Ad 1.

The example is meant to show a complete, more complex scheme. You may want to study the individual pieces described on the preceding pages and build a pattern that fits your needs.

In this example the specification part %3s($type) is replaced with the type of the entry. Delete the two occurrences and the type will not show up.

Ad 2.

BibTool does not contain functionality to match and rewrite the field names. Only the content of fields can be processed in this way.

I had a similar problem in the past and solved it by using BibTool to normalize the file and doing the rewriting with a separate program (Perl in my case). The idea is to put each field on a single line by giving BibTool a very large line length. After that, it is easy to identify and manipulate the fields with regular-expression matching and replacement. Finally, I used BibTool again to restore the "normal" formatting.
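A minimal Python sketch of the middle step of that workflow (Python instead of Perl), assuming BibTool has already been run with a line length large enough that every field sits on one line in the form `name = value,`. The function name `drop_fields` and the `X-` prefix pattern are hypothetical choices for this thread's example, not part of BibTool.

```python
import re

# Hypothetical pattern for this thread's example: a line that starts a field
# whose name begins with "X-", compared case-insensitively because BibTeX
# field names are case-insensitive.
X_FIELD = re.compile(r"^\s*x-[^=\s]*\s*=", re.IGNORECASE)


def drop_fields(lines, pattern=X_FIELD):
    """Return the lines that do not start a field matching `pattern`.

    Assumes the .bib file was normalized beforehand so that each field
    occupies exactly one line; a field spanning several lines would only
    have its first line removed.
    """
    return [line for line in lines if not pattern.match(line)]
```

For example, `sys.stdout.writelines(drop_fields(open("normalized.bib")))` writes the filtered entries, which can then be passed through BibTool once more to restore the usual formatting. Field values that merely contain `x-` (such as `title = {x-ray}`) are not affected, because the pattern is anchored at the field name.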

ge-ne, Jan 17 '20 22:01

BibTool does not contain functionality to match and rewrite the field names. Only the content of fields can be processed in this way.

If I understand correctly, the manual says that delete.field is equivalent to rewrite.rule, doesn't it? If so, how can delete.field{priority} work while delete.field{priority-".*"} does not?

In other words, does delete.field check the contents of the field and decide, based on that, whether to delete the field?

kirk86, Jan 17 '20 23:01

I don't know where you have got the impression that delete.field and rewrite.rule are related. They are not!

delete.field is described in section A.12.2. Nothing there indicates that the field name can be partial or a regular expression. The argument is compared, as a complete string and case-insensitively, against the field names present. Nothing else.

May I direct your attention to keep.field? Instead of saying what you want to delete, you can specify what to keep.

ge-ne, Jan 17 '20 23:01

I don't know where you have got the impression that delete.field and rewrite.rule are related. They are not!

Page 64:

If no replacement text is given then the whole field is deleted. In fact the instruction delete.field is only an alias for a corresponding rewrite rule with an empty replacement text.

This bit threw me off.

The text just above it clarifies this, but it is easy to overlook:

This pattern is matched against sub-strings of the field value, including the delimiters

May I direct your attention to keep.field? Instead of saying what you want to delete, you can specify what to keep.

I'm aware of it, thanks. I used sed to finish the job :+1:

kirk86, Jan 17 '20 23:01

In this example the specification part %3s($type) is replaced with the type of the entry. Delete the two occurrences and the type will not show up.

I tried this, but it didn't work:

key.format = {
    %s(bibkey)
    #
    %0w(@book)
    { %-1n(author) # }
    { %4d(year) # }
    #
    %0w(@article)
    { %-1n(author) # }
    { %4d(year) # }
    #
    %0w(@proceedings)
    { %-1n(editor) # }
    { %-.1W(title). # %-.1W(booktitle) # }
    { %4d(year) # }
    #
    %0w(@inproceedings)
    { %-1n(author) # }
    { %4d(year) # }
    #
%    %3s($type)-
    { %-1n(author) # }
    { %4d(year) # }
    #
%    %3s($type)-%4d(year) # ???
}

In the example above I've commented out both occurrences of %3s($type)-, but now all the keys are empty. Why?

kirk86, Jan 18 '20 00:01

% is not a comment character in this context. As far as I remember, there is no way to comment out part of an argument.

Please try to delete the lines completely and provide an input entry if the problem persists.

ge-ne, Jan 18 '20 00:01

Please try to delete the lines completely and provide an input entry if the problem persists

Here are the rules after deleting the lines:

key.format = {
    %s(bibkey)
    #
    %0w(@book)
    { %-1n(author) # }
    { %4d(year) # }
    #
    %0w(@article)
    { %-1n(author) # }
    { %4d(year) # }
    #
    %0w(@proceedings)
    { %-1n(editor) # }
    { %-.1W(title). # %-.1W(booktitle) # }
    { %4d(year) # }
    #
    %0w(@inproceedings)
    { %-1n(author) # }
    { %4d(year) # }
    #
    { %-1n(author) # }
    { %4d(year) # }
    #
}

And here's the input

@Misc{201512,
  author = {Damien, George and Widgerson, Avi},
  title         = {Fractional Max-Pooling},
  year          = 2015,
  month         = may,
  day           = 12,
  citeulike-article-id=13883509
}

@Misc{20154,
  author = {Marry-Anne, Luce and Withers, Sandy},
  title         = {An Algorithm of Artistic Style},
  year          = 2015,
  month         = sep,
  day           = 2,
  citeulike-article-id=13744245,
}

@Article{Articleabou-moustafa2016,
  title         = {What Is the Distance Between Humans},
  author        = {Abou-Moustafa, K.},
  year          = 2016,
  month         = mar,
  citeulike-article-id=13985734
}

@Article{Articleangelopoulos2016,
  title         = {Knowledge for the masses},
  author        = {Angelopoulos, Nicos and Cussens, James},
  year          = 2016,
  month         = aug,
  citeulike-article-id=14123990,
}

kirk86, Jan 18 '20 00:01

I have run the example and see no problem. The keys are generated as I would expect. In this context I have no global resource file and invoked BibTool with

 bibtool -F -r x.rsc x.bib

ge-ne, Jan 18 '20 08:01

The keys are generated as I would expect.

Can you give an example of the actual resulting file? Here's what I got

@Misc{		  damien.ea2015,
  author	= {Damien, George and Widgerson, Avi},
  title		= {Fractional Max-Pooling},
  year		= 2015,
  month		= may,
  day		= 12,
  citeulike-article-id=13883509
}

@Misc{		  marry-anne.ea2015,
  author	= {Marry-Anne, Luce and Withers, Sandy},
  title		= {An Algorithm of Artistic Style},
  year		= 2015,
  month		= sep,
  day		= 2,
  citeulike-article-id=13744245
}

@Article{	  Articleabou-moustafa2016,
  title		= {What Is the Distance Between Humans},
  author	= {Abou-Moustafa, K.},
  year		= 2016,
  month		= mar,
  citeulike-article-id=13985734
}

@Article{	  Articleangelopoulos.ea2016,
  title		= {Knowledge for the masses},
  author	= {Angelopoulos, Nicos and Cussens, James},
  year		= 2016,
  month		= aug,
  citeulike-article-id=14123990
}

As you can see, the last two entries still have the word Article in front of the author names. Unless I misunderstood something in our previous discussion, I wouldn't expect that to be the final result.

I would expect all entries to have only the authors' names.

In this context I have no global resource file and invoked BibTool with

I used the same command to generate the output above. I don't think I have a global resource file either, since I compiled BibTool in its own directory and run it from there instead of installing it system-wide.

kirk86, Jan 18 '20 18:01

You are right. Sorry, I didn't read it carefully enough.

I have run some experiments along these lines, and it appears that this is a bug. I will investigate how to fix it.

ge-ne, Jan 19 '20 18:01

it appears that this is a bug

I think there's another error. Consider the following entry:

@Article{plato1989,
  title         = {Some {{Random}}'s {{Title}} on the {{Works}} of {{Thoughts}}},
  author        = {{von Plato}, Jan},
  year          = 1989
}

In my tests, the key for this entry is not generated properly: it consists only of the year. If I change author = {{von Plato}, Jan} to author = {von Plato, Jan}, the key is generated properly.

kirk86, Jan 19 '20 19:01

I'll have a look at it

ge-ne, Jan 20 '20 21:01

@ge-ne Sorry, I have a quick question.

In my resource file I have the following rules, which translate values delimited by " " into values delimited by { }:

rewrite.rule { "^\"\([^#]*\)\"$" = "{\1}" }
rewrite.rule { "# \"\([^#]*\)\"$" = "# {\1}" }
rewrite.rule { "^\"\([^#]*\)\" #" = "{\1} #" }
rewrite.rule { "# \"\([^#]*\)\" #" = "# {\1} #" }

As you can see in the previous examples, the year, month and day fields are not enclosed in { } braces.

How can I write a regex in my resource file that encloses in { } any value that is not already enclosed in braces?

kirk86, Jan 22 '20 16:01

Advice: Don't do it!

BibTeX defines a grammar. In this grammar you can write numbers as such; you do not need to enclose them in braces or quotes. Enclosing them in braces only adds the possibility that a space character slips in, which could then make it into the output.

The month is another case covered by the BibTeX grammar. The month name is in fact an @string macro, which is usually defined in the bst. Thus the bst can decide how the month should be printed, for instance nov as Nov or November. If you enclose it in braces, it becomes a literal string that cannot be treated this way.

BTW, in your example I have seen that you have developed an unhealthy practice of enclosing parts of the value in braces. For BibTeX this means "treat it as one single special box and not as text". This practice is usually an indication that you are using the wrong bst. For instance, the capitalization rules of the bst cannot be applied to the braced part. This is sometimes (seldom) needed, but normally the bst should be chosen such that it fits to your needs and such dirty hacks are not required.

Back to the original question. BibTool applies the rewrite rule to the whole (normalized) value. Thus you can define a regex which does not include braces and add the braces in the replacement text.
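To illustrate the idea with Python's `re` module (an analogy only, not BibTool's own regex engine): the pattern matches the bare value without any braces, and the braces are supplied by the replacement text. One difference to keep in mind is that Python writes capture groups as `(...)`, whereas the BibTool rules quoted earlier in this thread write them as `\(...\)`. The helper name `brace_number` is hypothetical.

```python
import re

# Matches a value consisting only of digits, e.g. a bare year such as 2015.
# The braces are not part of the pattern; they are added by the replacement.
BARE_NUMBER = re.compile(r"^([0-9]+)$")


def brace_number(value):
    """Wrap a bare numeric value in braces; leave anything else unchanged."""
    return BARE_NUMBER.sub(r"{\1}", value)
```

With this pattern, `brace_number("2015")` gives `{2015}`, an already braced `{2015}` is left as it is, and a month macro such as `may` is not touched, which keeps it usable as an @string reference.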

ge-ne, Jan 22 '20 16:01

BTW, in your example I have seen that you have developed an unhealthy practice of enclosing parts of the value in braces. For BibTeX this means "treat it as one single special box and not as text". This practice is usually an indication that you are using the wrong bst. For instance, the capitalization rules of the bst cannot be applied to the braced part. This is sometimes (seldom) needed, but normally the bst should be chosen such that it fits to your needs and such dirty hacks are not required.

It's not me, it's the reference manager that's doing that. Also, does BibTeX define a proper, unified grammar and language that avoids the issues you've mentioned?

bst should be chosen such that it fits to your needs and such dirty hacks are not required

Is there a de facto standard normalisation for bst? The reason why I wanted to enclose things in { } is because additional normalisation tools would not read my bib file if not every field value is enclosed in { }.

Back to the original question. BibTool applies the rewrite rule to the whole (normalized) value. Thus you can define a regex which does not include braces and add the braces in the replacement text.

What's the right way to do that? This rewrite.rule { month = "^ * $" = "^{\1}$"} doesn't seem to work.

kirk86, Jan 22 '20 17:01

Is there a de facto standard normalisation for bst?

That is up to each bst. But the definitions in alpha and its relatives are common.

The reason why I wanted to enclose things in { } is because additional normalisation tools would not read my bib file if not every field value is enclosed in { }

I don't mean enclosing the field value in braces but enclosing words inside a field value in braces.

What's the right way to do that?

rewrite.rule{ "^([0-9]+)$" = "{\1}" }

ge-ne, Jan 22 '20 21:01

I don't mean enclosing the field value in braces but enclosing words inside a field value in braces.

Gotcha, you mean something like title = {Another {Capitalised} Title which is {Wrong}}? BTW, I'm not doing that; I presume it's the reference manager that adds those braces automatically when importing the references.

What would be a recommended way to correct for that because there are different scenarios? For instance as before title = {Another {Capitalised} Title which is {Wrong}} vs title = {{{Another Capitalised Title which is Wrong}}}

BTW, I tried your suggestion rewrite.rule{ "^([0-9]+)$" = "{\1}" } but it's not working, I can still see all the month fields as month = feb instead of month = {feb}

kirk86, Jan 23 '20 01:01

What would be a recommended way to correct for that because there are different scenarios? For instance as before title = {Another {Capitalised} Title which is {Wrong}} vs title = {{{Another Capitalised Title which is Wrong}}}

The recommended way is to write title = {Another Capitalised Title which is Wrong}. Double braces are never needed. Single braces are needed in title and booktitle only, and only when they contain an acronym or TeX constructs, as in title = {{iSAQB} tips {\&} tricks}.
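Applying the "double braces are never needed" advice by hand to a large file is tedious, so here is a minimal Python sketch that does it for a single field value: it repeatedly rewrites `{{X}}` to `{X}` when `X` contains no further braces. The function name is hypothetical, and the sketch deliberately leaves single braces alone, because deciding which of them protect an acronym or a TeX construct still needs human judgment. Review its output, and keep a backup, before running it over a whole .bib file.

```python
import re

# {{X}} where X contains no braces at all: the innermost doubled group.
DOUBLE_BRACES = re.compile(r"\{\{([^{}]*)\}\}")


def collapse_double_braces(value):
    """Repeatedly replace {{X}} with {X} until no doubled group remains."""
    while True:
        collapsed = DOUBLE_BRACES.sub(r"{\1}", value)
        if collapsed == value:
            return collapsed
        value = collapsed
```

Each pass removes one level from the innermost doubled groups, so `{{{Another Capitalised Title which is Wrong}}}` becomes `{Another Capitalised Title which is Wrong}` after two passes, while a value like `{{iSAQB} tips {\&} tricks}` is left unchanged because its leading `{{` does not enclose a single brace-free group.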

BTW, I tried your suggestion rewrite.rule{ "^([0-9]+)$" = "{\1}" } but it's not working, I can still see all the month fields as month = feb instead of month = {feb}

That's intentional. The month names are @string macros. If they were enclosed in braces, the final BibTeX output would be feb, in lower case and without a terminating period. This is simply nonsense. But you can add rules like the following one to get it right: rewrite.rule{ "^feb$" = "{February}" }
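Writing one such rule per month by hand is repetitive, so a small Python sketch can print all twelve in exactly the form of the rule above. It assumes the standard lower-case three-letter month macros (jan to dec) and English month names; like the example rule, the generated rules are not restricted to the month field.

```python
# English month names; the standard BibTeX month macros are their first
# three letters in lower case (jan, feb, ..., dec).
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def month_rules():
    """Return one BibTool rewrite rule per month, mirroring the feb example."""
    return ['rewrite.rule{ "^%s$" = "{%s}" }' % (name[:3].lower(), name)
            for name in MONTHS]


if __name__ == "__main__":
    # Print the rules so they can be pasted into a resource file.
    print("\n".join(month_rules()))
```

The second line printed is `rewrite.rule{ "^feb$" = "{February}" }`, identical to the rule suggested above.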

ge-ne, Jan 23 '20 07:01

Double braces are never needed.

I understand, but as I explained, the reference manager does this automatically when importing references from the web, so I can't do anything about it there. And with almost 1000 entries in the .bib file, it is now impossible to correct the double braces by hand. That's why I started looking at BibTool in the first place.

kirk86, Jan 23 '20 11:01