sphinxext: global issue with UTF-8 support for files actually having non-ascii characters?

Open yarikoptic opened this issue 14 years ago • 10 comments

I originally thought this was a failure to support 'latex' (as opposed to UTF-8) encoded .bib files, resulting in a crash:

  File "/usr/lib/pymodules/python2.6/simpleparse/dispatchprocessor.py", line 120, in lines
    return countlines (buffer[start or 0:end or len(buffer)])
  File "/usr/lib/pymodules/python2.6/simpleparse/stt/TextTools/TextTools.py", line 467, in countlines
    return len(tag(text, linecount_table)[1])
TypeError: Low-level command (41) argument in entry 2 couldn't be converted to a string object, is a unicode

Neither setting

:encoding: iso-8859-1

for biblisted nor

% Encoding: latex

in the header of the .bib file helped resolve it. Actually converting the .bib file to UTF-8 (using kbibtex) and removing the above encoding settings led to the same failure :-/

Only matthew_brett.bib, which contains no UTF-8 characters at all, succeeded. Replacing the proper ASCII e in the name of the respected author of the first entry with an insulting Unicode Russian е did not, unfortunately, reproduce the above crash, but it did at least truncate the author's name to "Matthew Br." when using jasss_style.

yarikoptic avatar Feb 15 '11 18:02 yarikoptic
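
For anyone checking whether a .bib file really contains non-ASCII characters (including look-alikes such as the Cyrillic е mentioned above), scanning the text and reporting each character's position and Unicode name may help. This is only a minimal sketch and not part of bibstuff; the function name `find_non_ascii` and the sample entry are made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, unicode_name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to "UNKNOWN" for code points without a name.
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical entry: \u0435 is the Cyrillic letter that looks like ASCII "e".
sample = "@article{brett2011,\n  author = {Matthew Br\u0435tt},\n}\n"

for line_no, col, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r} ({name})")
```

For a real file, the text would be read with an explicit encoding, for example `open(path, encoding="utf-8").read()`, before being passed to the function.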

Yes, sadly, simpleparse does not, and probably never will, support unicode. I've since written two unicode-supporting bibtex parsers, and I've been talking to Andrey Golovizin, the author of pybtex, who has come a long way towards bibtex compatibility using pure python. So the fix here would probably be dumping bibtools and doing a rewrite. Is this something you need urgently?

matthew-brett avatar Feb 15 '11 18:02 matthew-brett

Hey Matthew,

excellent work! I just started playing with it. I noticed that the same thing:

Exception occurred:
  File "/usr/lib/pymodules/python2.6/simpleparse/stt/TextTools/TextTools.py", line 467, in countlines
    return len(tag(text, linecount_table)[1])
TypeError: Low-level command (41) argument in entry 2 couldn't be converted to a string object, is a unicode

happens when there is a comment in the BIB file. In my case this one:

@Comment{x-kbibtex-encoding=utf-8}

After its removal I get perfect results.

Best,

Michael

mih avatar Feb 20 '11 00:02 mih

Nice find ;-) It seems that any kind of @comment ruins it.

yarikoptic avatar Feb 20 '11 01:02 yarikoptic
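
Since removing the @Comment entry made the crash go away, one possible stopgap is to strip such entries from the .bib text before handing it to the parser. The sketch below is an assumption about how that preprocessing could look, not something bibstuff provides; `strip_bib_comments` is a hypothetical helper, and it only handles the brace-delimited form `@Comment{...}`.

```python
import re

# Start of an @Comment entry (any letter case) up to its opening brace.
COMMENT_START = re.compile(r"@comment\s*\{", re.IGNORECASE)


def strip_bib_comments(text):
    """Remove every @Comment{...} entry, honouring nested braces."""
    out = []
    pos = 0
    while True:
        match = COMMENT_START.search(text, pos)
        if match is None:
            out.append(text[pos:])
            break
        out.append(text[pos:match.start()])
        # Walk forward until the brace opened by @Comment{ is closed.
        depth = 1
        i = match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        pos = i
    return "".join(out)


# Hypothetical input using the comment line quoted above.
sample = "@Comment{x-kbibtex-encoding=utf-8}\n@article{key,\n  title = {A title},\n}\n"
print(strip_bib_comments(sample))
```

Ordinary entries such as `@article{...}` pass through unchanged. An unterminated comment is dropped through to the end of the text, which may or may not be the desired behaviour.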

Guys,

I'm afraid bibtools has a very fast parser that is fragile and essentially impractical to fix. I've written slower parsers that are much more like bibtex in their behavior, but dropping a new parser in would take a few days of work. You're voting for the few days I guess?

matthew-brett avatar Feb 21 '11 18:02 matthew-brett

What about using http://pybtex.sourceforge.net/ for all parsing? It supports UTF-8 and a few other exotic reference formats (YAML, BibTeXML). It currently lacks any output formatting for ReST, but it seems to be quite a nice and somewhat active project.

yarikoptic avatar Feb 21 '11 18:02 yarikoptic

bloody buttons -- how to reopen it? I clicked 'Comment & Close' by mistake ;)

yarikoptic avatar Feb 21 '11 18:02 yarikoptic

I think 'Actions - Open' opens it again. I've been talking to the pybtex guy - Andrey Golovizin - result above. He tried one of my new parsers and then quickly wrote his own, which is indeed reasonably fast and good at running through errors. The problem is that pybtex has two modes. One is 'bibtex mode' - and for that Andrey uses the bibtex .bst files and a parser for the bst language. That mode only outputs latex - because that's what the bst files output. Then there's python mode. Python mode outputs html and latex, but only has a single 'unsrt' style, which is still incomplete - for example, as I remember, it doesn't deal with conference papers yet - and is more fragile (it requires entries in the citation that bibtex will allow to be empty). So it would take some (useful) work to get fairly useful rst output from pybtex.

matthew-brett avatar Feb 21 '11 19:02 matthew-brett

Hi @matthew-brett ,

I was wondering whether you have had a chance to dig into this one again? I wanted to use the bibstuff sphinx extension again but had forgotten about this little show stopper. Cheers!

yarikoptic avatar May 17 '15 17:05 yarikoptic

Sorry - no - I hadn't - it seemed hopeless.

Have you tried sphinxcontrib-bibtex? I was thinking of switching to that (but it may still lack the functionality to output a given list of references, specified in the bibliography).

https://github.com/mcmtroffaes/sphinxcontrib-bibtex

matthew-brett avatar May 27 '15 18:05 matthew-brett

Issue here: https://github.com/mcmtroffaes/sphinxcontrib-bibtex/issues/54

matthew-brett avatar May 27 '15 18:05 matthew-brett