bib2df
Problems parsing .bib from Web of Science
I downloaded a bib file from Web of Science savedrecs.zip and there are multiple issues when reading it. The solution shown in #21 doesn't work here :(
Most of them seem to be related to what you, @ottlngr, mentioned in #21 (key-value pairs not separated by linebreaks):
- AUTHORS: The authors not in the first line are lost
- ABSTRACT: Only the first line of the abstract is imported
But other issues seem to arise from a different thing:
- A bunch of extra columns appear (for a simplified case, see [A] below)
[A] single_reference.zip When reading this bib reference, the following lines of the abstract create new columns (the first word of the line becomes the column title, and the text in the cell is whatever comes after the "="):
- benefits and harms; n = 451) or non-evidence-based (e.g., relative risks
- on benefits only; n = 446) patient information about a cancer screening
- non-evidence-based patient information (n = 446), a mean of 33.1% of
- whereas with evidence-based patient information (n = 451), only half as
So, the first of those creates a BENEFITS column with a text "451) or non-evidence-based (e.g., relative risks"
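The symptom is consistent with a parser that treats every line containing "=" as its own key-value pair. The sketch below reproduces it in Python purely for illustration; it is an assumption about the mechanism, not bib2df's actual code, and `naive_parse` and the placeholder first abstract line are hypothetical.

```python
def naive_parse(lines):
    """Hypothetical line-oriented parser: every line with '=' is a new field."""
    fields = {}
    for line in lines:
        if "=" not in line:
            continue  # continuation lines without '=' are silently lost
        left, right = line.split("=", 1)  # split at the first '=' only
        words = left.split()
        if not words:
            continue
        key = words[0].upper()  # the first word of the line becomes the column
        fields[key] = right.strip()
    return fields


entry = [
    "Abstract = {Placeholder first line of the abstract,",  # placeholder text
    "   benefits and harms; n = 451) or non-evidence-based (e.g., relative risks",
    "   on benefits only; n = 446) patient information about a cancer screening",
]
parsed = naive_parse(entry)
for key, value in parsed.items():
    print(key, "->", value)
```

Under that assumption, the second line yields a BENEFITS field holding "451) or non-evidence-based (e.g., relative risks", which matches the extra column described above.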
Please, let me know if I can be of any help testing/debugging this.
Hi, thanks for your message.
This seems to happen because of the multi-line values in this particular .bib file. I'll have to play with it a bit to see what can be improved in bib2df to avoid this behaviour.
Any news on this issue? I have the same problem. I have downloaded a bib file from Web of Science and anything after a line break (e.g. all of the abstracts) is excluded from the dataframe. I really like your package otherwise, and hope that you are able to resolve this critical problem!
@ottlngr we ran into the same issue (our code builds on bib2df). Maybe the function here could constitute the basis for a solution (not sure how robust it is): https://github.com/paulcbauer/flex_bib/blob/master/merge_bib_lines.R
@jjsantana maybe this helps: https://github.com/paulcbauer/flex_bib#caveats
@paulcbauer I added a test case that covers this issue. Of course it fails at the moment, but feel free to try integrating your function and see if the test succeeds.
I added some code (optional argument merge_lines + function to merge lines). I am not sure whether (and how) it interacts with the separate_names argument. Also, there may be a nicer way to integrate it into your functions.
--
Dr. Paul C. Bauer
Mannheim Centre for European Social Research
University of Mannheim
Email: [email protected]
Current research: "Believing and Sharing Information by Fake Sources https://osf.io/mrxvc" Websites: Homepage http://www.paulcbauer.eu/, GoogleScholar https://scholar.google.ch/citations?user=zRqPQ_kAAAAJ&hl=en&oi=ao, ResearchGate https://www.researchgate.net/profile/Paul_Bauer4, www.tweetingpoliticians.com, SSRN http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=1911340, Twitter https://twitter.com/p_c_bauer, Github https://github.com/paulcbauer
The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination, distribution, forwarding, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited without the express permission of the sender. If you received this communication in error, please contact the sender and delete the material from any computer.
Cool, thanks for the effort. I will have a closer look at it.
Cool thanks. There was some sort of error message but I didn't know how relevant it is.
@paulcbauer's suggestion to use the merge_bib_lines function from https://github.com/paulcbauer/flex_bib#caveats works for me as a temporary solution (thank you!). It can also process bib files that contain multiple bib entries.
Hi there
Wondered if there was an update on this issue. I'm unable to import full abstracts from WoS .bib files and cannot get the above solutions to work. Thanks.
Apologies - I did get @paulcbauer's merge_bib_lines function to work and it solved the issue with import of incomplete abstracts - many thanks.
The problem I have now is that the merge_bib_lines function does not parse text properly when it encounters the character "=" inside a value. Any ideas? Thanks
There should be some regex workaround. I just don't have any time right now to look into this (hopefully in the next weeks). Sorry!
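Until someone gets to the regex, one possible alternative is to track brace depth instead of looking for "=": a line only starts a new field when no `{` from a previous field is still open. The sketch below is in Python for illustration only; it is not the merge_bib_lines function, `merge_field_lines` is a hypothetical name, the entry key and field values are placeholders, and it assumes values are brace-delimited and ignores escaped braces.

```python
def merge_field_lines(lines):
    """Join continuation lines onto the field they belong to.

    Assumes brace-delimited values ("Key = {value},"); escaped braces
    such as \\{ are not handled.
    """
    merged = []
    depth = 0  # number of unclosed '{' inside the current field value
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("@"):
            # Entry header: keep as its own line and reset the depth,
            # so the entry's outer '{' does not count as an open value.
            merged.append(text)
            depth = 0
            continue
        if depth > 0:
            merged[-1] += " " + text  # still inside an open value
        else:
            merged.append(text)  # a new field (or the entry's closing '}')
        depth = max(depth + text.count("{") - text.count("}"), 0)
    return merged


lines = [
    "@article{ PLACEHOLDER-KEY,",
    "Author = {Last, A. and",
    "   Other, B.},",
    "Abstract = {Placeholder first line of the abstract,",
    "   benefits and harms; n = 451) or non-evidence-based (e.g., relative risks",
    "   on benefits only; n = 446) patient information about a cancer screening},",
    "Title = {Placeholder title},",
    "}",
]
merged = merge_field_lines(lines)
for field in merged:
    print(field)
```

After merging, each field sits on one line, so splitting it at the first "=" only (for example `field.split("=", 1)`) leaves any later "=" inside the value, such as "n = 451", untouched.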