
mergeDBSources problem

Open kmkrumbholz opened this issue 3 years ago • 2 comments

Hello,

I am using mergeDbSources() to merge files from Scopus and Web of Science. I want to remove duplicates, but when two entries are duplicates I want to retain the one that has an abstract. Is there a way to ensure this? In a manual audit I found that in some cases it kept the entry with an abstract and in other cases it did not. I also tried a workaround where I set the remove.duplicated argument to FALSE and then used tidyverse to remove duplicates, but somewhere in that process I lost functionality in bibliometrix. I know that one of the FAQs recommends using one source or the other, but using both substantially increases the number of articles I retrieve.

kmkrumbholz, Jul 26 '20 21:07

Hi,

I'm pretty sure that it can't be done with bibliometrix alone.

Look at the mergeDbSources() source code here: https://rdrr.io/cran/bibliometrix/src/R/mergeDbSources.R

The function identifies duplicates by title only (M$TI) and doesn't take anything else into consideration, so it has no way to prefer the entry that has an abstract.
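A minimal sketch of one possible workaround in base R, assuming the sources were merged with remove.duplicated = FALSE and that duplicate entries have exactly matching titles. The toy data frame below only stands in for a merged bibliometrix data frame (TI = title, AB = abstract, DB = source database), and the variable names are illustrative. It has not been checked against a real export.

```r
# Toy stand-in for a merged data frame; the values are made up for illustration.
M <- data.frame(
  TI = c("A STUDY OF X", "A STUDY OF X", "ANOTHER PAPER", "ANOTHER PAPER"),
  AB = c(NA, "Abstract from WoS", "Abstract from Scopus", ""),
  DB = c("SCOPUS", "ISI", "SCOPUS", "ISI"),
  stringsAsFactors = FALSE
)

# TRUE where an abstract is present (neither NA nor blank).
has_abstract <- !is.na(M$AB) & nzchar(trimws(M$AB))

# Move rows with an abstract to the top; ties keep their original order.
M_sorted <- M[order(!has_abstract), , drop = FALSE]

# duplicated() flags later occurrences only, so for each title
# the first row (the one with an abstract, if any) is kept.
M_dedup <- M_sorted[!duplicated(M_sorted$TI), , drop = FALSE]

print(M_dedup)
```

Titles often differ slightly between Scopus and Web of Science (case, punctuation), so in practice the comparison would probably need a normalized title rather than the raw M$TI.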

I had a similar problem last month while cleaning my data, but in my case I wanted to check the titles and author names for misspellings. So I wrote some code to identify potential problems and fix them in the raw data before importing it into a bibliometric data frame.

Regards

jacksonraniel, Aug 07 '20 14:08

I had a big problem with mergeDbSources(Data, remove.duplicated = TRUE). This function removes all hyphens (-) from the titles, e.g. "spring-mass" is converted to "springmass". I did not see any explanation of this in the bibliometrix vignette. After 10 months of work with a large data set, I noticed that I had lost my original titles :(
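A minimal sketch of one way to avoid this, assuming duplicates are removed by hand (i.e. after merging with remove.duplicated = FALSE): compare titles on a separate normalized key and leave the TI column untouched. The helper title_key() and the column TI_KEY are hypothetical names used only for illustration, not part of bibliometrix.

```r
# Toy titles; in a bibliometrix data frame these would be in the TI column.
M <- data.frame(
  TI = c("SPRING-MASS SYSTEMS", "SPRING MASS SYSTEMS",
         "SPRINGMASS SYSTEMS", "OTHER TOPIC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a comparison key without modifying the title.
title_key <- function(x) {
  key <- toupper(x)
  gsub("[^[:alnum:]]", "", key)  # drop spaces, hyphens and other punctuation
}

# Detect duplicates on the key, then discard the key again.
M$TI_KEY <- title_key(M$TI)
M_dedup <- M[!duplicated(M$TI_KEY), , drop = FALSE]
M_dedup$TI_KEY <- NULL

print(M_dedup$TI)
```

The first occurrence of each title is kept with its original spelling, hyphens included. Removing every space as well is an aggressive choice that can merge distinct titles, so the key may need to be less strict for real data.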

oguzozbay, Sep 05 '21 16:09