LyricsGenius
Remove the hyperlink text from the lyrics scraper
When I use the package to scrape lyrics, the output includes the text of the hyperlinks at the end of the lyrics (see attached screenshot). For a reproducible example, I have attached a Jupyter notebook. This text can be removed with the regex I have written below. I am agnostic on whether this should be applied to all lyrics or only when remove_section_headers=True is set.
Potential Solution: hyperlinks_removed = re.sub(r"[0-9]+EmbedShare URLCopyEmbedCopy",'',lyrics)
Example for reproduction
import lyricsgenius as lg
import genius_token as gt

genius = lg.Genius(gt.token,  # Client access token from Genius Client API page
                   skip_non_songs=True,
                   excluded_terms=["(Remix)", "(Live)"],
                   remove_section_headers=True)
songs = genius.search_artist('Kanye-west', max_songs=1, sort='popularity').songs
s = [song.lyrics for song in songs]
print(s[0][-30:])
Thanks for the regex, I had exactly the same problem.
I slightly modified the regex to
hyperlinks_removed = re.sub(r"[0-9]*URLCopyEmbedCopy",'',lyrics)
because the other one failed for songs that had zero shares.
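For anyone who wants a single pattern that covers both cases, here is a minimal sketch. It assumes the footer is always an optional share count followed by the literal text "EmbedShare URLCopyEmbedCopy" at the very end of the lyrics, which is inferred only from the two patterns above. The name strip_footer is a hypothetical helper, not part of lyricsgenius.

```python
import re

# Assumed footer shape (inferred from the patterns in this thread):
# an optional share count, then "EmbedShare URLCopyEmbedCopy" at the end.
FOOTER = re.compile(r"[0-9]*EmbedShare URLCopyEmbedCopy\s*$")


def strip_footer(lyrics: str) -> str:
    """Remove the trailing footer text (hypothetical helper, not a lyricsgenius API)."""
    return FOOTER.sub("", lyrics)


# Footer with a share count.
print(repr(strip_footer("last line of the song42EmbedShare URLCopyEmbedCopy")))
# Footer with zero shares (no digits at all).
print(repr(strip_footer("last line of the songEmbedShare URLCopyEmbedCopy")))
```

One caveat: if the last lyric line itself ends in digits, they are indistinguishable from the share count and will be removed as well.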
Yeah, I'm hitting this as well. I'll add the regex to my code as a short-term fix.
I had to slightly modify Vuizur's solution because it was only matching the URLCopy part
(JavaScript):
let re = /[0-9].*URLCopyEmbedCopy/
lyrics.match(re)
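A note on the greedy `[0-9].*` form, sketched in Python, where `.` also stops at newlines by default: it starts matching at the first digit on the last line, so it can swallow lyric text if that line contains a number, and it matches nothing when there is no digit at all. The sample string below is made up for illustration, and the anchored pattern assumes the footer shape seen earlier in this thread. Also note that `match` only returns the matched text; removing it needs a substitution (`replace` in JavaScript, `re.sub` in Python).

```python
import re

# Made-up last line of a song, followed by the footer with 12 shares.
last_line = "Back in 1999, yeah12EmbedShare URLCopyEmbedCopy"

# Greedy form: matching begins at the first digit ("1" in "1999").
greedy = re.sub(r"[0-9].*URLCopyEmbedCopy", "", last_line)

# Anchored form: only an optional count directly before the footer is removed.
anchored = re.sub(r"[0-9]*EmbedShare URLCopyEmbedCopy$", "", last_line)

print(repr(greedy))    # 'Back in '
print(repr(anchored))  # 'Back in 1999, yeah'
```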
Thanks! I was wondering why my song lyrics had this text at the end of the output!