This project was created alongside Ibrahim Saifullah. The code for the main process can be found here (written in Java and built with Maven), and the code for the Genius lyrics API calls can be found here (written in Python).
Hip Hop is defined by its lyricism - what is (sometimes) lacking in sound is made up for in clever rhymes and hard-hitting punchlines. LyricLearner is a project I co-created to dive deeper into understanding these lyrics, inspired by my appreciation of the genre.
LyricLearner has two primary functions, and the output of both is occasionally tweeted from @LyricLearner through the Twitter API. @LyricLearnerBot was intended to promote the account by replying to mentions of popular music artists, but it was quickly stopped by Twitter's anti-spam rules.
LyricLearner's first function is generating new lyrics for an artist based on their existing lyrics. This is a four-phase process. First, lyrics are pulled from Genius, a lyric database, using a call to its RESTful API in Python, and stored locally in a text file. Second, a hashmap is created that maps each word to the words that can follow it. This is the basis of a Markov chain: a statistical model in which the probability of the next event (in this case, the next word) depends only on the previous event. For example, if the word 'idea' follows the word 'the' in half of the places 'the' appears in the lyrics, that probability is stored. Then, when generating phrases, 'idea' will be chosen about half of the times 'the' is generated. This simulates the artist's lyrical style without directly copying lyrics.
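To make the word-to-follow-up-words hashmap concrete, here is a minimal Python sketch. It is not the project's Java code: the function names `build_chain` and `follow_probability`, the lowercase whitespace tokenization, and the sample line are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_chain(lines):
    """Map each word to a Counter of the words that directly follow it."""
    chain = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()  # assumed tokenization: lowercase, split on whitespace
        for current, following in zip(words, words[1:]):
            chain[current][following] += 1
    return chain


def follow_probability(chain, current, following):
    """Share of the recorded successors of `current` that are `following`."""
    total = sum(chain[current].values())
    return chain[current][following] / total if total else 0.0


# A made-up line (not a real lyric) in which 'the' is followed once by
# 'idea' and once by 'plan'.
sample_lines = ["the idea beats the plan"]
chain = build_chain(sample_lines)
print(follow_probability(chain, "the", "idea"))
```

Storing counts rather than precomputed probabilities keeps the table easy to update as more lyrics are added; the probabilities can be derived whenever they are needed.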
Third, this Markov chain hashmap was used to generate lyrics that followed the artist's probability distribution; several thousand potential lines were generated for each artist. Finally, these lyrics were tweeted at random intervals via a call to the Twitter API from the Java application. Below are some sample tweets. While they weren't actually written by the artists, the lyrics follow the Markov chain distribution and, as a result, emulate the artists' lyrical style.
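The generation step can be sketched in Python as a walk over the table, picking each next word at random from the previous word's followers. Everything here is illustrative rather than the project's actual Java implementation: `generate_line`, the `max_words` cutoff, and the small `followers` table are hypothetical.

```python
import random


def generate_line(followers, start, max_words=8, rng=None):
    """Walk the chain from `start`, drawing each next word from the previous word's followers."""
    rng = rng or random.Random()
    words = [start]
    while len(words) < max_words:
        options = followers.get(words[-1])
        if not options:  # dead end: nothing was ever seen after this word
            break
        # Repeated entries make frequent followers proportionally more likely.
        words.append(rng.choice(options))
    return " ".join(words)


# Hypothetical follower lists; duplicates encode how often a word followed.
followers = {
    "the": ["idea", "idea", "plan", "crowd"],
    "idea": ["is"],
    "is": ["the"],
    "plan": ["works"],
}

# A seeded generator makes the sketch reproducible from run to run.
line = generate_line(followers, "the", max_words=6, rng=random.Random(42))
print(line)
```

Because 'idea' fills two of the four slots after 'the', it is drawn about half the time, which matches the probability example above.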
The second layer of functionality in LyricLearner was artist sentiment analysis. Sentiment analysis (performed with the Stanford Natural Language Processing library) enabled me to estimate how positive or negative each word is. Gauging the sentiment of an artist's albums over time helped us gain insight into the artist's life. This was done by parsing the lyrics from Genius, averaging the per-word sentiment, normalizing and scaling the values, and charting each album's sentiment. For example:
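As a rough illustration of the averaging and scaling steps only, here is a minimal Python sketch. The per-word scores are a hypothetical stand-in for the sentiment library's output, the albums are invented, and min-max scaling onto 0 to 100 is an assumed choice, since the post does not say which normalization was used.

```python
def average_sentiment(words, word_scores):
    """Mean score of the words that have a known sentiment; 0.0 if none do."""
    scores = [word_scores[w] for w in words if w in word_scores]
    return sum(scores) / len(scores) if scores else 0.0


def scale_to_range(values, low=0.0, high=100.0):
    """Min-max normalize values to [0, 1], then stretch them onto [low, high]."""
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:  # every album scored the same
        return [(low + high) / 2 for _ in values]
    return [low + (v - smallest) / span * (high - low) for v in values]


# Hypothetical per-word scores, standing in for a sentiment library's output.
word_scores = {"love": 1.0, "shine": 0.5, "pain": -1.0, "cold": -0.5}

# Hypothetical albums in release order, each reduced to a list of words.
albums = {
    "Album A": ["love", "shine", "love", "street"],
    "Album B": ["pain", "cold", "love", "night"],
    "Album C": ["pain", "pain", "cold"],
}

raw = [average_sentiment(words, word_scores) for words in albums.values()]
scaled = scale_to_range(raw)
for title, value in zip(albums, scaled):
    print(f"{title}: {value:.1f}")
```

The scaled values are what would be plotted per album; words without a known score are simply skipped in the average.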
LyricLearner was a really fun project, combining my interest in algorithms and machine learning with my love for Hip Hop. As One Direction could have said,
Thanks for reading! If you have any questions, or would like to get in contact, you can reach out to me at aruneswara@icloud.com.