This is the story of Chris McKinlay, a mathematics PhD student at UCLA, and how he used unstructured data mining to find his true love. Frustrated by the matchmaking algorithm provided by OkCupid, Chris decided to take matters into his own hands. The story is as much about how he wrote Python robots to scrape data from online profiles as it is about the difficulties he faced and how he adjusted his approach. When OkCupid blocked his robots from harvesting data, he trained them to behave more like humans. This allowed him to collect 6 million questions and answers from 20,000 profiles. For McKinlay’s plan of dating like a mathematician to work, he then needed to find patterns in this vast data set, a way to roughly group the women according to their similarities. Being a hard-core mathematician, Chris coded up a modified version of K-Modes, a Bell Labs algorithm first used in 1998 to analyze diseased soybean crops, and divided the 20,000 women into 7 distinct clusters based on the answers his robots had harvested. The story doesn’t end there, because Chris had to put all this theoretical analysis to a real-world test by dating women from each of the clusters. It is a truly fascinating story with a sweet ending.
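For the curious, here is roughly what the clustering idea looks like in code. This is only a minimal sketch of plain K-Modes in Python, not McKinlay's modified version, which the article does not spell out. The function names (`hamming`, `column_modes`, `k_modes`) and the toy answers are made up for illustration. The idea: treat each profile as a list of categorical answers, measure distance as the number of answers that differ, and describe each cluster by its most common answer to every question.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the questions on which two profiles gave different answers."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most common answer to each question among the given profiles."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(profiles, k, iterations=20, seed=0):
    """Group categorical profiles into k clusters; returns (modes, labels).

    Hypothetical helper for illustration only. Starting modes are picked
    at random, so results can depend on the seed.
    """
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(profiles)), k)  # k distinct starting modes
    labels = []
    for _ in range(iterations):
        # Assignment step: each profile joins the cluster with the closest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(p, modes[j])) for p in profiles
        ]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: each mode becomes the per-question majority answer.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    return modes, labels


# Invented toy data: three questions per profile.
profiles = [
    ("yes", "no", "often"),
    ("yes", "no", "often"),
    ("yes", "no", "sometimes"),
    ("no", "yes", "never"),
    ("no", "yes", "never"),
    ("no", "yes", "sometimes"),
]
modes, labels = k_modes(profiles, k=2)
for profile, label in zip(profiles, labels):
    print(label, profile)
```

Like K-Means, this can settle into a poor grouping when the starting modes are unlucky, so in practice one would probably run it with several seeds and keep the result with the smallest total distance.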
You can read the entire article on wired.com.
We have all read many articles about buzzword technologies such as big data, unstructured data analytics and Hadoop. As this story shows, however, these buzzwords will soon start shaping our personal and social lives in ways that we never imagined! It is no surprise, then, that we now say, “Personal data is the new oil of our digital economy!” This does raise concerns, such as the thin line between big data and big brother, but that is a topic for another article. Till then: “hack, analyze, love!”