In the aiclass.com course, we just covered Naive Bayes classifiers, and the timing couldn't have been better. Just before that lecture series, one of my projects required me to build a classifier for a large body of data being funneled into the system. I spent quite a bit of time searching for the best way to do this, hoping there would be a Ruby gem that could save me some effort, but much to my chagrin nothing quite fit the bill, so I set about building my own.
The basic idea behind a Naive Bayes classifier is that we have a set of documents that have already been sorted into n categories, and we want to use what those labeled documents tell us to predict the category of new, not-yet-labeled documents. It is a fairly direct application of Bayes' rule and is probably best understood through an example.
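Stated as a formula, for a category $c$ and a document $d$ made up of words $w_1, \dots, w_m$, Bayes' rule combined with the "naive" assumption that words are independent of one another once the category is known gives the textbook decision rule below. The notation is introduced here for illustration; classyfier's internals may organize the computation differently.

$$
P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)} \propto P(c) \prod_{i=1}^{m} P(w_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{m} P(w_i \mid c)
$$

The denominator $P(d)$ is the same for every category, so it can be dropped when all we need is the most likely category.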
Say you have five labeled documents, plus a sixth whose label you want to predict:
- {subject: ‘Must read!’, text: ‘Get Viagra cheap!’, label: ‘spam’}
- {subject: ‘Gotta see this’, text: ‘Viagra. You can get it at cut rates’, label: ‘spam’}
- {subject: ‘Call me tomorrow’, text: ‘We need to talk about scheduling. Call me.’, label: ‘not spam’}
- {subject: ‘That was hilarious’, text: ‘Just saw that link you sent me’, label: ‘not spam’}
- {subject: ‘dinner at 7’, text: ‘I got us a reservation tomorrow at 7’, label: ‘not spam’}
- {subject: ‘See it to believe it’, text: ‘Best rates you’ll see’, label: ?}
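To make the counting concrete, here is a small Ruby sketch of the two statistics a Naive Bayes classifier gathers from the five labeled documents: the prior probability of each label, and how often each word appears under each label. The `tokenize` helper and its rule (subject and text combined, lowercased, split on anything other than letters, digits and apostrophes) are assumptions for illustration, not necessarily what classyfier does.

```ruby
# The five labeled documents from the list; the sixth, unlabeled one is
# what we will eventually try to classify.
TRAINING_DOCS = [
  { subject: 'Must read!',         text: 'Get Viagra cheap!',                          label: :spam },
  { subject: 'Gotta see this',     text: 'Viagra. You can get it at cut rates',        label: :spam },
  { subject: 'Call me tomorrow',   text: 'We need to talk about scheduling. Call me.', label: :not_spam },
  { subject: 'That was hilarious', text: 'Just saw that link you sent me',             label: :not_spam },
  { subject: 'dinner at 7',        text: 'I got us a reservation tomorrow at 7',       label: :not_spam }
].freeze

# Hypothetical tokenizer: combine subject and text, lowercase, and keep
# runs of letters, digits and apostrophes.
def tokenize(doc)
  "#{doc[:subject]} #{doc[:text]}".downcase.scan(/[a-z0-9']+/)
end

# Prior P(c): the fraction of labeled documents carrying each label.
label_counts = TRAINING_DOCS.group_by { |doc| doc[:label] }.transform_values(&:size)
priors = label_counts.transform_values { |n| n.fdiv(TRAINING_DOCS.size) }

# Per-label word counts, later used to estimate P(word | c).
word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
TRAINING_DOCS.each do |doc|
  tokenize(doc).each { |word| word_counts[doc[:label]][word] += 1 }
end

puts "P(spam) = #{priors[:spam]}, P(not spam) = #{priors[:not_spam]}"
# prints: P(spam) = 0.4, P(not spam) = 0.6
puts "'viagra' count in spam: #{word_counts[:spam]['viagra']}"
# prints: 'viagra' count in spam: 2
```

Two of the five documents are spam, so P(spam) = 2/5 = 0.4 and P(not spam) = 3/5 = 0.6; the word counts then say how "spammy" each individual word looks.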
Training on the five labeled documents and then classifying the sixth with classyfier looks like this:

```ruby
require 'classyfier'

@classyfier = Classyfier::NaiveBayes::NaiveBayesClassifier.new

# Train on the five labeled documents
@classyfier.train({:subject => 'Must read!', :text => 'Get Viagra cheap!'}, :spam)
@classyfier.train({:subject => 'Gotta see this', :text => 'Viagra. You can get it at cut rates'}, :spam)
@classyfier.train({:subject => 'Call me tomorrow', :text => 'We need to talk about scheduling. Call me.'}, :not_spam)
@classyfier.train({:subject => 'That was hilarious', :text => 'Just saw that link you sent me'}, :not_spam)
@classyfier.train({:subject => 'dinner at 7', :text => 'I got us a reservation tomorrow at 7'}, :not_spam)

# Classify the sixth, unlabeled document
@scores = @classyfier.classify({:subject => 'See it to believe it', :text => "Best rates you'll see"})
```
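To show roughly what can happen behind `train` and `classify`, here is a from-scratch sketch with the same call shape. `TinyNaiveBayes` is a hypothetical class, not classyfier's code. It assumes add-one (Laplace) smoothing so unseen words don't zero out a score, sums log-probabilities to avoid floating-point underflow, and tokenizes by joining all of a document's fields, lowercasing, and keeping runs of letters, digits and apostrophes.

```ruby
require 'set'

# Hypothetical minimal Naive Bayes classifier (not classyfier's implementation).
class TinyNaiveBayes
  def initialize
    @doc_counts  = Hash.new(0)                             # documents seen per label
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }  # word counts per label
    @word_totals = Hash.new(0)                             # total words per label
    @vocabulary  = Set.new                                 # every word seen in training
  end

  # doc is a hash of text fields (e.g. :subject, :text); all fields are used.
  def train(doc, label)
    @doc_counts[label] += 1
    tokenize(doc).each do |word|
      @word_counts[label][word] += 1
      @word_totals[label] += 1
      @vocabulary << word
    end
  end

  # Returns { label => log(P(label) * product of P(word | label)) }.
  # Higher means more likely; P(document) is shared by all labels, so it is omitted.
  def classify(doc)
    total_docs = @doc_counts.values.sum
    words = tokenize(doc)
    @doc_counts.keys.to_h do |label|
      score = Math.log(@doc_counts[label].fdiv(total_docs))
      words.each do |word|
        # Add-one smoothing: pretend every vocabulary word was seen once more.
        smoothed = (@word_counts[label][word] + 1).fdiv(@word_totals[label] + @vocabulary.size)
        score += Math.log(smoothed)
      end
      [label, score]
    end
  end

  private

  def tokenize(doc)
    doc.values.join(' ').downcase.scan(/[a-z0-9']+/)
  end
end

nb = TinyNaiveBayes.new
nb.train({ subject: 'Must read!', text: 'Get Viagra cheap!' }, :spam)
nb.train({ subject: 'Gotta see this', text: 'Viagra. You can get it at cut rates' }, :spam)
nb.train({ subject: 'Call me tomorrow', text: 'We need to talk about scheduling. Call me.' }, :not_spam)
nb.train({ subject: 'That was hilarious', text: 'Just saw that link you sent me' }, :not_spam)
nb.train({ subject: 'dinner at 7', text: 'I got us a reservation tomorrow at 7' }, :not_spam)

scores = nb.classify({ subject: 'See it to believe it', text: "Best rates you'll see" })
puts scores.max_by { |_label, score| score }.first # prints: spam
```

In this sketch, the unlabeled document's words "see", "it" and "rates" each appear in the spam training documents, while only "to" appears in the not-spam ones. That outweighs the larger 0.6 prior for not spam, so the sixth document comes out as spam.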