Today is a freaking cool day. Why, you ask? Because today I am writing an article on how to use two of the coolest big data/data science tools out there together to do epic shit! Let's start with HBase. HBase is an open-source, distributed, column-oriented database that runs on Hadoop, which means you get a big data store that still answers queries at interactive speeds. So many folks are starting to just dump their data into HBase. In the Project Teddy solution, we are dumping tweets, dialogue, and dialogue annotations into it to power our open-domain conversational API. For us, there really is no other approach that is this easy to use.
The second part of Project Teddy is to predict, from an incoming piece of conversation, what sort of response the speaker is trying to elicit from the teddy bear. Powering our teddy bear with predictive analytics and big data would be perfect, and what better platform to do that quickly and easily than AzureML?
Many folks may know that the South Florida Evangelism team is undertaking a task that many think is impossible. Well, in that statement all I hear is “there is still a chance!” The end goal is to create a teddy bear that can have a conversation about anything. So step one is to collect as much dialogue as possible from as many sources as possible and annotate it. And what better source for an association engine that scores word and phrase relevance than Twitter, which forces you to get your message across in 140 characters?
So, like any normal developer, I started by looking for samples that were already out there. MSDN has a great starter sample, written in C#, for writing tweets to HBase and running sentiment analysis on them. The only issue with the sample is that it is poorly written and hard to follow, with no separation of concerns. So I want to walk through simplifying the solution and separating out a few of those concerns.