Customer Support Platform

October 1, 2015

Increase customer support efficiency by using preformed answers, optionally modifying them before replying to customers.

Goal

Build an API as a demo to investors, about 3 weeks away. Basically, a customer hands over their customer support chat logs, and we provide query-responses back through an API. Restated, the problem is: retrieve relevant responses (from previously seen responses) based on the customer's question.

  • Treat customer question as a query.
  • Retrieve a reasonable response.
  • The meat of the problem lies in creating a good mapping from query to a response.

Given the tight timeline for building a demo, I look to pre-existing tools rather than developing an entire process from scratch. Obviously it's hard to publish any papers on using existing techniques, but our goal constraint involves more engineering than research.

Pre-existing tools approach

Apache Lucene

Apache Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today, open source or proprietary. But since it is a library only, it would be difficult to get started: you'd need to build around the library. This is the search engine library used behind Wikipedia, The Guardian, Stack Overflow, GitHub, Akamai, Netflix, and LinkedIn.

  • Lucene has pluggable relevance ranking models built in, including the Vector Space Model and Okapi BM25.
  • The power of Lucene is text searching/analyzing. It’s very fast because all data in every field is indexed by default. Text searching focused applications should definitely use Lucene.

There are two predominant platforms built on top of Lucene: Apache Solr and Elasticsearch. Both are designed for full-text search, and both are open source.

Elasticsearch is friendlier to teams that are used to REST APIs and JSON and don't have a Java background, so we'll run with that.

Elasticsearch

Elasticsearch is also written in Java and uses Lucene internally but makes full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.

  • Also pluggable ranking models! This is important for trying different approaches to getting good customer results. This modularity means we can build one pipeline and improve our responses by swapping in different ranking models.
  • Can be plugged with our own custom ranking functions. For instance, we might care about
    • Information decay, so that more recent response snippets rank at the top.
    • Ranking based on uses and non-uses of a response snippet.
  • Customer’s questions treated as query input, and support agent’s responses treated as snippets to look up.

Searching
  • Relevance: Elasticsearch’s main advantage over a traditional database is full-text search. Search results are sorted by their relevance score. The concept of relevance is completely foreign to traditional databases, in which a record either matches or it doesn’t. See Full Text Searching.
  • Phrase Search: Sometimes we want to match exact sequences of words, phrases. Use the match_phrase query in Phrase Search.
  • Highlighting: Although not super important, we can highlight the snippet that matched our search. Highlighting.
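
As a concrete sketch, a phrase query with highlighting could look like the following. The index name (responses) and field name (response_text) are assumptions for illustration, not from our actual schema:

```python
import json

# Hypothetical body for GET /responses/_search.
# match_phrase requires the words to appear as an exact sequence;
# the highlight section asks Elasticsearch to mark up the matching
# fragment of each hit.
query_body = {
    "query": {
        "match_phrase": {
            "response_text": "reset your password"
        }
    },
    "highlight": {
        "fields": {
            "response_text": {}
        }
    },
    "size": 5,  # return the top 5 response snippets
}

print(json.dumps(query_body, indent=2))
```

Swapping match_phrase for a plain match query relaxes the exact-sequence requirement while keeping the rest of the body unchanged.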

Ranking Models

Using a good ranking model is the meat of the problem. Famous ranking models:

  • TF-IDF: see "What is TF-IDF? The 10 minute guide" and the Wikipedia article on TF-IDF.
  • BM25 is regarded as slightly better than TF-IDF in our case.
    • Quote from Similarity in Elasticsearch: There is a reason why TF-IDF is as widespread as it is. It is conceptually easy to understand and implement while also performing pretty well. That said, there are other, strong candidates. Typically, they offer more tuning flexibility. In this article we have delved into one of them, BM25. In general, it is known to perform just as good or even better than TF-IDF, especially on collections with short documents.
  • Consider taking Coursera on NLP, learn more about ranking models.
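
To make the ranking-model discussion concrete, here is a minimal TF-IDF scoring sketch (toy tokenization, no length normalization; BM25 adds term-frequency saturation and document-length normalization on top of this same idea). The documents and query are invented for illustration:

```python
import math
from collections import Counter

def tfidf_rank(query, documents):
    """Rank documents against a query with a minimal TF-IDF model.

    tf(t, d)    = raw count of term t in document d
    idf(t)      = log(N / df(t)), df(t) = number of docs containing t
    score(q, d) = sum over query terms t of tf(t, d) * idf(t)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        for term in set(tokens):
            df[term] += 1
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split()
            if t in df
        )
        scores.append((score, i))
    # Highest score first.
    return sorted(scores, reverse=True)

docs = [
    "restart the server to apply the update",
    "to reset your password click forgot password",
    "our billing cycle starts on the first of the month",
]
ranking = tfidf_rank("reset password", docs)
print(ranking[0][1])  # → 1, the index of the best-matching document
```

Note that a term appearing in every document (like "the" here) gets idf = log(1) = 0, so it contributes nothing to the score — exactly the behaviour we want from stopwords.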

The two models above are statistical approaches. In recent years, fundamental breakthroughs were achieved using machine learning, specifically with neural architectures, in several subfields of AI: computer vision, speech recognition, machine translation. Consequently, more advanced ranking models could be derived from neural network approaches.

Training Data

Evaluating any prediction or recommendation engine relies on having a good set of data. The Ubuntu Dialogue Corpus is one such dialogue dataset.

Ubuntu Dialogue Corpus

The Ubuntu Dialogue Corpus, introduced by this paper, contains almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. Along with introduction of the dialogue corpus, the paper also discusses learning architectures suitable for analyzing this dataset.

Specifically, the following architectures are benchmarked for performance:

  • Term Frequency-Inverse Document Frequency (TF-IDF, which is what is used by the Elasticsearch/Lucene engine)
  • Recurrent Neural Network (RNN)
  • Long Short-Term Memory (LSTM) architecture

Performance evaluation is based on the task of best response selection, without requiring human labels. The agent is asked to select the k most likely responses, and it is correct if the true response is among those k candidates. This family of metrics, common in language tasks, is called Recall@k; for example, k = 1 is denoted R@1.
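
The metric itself is tiny; a sketch with invented candidate responses:

```python
def recall_at_k(ranked_candidates, true_response, k):
    """Recall@k for response selection: 1 if the true response
    appears among the top-k ranked candidates, else 0.
    Averaging this over a test set gives the reported R@k."""
    return int(true_response in ranked_candidates[:k])

# The model's ranking of candidate responses for one test question.
ranked = ["try rebooting", "check the logs", "reinstall the driver"]
true = "check the logs"

print(recall_at_k(ranked, true, 1))  # R@1 → 0 (true answer ranked 2nd)
print(recall_at_k(ranked, true, 2))  # R@2 → 1
```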

The observed result is that the LSTM outperforms both RNN and TF-IDF on all evaluation metrics.

Daerli Chinese Conversation Log

A confidential corpus of support dialogues is to be used in our testing, as the customers involved are Chinese companies.

Types of code commenting

In the chapter on commenting, McConnell divides comments into six categories. Figure 3.1 gives summarized versions of his definitions, with some of his commentary.

  • Repeat of the code: States what the code does in different words. Just more to read.
  • Explanation of the code: Explains complicated, tricky, or sensitive code. Make the code clearer instead.
  • Marker in the code: Identifies unfinished work. Not intended to be left in the completed code.
  • Summary of the code: Distills a block of code into one or two sentences. Such comments are useful for quick scanning.
  • Description of the code’s intent: Explains the purpose of a section of code, more at the level of the problem than at the level of the solution.
  • Information that cannot possibly be expressed by the code itself: Copyright notices, confidentiality notices, pointers to external documentation, etc.
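
A small hypothetical Python snippet illustrating several of the categories above in one place (the function itself is invented for the example):

```python
# (c) 2015 — a notice the code itself cannot express.

def normalize(scores):
    # Intent: scale raw relevance scores into [0, 1] so that
    # different ranking models can be compared on the same axis.
    highest = max(scores)
    # Explanation of a tricky case: avoid dividing by zero when
    # every score is zero. (McConnell would prefer making the code
    # itself clearer where possible.)
    if highest == 0:
        return [0.0 for _ in scores]
    # TODO: handle negative scores — a marker for unfinished work.
    return [s / highest for s in scores]

print(normalize([2, 4, 8]))  # → [0.25, 0.5, 1.0]
```

The one category deliberately absent is "repeat of the code" (e.g. `highest = max(scores)  # set highest to max of scores`), which McConnell says is just more to read.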

Why I Think Now Is A Good Time For Machine Learning

As we all know, since the scientific method was established a few centuries ago, we have gone through several technological revolutions. Most recently it was the industrial revolution, and now information is the current one. The availability of information is most noticeable in the decreasing cost of storage. It is now easy to acquire large amounts of information; what to do with it is the question. The answer is machine learning. To participate in the information era, you can start by learning about machine learning.

Learning machine learning has a few benefits, which I’ll talk about.

  1. There are patterns in the large quantity of data, but it is infeasible for humans to analyze them.
  2. Success = Opportunity + Preparation. And there are lots of opportunities. We need the preparation.

Clearly, it is in your favour to study machine learning, so you too can develop the tools needed to deal with large amounts of data. In other words, the availability of very large data sets is one of the resources fuelling the information revolution, and being able to utilize this information is key to being part of the current information era.

/more to come

January 2015 Thoughts

Went to talk with Professor Richard Mann. He suggested some readings based on what I was reading (On Intelligence) and what I was interested in.
1/ Thinking, Fast and Slow by Daniel KAHNEMAN, in relation to AI. Covers theory of choice, preferences, attention.
2/ The Black Swan: Impact of the Highly Improbable by TALEB.
3/ Scientific Discovery by Paul THAGARD, professor at uWaterloo.
4/ How to Build a Brain by Chris ELIASMITH
5/ Computer Vision by Richard SZELISKI

While reading “On Intelligence” I had a thought: is being tired just the result of neurons firing so much that they need to recharge? That is, running out of electrochemical signalers (used up faster than they naturally regenerate). This would explain why you get tired of a certain activity after intensive use of those neurons. For example, studying: after a while, you lose the ability to focus.

Upon going to DC library I remembered the UW Police could track stolen MACs. So there was an idea, good for a startup (if nothing like it already exists): Stolen Goods Network Tracking Down Organization.
– All devices have MAC address and use it to connect to a network.
– Goods reported stolen to police have their MACs marked as stolen.
– Partner networks (e.g. McDonald’s free wifi) match against stolen MACs, then report and monitor activity for local police.
– Police tracks and recovers stolen goods. (This part flaky.)
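
The matching step in this idea is just a set lookup on the partner network's side. A sketch, with invented MAC addresses and a hypothetical report format:

```python
# MACs reported stolen to police (hypothetical values).
stolen_macs = {"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}

def check_connection(mac, network_name):
    """Called by a partner network when a device associates.
    Returns a report to forward to local police, or None."""
    mac = mac.lower()  # MACs are case-insensitive; normalize first
    if mac in stolen_macs:
        return {"mac": mac, "seen_on": network_name}
    return None

print(check_connection("AA:BB:CC:DD:EE:01", "partner-wifi"))
```

One caveat the "flaky" note above hints at: devices can spoof or randomize their MAC address, so this only catches thieves who leave the hardware address untouched.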

On the Information Age

To say we live in interesting times would be a vast understatement. Trying to keep up with the technological advances made is a job in itself. We are exposed to a huge amount of content, causing us to feel overwhelmed and frustrated; too much data can drag us back into the very same inertia we would be experiencing if there was absolutely nothing of interest going on.

We are at a pivotal point in history where we have an abundance of access to technology. Many of us are having trouble coming to terms with that. There is simply too much to do, an unlimited amount of potential. In a way, it can be easier to be forced down a narrow path than to figure out how to traverse down a huge boulevard. That is why we cannot be afraid to make mistakes and false starts as we refine our talents.

In computer science talk, technology has given us a large number of edges per node (greater than at any point in our history), and our best choice for traversing this exponentially large tree is to do breadth-first search with heuristics.

The learning process has changed on virtually every level and the old rules just don’t apply anymore. Rather than wait for someone to issue new rules, we need to plunge into our own era of experimentation and innovation and shape it for our own purposes of expression.

As easy and accessible as technology has made things, actual skill remains an achievement that can’t be bought or even given away. Social media has been known to drag people into pits of trivia and irrelevance, wasting vast amounts of valuable time and potentially causing us to lose privacy and basic social skills. But social media can also be an invaluable resource if we /choose/ to use it that way. Social media is a tool with great potential, if it gets people to think and to see results. People have tremendous potential power, and most don’t even realize it. This is precisely why governments in every country keep tight control over these new outlets, for example.

The abundant access to technology (and thus increased productivity) means that as individuals we are more empowered than ever, whether we work alone or in groups. As hackers, we generally like to explore our own interests and take our own paths. Writing from our own perspective is essential, but there is also strength in numbers. In groups there are varieties of opinions and even disagreements, which, contrary to the belief of many, only serve to strengthen and help define the basic premise of the cause we are united on. In both cases, our skill and experience matter more than ever, as these technological tools can be leveraged to greater effect in skilled hands. The list of social media tools available today is huge, but the issues of skill and experience are just as relevant and vital as they’ve ever been.

Look forward to the explosion of creativity and productivity ahead in our new age.

Note: Content in this post is inspired by 2600: The Hacker Quarterly, Volume Thirty-One, Number Four, article titled “Tools for a New Future”.