{"id":557,"date":"2015-10-01T18:15:39","date_gmt":"2015-10-01T23:15:39","guid":{"rendered":"http:\/\/sunapi386.ca\/wordpress\/?p=557"},"modified":"2015-10-28T01:52:01","modified_gmt":"2015-10-28T06:52:01","slug":"customer-support-platform","status":"publish","type":"post","link":"https:\/\/sunapi386.ca\/wordpress\/customer-support-platform\/","title":{"rendered":"Customer Support Platform"},"content":{"rendered":"<div class=\"maketitle\">\n<div class=\"maketitle\">\n<div class=\"maketitle\">\n<div class=\"maketitle\">\n<h2 class=\"titleHead\">Customer Support Platform<\/h2>\n<div class=\"author\"><\/div>\n<div class=\"date\"><span class=\"ecrm-1200\">October 1, 2015<\/span><\/div>\n<\/div>\n<h2 class=\"titleHead\">Increase customer support efficiency by using preformed answers and optionally modifying it before replying to customers.<\/h2>\n<h3 class=\"likesectionHead\"><a id=\"x1-1000\"><\/a>Goal<\/h3>\n<p class=\"noindent\">Build an API as demo to investors, about 3 weeks away. Basically a customer hands over their customer support chat logs, we provide back query-responses through API. The (reiterated) version of the problem is: retrieve relevant responses (from previously seen responses) based on customer question.<\/p>\n<ul class=\"itemize1\">\n<li class=\"itemize\">Treat customer question as a query.<\/li>\n<li class=\"itemize\">Retrieve a reasonable response.<\/li>\n<li class=\"itemize\">The meat of the problem lies in creating a good mapping from query to a response.<\/li>\n<\/ul>\n<p class=\"noindent\">Due to the timely nature of building a demo in short time, I look to using pre-existing tools rather than develop an entire process from scratch. 
Obviously it\u2019s hard to publish papers on using existing techniques, but our goal involves more engineering than research.<\/p>\n<h3 class=\"likesectionHead\"><a id=\"x1-2000\"><\/a>Pre-existing tools approach<\/h3>\n<h4 class=\"likesubsectionHead\"><a id=\"x1-3000\"><\/a>Apache Lucene<\/h4>\n<p class=\"noindent\"><a href=\"http:\/\/lucene.apache.org\/core\/\"><span class=\"ecbx-1000\">Apache Lucene<\/span><\/a> is arguably the most advanced, high-performance, fully featured search engine library in existence today, whether open source or proprietary. But since it is only a library, getting started takes work: you\u2019d need to build an application around it. This is the search engine library behind Wikipedia, the Guardian, Stack Overflow, GitHub, Akamai, Netflix, and LinkedIn.<\/p>\n<ul class=\"itemize1\">\n<li class=\"itemize\">Lucene has pluggable relevance <span class=\"ecbx-1000\">ranking models<\/span> built in, including the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Vector_space_model\">Vector Space Model<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Okapi_BM25\">Okapi BM25<\/a>, and NLP tasks such as information extraction and sentiment analysis can be built on top.<\/li>\n<li class=\"itemize\">The power of Lucene is text searching and analysis. It\u2019s very fast because all data in every field is indexed by default. Applications focused on text search should definitely use Lucene.<\/li>\n<\/ul>\n<p class=\"noindent\">There are two predominant platforms built on top of Lucene for full-text search: Apache Solr and Elasticsearch. 
Both are open source.<\/p>\n<p class=\"indent\">Elasticsearch is friendlier to teams that are used to REST APIs and JSON and don\u2019t have a Java background, so we\u2019ll run with that.<\/p>\n<h4 class=\"likesubsectionHead\"><a id=\"x1-4000\"><\/a>Elasticsearch<\/h4>\n<p class=\"noindent\">Elasticsearch is also written in Java and uses Lucene internally, but it makes full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.<\/p>\n<ul class=\"itemize1\">\n<li class=\"itemize\">Also has pluggable <span class=\"ecbx-1000\">ranking models<\/span>! This is important for trying different approaches to getting good results for customers. This modularity means we can build one pipeline and improve our responses by swapping in different ranking models.<\/li>\n<li class=\"itemize\">It can also be extended with our own custom ranking functions. For instance, we might care about\n<ul class=\"itemize2\">\n<li class=\"itemize\">Information decay, where more recent response snippets rank at the top.<\/li>\n<li class=\"itemize\">Ranking based on how often a response snippet is used or passed over.<\/li>\n<\/ul>\n<\/li>\n<li class=\"itemize\">The customer\u2019s questions are treated as query input, and the support agent\u2019s responses are treated as snippets to look up.<\/li>\n<li class=\"itemize\">References\n<ul class=\"itemize2\">\n<li class=\"itemize\">General intro: <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/guide\/current\/intro.html\">Learn | Docs<\/a>.<\/li>\n<li class=\"itemize\"><a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/guide\/current\/scoring-theory.html\">Theory Behind Relevance Scoring<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h5 class=\"likesubsubsectionHead\"><a id=\"x1-5000\"><\/a>Searching<\/h5>\n<ul class=\"itemize1\">\n<li class=\"itemize\"><span class=\"ecbx-1000\">Relevance<\/span>: Elasticsearch\u2019s main advantage over a traditional database is full-text search. Search results are sorted by their relevance score. 
The concept of relevance is completely foreign to traditional databases, in which a record either matches or it doesn\u2019t. See <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/guide\/current\/_full_text_search.html\">Full Text Searching<\/a>.<\/li>\n<li class=\"itemize\"><span class=\"ecbx-1000\">Phrase Search<\/span>: Sometimes we want to match exact sequences of words, i.e. <span class=\"ecti-1000\">phrases<\/span>. Use the <span class=\"ectt-1000\">match_phrase<\/span> query; see <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/guide\/current\/_phrase_search.html\">Phrase Search<\/a>.<\/li>\n<li class=\"itemize\"><span class=\"ecbx-1000\">Highlighting<\/span>: Although not critical, we can highlight the snippet that matched our search. See <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/guide\/current\/highlighting-intro.html\">Highlighting<\/a>.<\/li>\n<\/ul>\n<h4 class=\"likesubsectionHead\"><a id=\"x1-6000\"><\/a>Ranking Models<\/h4>\n<p class=\"noindent\">Choosing a good ranking model is the meat of the problem. Well-known ranking models:<\/p>\n<ul class=\"itemize1\">\n<li class=\"itemize\"><span class=\"ecbx-1000\">TF-IDF<\/span>: <a href=\"http:\/\/michaelerasm.us\/tf-idf-in-10-minutes\/\">What is TF-IDF? The 10 minute guide<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tf%E2%80%93idf\">Wikipedia TF-IDF<\/a><\/li>\n<li class=\"itemize\"><span class=\"ecbx-1000\">BM25<\/span> is regarded as slightly better than TF-IDF for our case.\n<ul class=\"itemize2\">\n<li class=\"itemize\">Quote from <a href=\"https:\/\/www.elastic.co\/blog\/found-similarity-in-elasticsearch\">Similarity in Elasticsearch<\/a>: There is a reason why TF-IDF is as widespread as it is. It is conceptually easy to understand and implement while also performing pretty well. That said, there are other, strong candidates. Typically, they offer more tuning flexibility. In this article we have delved into one of them, BM25. 
In general, it is known to perform just as good or even better than TF-IDF, especially on collections with short documents.<\/li>\n<\/ul>\n<\/li>\n<li class=\"itemize\">Consider taking the <a href=\"https:\/\/class.coursera.org\/nlp\/lecture\/124\">Coursera NLP course<\/a> to learn more about ranking models.<\/li>\n<\/ul>\n<p class=\"noindent\">The two models above are statistical approaches. In recent years, fundamental breakthroughs were achieved using machine learning, specifically <span class=\"ecbx-1000\">neural architectures<\/span>, in several subfields of AI \u2013 computer vision, speech recognition, and machine translation. Consequently, more advanced ranking models could be derived from neural-network approaches.<\/p>\n<h3 class=\"likesectionHead\"><a id=\"x1-7000\"><\/a>Training Data<\/h3>\n<p class=\"noindent\">Evaluating any prediction or recommendation engine relies on having a good set of data. The Ubuntu Dialogue Corpus is one such dialogue dataset.<\/p>\n<h4 class=\"likesubsectionHead\"><a id=\"x1-8000\"><\/a>Ubuntu Dialogue Corpus<\/h4>\n<p class=\"noindent\">The Ubuntu Dialogue Corpus, introduced by this <a href=\"http:\/\/arxiv.org\/pdf\/1506.08909v2.pdf\">paper<\/a>, contains almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. 
Along with introducing the dialogue corpus, the paper also discusses learning architectures suitable for analyzing this dataset.<\/p>\n<p class=\"indent\">Specifically, the following architectures are benchmarked for performance:<\/p>\n<ul class=\"itemize1\">\n<li class=\"itemize\">Term Frequency-Inverse Document Frequency (<span class=\"ecbx-1000\">TF-IDF<\/span>, which is what the Elasticsearch\/Lucene engine uses)<\/li>\n<li class=\"itemize\">Recurrent Neural Network (<span class=\"ecbx-1000\">RNN<\/span>)<\/li>\n<li class=\"itemize\">Long Short-Term Memory (<span class=\"ecbx-1000\">LSTM<\/span>) architecture<\/li>\n<\/ul>\n<p class=\"noindent\">Performance evaluation is based on the task of best response selection, without human labels. The agent is asked to select the <span class=\"cmmi-10\">k <\/span>most likely responses, and it is correct if the true response is among the <span class=\"cmmi-10\">k <\/span>candidates. This family of metrics is called Recall@k; for example, <span class=\"cmmi-10\">k <\/span><span class=\"cmr-10\">= 1 <\/span>is denoted R@1.<\/p>\n<p class=\"indent\">The observed result is that the LSTM outperforms both the RNN and TF-IDF on all evaluation metrics.<\/p>\n<h4 class=\"likesubsectionHead\"><a id=\"x1-9000\"><\/a>Daerli Chinese Conversation Log<\/h4>\n<p class=\"noindent\">A confidential corpus of support dialogues is to be used in our testing, as the customers involved are Chinese companies.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Customer Support Platform October 1, 2015 Increase customer support efficiency by using preformed answers, optionally modifying them before replying to customers. Goal Build an API as a demo for investors, about three weeks away. A customer hands over their customer support chat logs, and we provide query responses back through an API. 
The (reiterated) version of the &hellip; <a href=\"https:\/\/sunapi386.ca\/wordpress\/customer-support-platform\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Customer Support Platform<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[34],"tags":[],"class_list":["post-557","post","type-post","status-publish","format-standard","hentry","category-thoughts"],"_links":{"self":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/557","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/comments?post=557"}],"version-history":[{"count":4,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/557\/revisions"}],"predecessor-version":[{"id":561,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/posts\/557\/revisions\/561"}],"wp:attachment":[{"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/media?parent=557"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/categories?post=557"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sunapi386.ca\/wordpress\/wp-json\/wp\/v2\/tags?post=557"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}