
Natural Language Processing (NLP) is fast becoming an essential capability for modern organizations seeking a competitive edge. It underpins many new business functions, from chatbots and question-answering systems to sentiment analysis, compliance monitoring, and BI and analytics of unstructured and semi-structured content.

Consider all the unstructured content that can yield significant insights: queries, email communications, social media, videos, customer reviews, and customer support requests, among others. NLP tools and techniques help organizations process, analyze, and understand this unstructured “big data” so that they can operate effectively and proactively.

We explore five use cases in which NLP solutions transform businesses and operations, as demonstrated in more than a decade of client projects at AES.

 

Unstructured Data Acquisition

With NLP, organizations can acquire unstructured content from external and internal sources for search and analytics. Data can be identified and extracted using:
  • Downloadable data available through paid and free content sources on the Internet
  • Search Technologies’ secure connectors to popular business content repositories

 

Raw Language Processing

Because raw data varies from source to source, organizations can apply data cleansing and formatting services once the content is acquired, to ensure the data is properly prepared for high-quality results.
  • Determine the format (e.g. PDF, XML, HTML)
  • Extract text content
  • Identify and remove irrelevant sections (common headers, footers, sidebars, boilerplates)
  • Identify differences and changes
  • Extract coded metadata
  • Extract, normalize, and cleanse tokens
  • Extract phrases
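
A few of the steps above (format detection, text extraction, boilerplate removal, and token normalization) can be sketched with the Python standard library alone. This is a minimal illustration, not the toolkit described in this article: the function names, the BOILERPLATE set, and the simple format-sniffing rules are all assumptions made for the example.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical list of known boilerplate lines; a real project would build
# this from headers/footers that repeat across many documents.
BOILERPLATE = {"skip to main content", "share this page"}


def detect_format(raw: bytes) -> str:
    """Guess the format from the first bytes (a deliberately simple heuristic)."""
    head = raw[:1024].lstrip().lower()
    if head.startswith(b"%pdf-"):
        return "pdf"
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "html"
    return "text"


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> list:
    """Return the visible text chunks of an HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


def tokenize(text: str) -> list:
    """Normalize Unicode, lowercase, and split into alphanumeric tokens."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text)


def clean_document(html: str, boilerplate=BOILERPLATE) -> list:
    """Extract text, drop known boilerplate chunks, and return clean tokens."""
    chunks = [c for c in extract_text(html) if c.lower() not in boilerplate]
    return [tok for chunk in chunks for tok in tokenize(chunk)]
```

For example, calling clean_document on a small HTML page that contains a "Skip to main content" link and one paragraph returns only the lowercase tokens of the paragraph. Real content usually needs format-specific extractors (for PDF in particular) that this sketch does not cover.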

 

Query Understanding

In many use cases, the content is written in a natural language (such as English, Chinese, or Spanish) but is not conveniently tagged. Organizations can use tools and techniques to extract information from this content. Depending on the need, this may involve text mining, text extraction, or full NLP. Typical full-text extraction includes:

  • Entity extraction – such as companies, people, dollar amounts, and key initiatives
  • Content categorization – positive or negative (e.g. sentiment analysis); by function, intention or purpose; or by industry or other categories for analytics and trending
  • Content clustering – to identify main topics of discourse and/or to discover new topics
  • Fact extraction – to fill databases with structured information for analysis, visualization, trending, and alerting
  • Relationship extraction – to fill out graph databases to explore real-world relationships
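
As a rough illustration of entity extraction, the sketch below combines a regular expression for dollar amounts with a small dictionary (gazetteer) lookup for company names. The COMPANIES set, the MONEY_RE pattern, and the extract_entities function are hypothetical names invented for this example; production systems typically add statistical or machine-learned models on top of rules like these.

```python
import re

# Hypothetical gazetteer of known company names (lowercase).
COMPANIES = {"acme corp", "globex"}

# Dollar amounts such as "$1,200", "$3.50", or "$2.5 million".
MONEY_RE = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:thousand|million|billion)\b)?",
    re.IGNORECASE,
)


def extract_entities(text: str) -> list:
    """Return (label, surface_text) pairs found in the input text."""
    entities = [("MONEY", m.group(0)) for m in MONEY_RE.finditer(text)]
    for name in sorted(COMPANIES):
        # Case-insensitive whole-word match against the original text.
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            entities.append(("COMPANY", m.group(0)))
    return entities
```

The extracted pairs could then be written to a database table or a graph store, which is the step that fact and relationship extraction build on.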

 

Statistical Language Processing

In many NLP projects, statistical techniques can provide a general understanding of the document as a whole. Some statistical processing use cases include:

  • Clustering
  • Categorization
  • Similarity
  • Topic analysis
  • Word clouds
  • Summarization
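
To make the similarity item concrete, here is a minimal sketch of one common statistical approach: weighting terms by TF-IDF and comparing documents by cosine similarity. It is written in plain Python so the arithmetic is visible; the function names and the smoothed IDF formula are choices made for this example, not a description of any particular product.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two support tickets about a late, damaged delivery will score higher against each other than against a ticket about billing, even when their wording differs. The same vectors can feed clustering and categorization.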

 

Question-Answering System

Question-answering systems (also known as Insight Engines, a term coined by Gartner) parse queries for natural language questions and then integrate with back-end systems to deliver direct answers rather than just a list of results containing the keyword. Such a system can be built using Search Technologies’ Natural Language Processing Toolkit, combined with an advanced and scalable set of NLP tools that perform all of the functions needed for query understanding. NLP tools include:

  • Tokenization
  • Acronym normalization
  • Lemmatization
  • Sentence and phrase boundaries
  • Entity extraction (all types except statistical)
  • Statistical phrase extraction
  • Question pattern recognition
  • Statistical disambiguation
  • Question-answer to action response
  • Business user interfaces (see below)
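
Question pattern recognition can be sketched as a small table of patterns that map a normalized question to an intent plus named slots. The patterns, intent names, and function names below are hypothetical and exist only to illustrate the idea; they are not the toolkit's actual pattern format.

```python
import re

# Hypothetical question patterns: (compiled regex, intent name).
# Named groups become the "slots" handed to the back-end integration.
PATTERNS = [
    (
        re.compile(
            r"^(?:what is|what's) the (?P<metric>revenue|headcount) "
            r"(?:for|of|in) (?P<entity>.+)$"
        ),
        "lookup_metric",
    ),
    (
        re.compile(r"^who is the (?P<role>[a-z ]+?) of (?P<entity>.+)$"),
        "lookup_person",
    ),
]


def normalize(question: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    q = re.sub(r"\s+", " ", question.strip().lower())
    return q.rstrip("?!. ")


def parse_question(question: str):
    """Return {"intent": ..., "slots": {...}} or None if no pattern matches."""
    q = normalize(question)
    for pattern, intent in PATTERNS:
        match = pattern.match(q)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None
```

A question that matches no pattern returns None, at which point a system could fall back to ordinary keyword search.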

Benefits of question-answering systems include:

  • Business user interfaces are available for entering and maintaining entities and patterns, so business users with no programming experience can maintain common entities and question/response patterns.
  • Programmer intervention is required only to integrate with back-end systems.
  • Answers can be pulled from relational databases, from RESTful APIs to any business system, or from search engine results.
  • Depending on your requirements, answers can be formatted as a natural language response or as a chart, report, or interactive graphic.
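
As one possible illustration of pulling an answer from a relational database and formatting it as a natural language response, the sketch below uses an in-memory SQLite database. The metrics table, its sample rows, and the function names are all invented for this example; a real integration would query the organization's own back-end systems.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'metrics' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (entity TEXT, metric TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [
            ("acme corp", "revenue", 1250000.0),  # made-up sample data
            ("globex", "revenue", 980000.0),
        ],
    )
    conn.commit()
    return conn


def answer(conn: sqlite3.Connection, metric: str, entity: str) -> str:
    """Look up a metric and phrase the result as a sentence."""
    row = conn.execute(
        "SELECT value FROM metrics WHERE entity = ? AND metric = ?",
        (entity.lower(), metric.lower()),
    ).fetchone()
    if row is None:
        return f"Sorry, I could not find {metric} for {entity}."
    return f"The {metric} for {entity} is {row[0]:,.0f}."
```

Parameterized queries (the ? placeholders) keep user-supplied question text out of the SQL string itself. The same lookup result could instead be passed to a charting or reporting component.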