Introduction

In the previous six articles we illustrated the use of the Google and AWS NLP APIs, experimented with the spaCy library to extract entities and nouns from different documents, showed how to improve a model using spaCy's pattern-matching functions ( https://spacy.io/ ), trained the model with new entities, and demonstrated how to match CVs to job profiles.

Let us now dig a bit deeper into some linguistic features of spaCy and see how they can be used to improve virtual conversations. The same techniques apply to mail processing and to more advanced chatbots or virtual assistants, and can also serve as an underlying technique for voice assistants.

Let us say we are an online shop for personal computers and we would like to allow our customers to send us requests to order computers. This can come through a site chatbot or by mail.

Incoming text

Let us assume we receive the following input from a potential client:

Hello
I would like to order a notebook with 16GB and 256 GB disk, I would like to spend less than 1000 Francs, what would be the options
Thanks a lot
Patrick

As we have shown in earlier articles, let us import the required Python libraries and process the text through the spaCy pipeline. Nothing new, but good to repeat:

# import required libraries
import spacy
from spacy.pipeline import EntityRuler
from spacy.matcher import Matcher, PhraseMatcher
from spacy.symbols import VERB, dobj, xcomp
from spacy import displacy
# install the large pre-trained English model directly from spaCy
!python -m spacy download en_core_web_lg
# load the pre-trained model
nlp = spacy.load('en_core_web_lg')
# the incoming client mail
text = ("Hello I would like to order a notebook with 16GB and 256 GB disk, "
        "I would like to spend less than 1000 Francs, what would be the options "
        "Thanks a lot Patrick")
# process the input text through the standard spaCy pipeline
docMail = nlp(text)

Named Entities Recognition

Once this is all done, let us start with named entity recognition as usual.

# print the entities detected in the text
for ent in docMail.ents:
    print(ent.text, ent.label_)

16GB QUANTITY
256 GB QUANTITY
less than 1000 Francs MONEY
Patrick PERSON

We can also visualize the result directly in the text with highlighted entities.

Hello I would like to order a notebook with [16GB]QUANTITY and [256 GB]QUANTITY disk, I would like to spend [less than 1000 Francs]MONEY, what would be the options Thanks a lot [Patrick]PERSON

The default model does not detect notebook and disk as entities, but it identifies the sender as a person and the RAM and disk sizes as quantities. This is a good start, but still far from a practical solution. So, let us add some domain-specific entities that will help us later on.

# add domain-specific entities and add them to the pipeline
patterns = [{"label": "COMPUTER", "pattern": [{"lower": "notebook"}]},
            {"label": "CURRENCY", "pattern": [{"lower": "francs"}]},
            {"label": "PART", "pattern": [{"lower": "disk"}]}]

ruler = EntityRuler(nlp, patterns=patterns, overwrite_ents=True)
nlp.add_pipe(ruler)

Now the results look a bit better:

# process the mail again with the added entities
docMail = nlp(text)
for ent in docMail.ents:
    # print the entity text and its label
    print(ent.text, ent.label_)

notebook COMPUTER
16GB QUANTITY
256 GB QUANTITY
disk PART
Francs CURRENCY

Matching some specific patterns

Sometimes matching entities alone is not enough; for example, the client wrote the RAM size as ‘16GB’. So let us see how to detect the memory sizes automatically with a PhraseMatcher:

matcher = PhraseMatcher(nlp.vocab)
terms = ["16 GB","256 GB"]
# Only run nlp.make_doc to speed things up
patterns = [nlp.make_doc(t) for t in terms]
matcher.add("MEMORY", None, *patterns)

doc = nlp(text)
matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]
    print(span.text)

16GB
256 GB

Quite cool: it detected the patterns and matched the text related to memory size. Unfortunately, we do not know what each quantity refers to, so we need to start a different kind of analysis.

Dependency Parsing: Identify verbs, modifiers and objects

One of the key strengths of spaCy is its linguistic and predictive capabilities. Indeed, spaCy is able to predict which tag or label most likely applies in a specific context.

Let us start with displaying the result of part of speech tagging and dependency analysis. As we can see below, the code is pretty simple

displacy.render(docMail, style="dep", minify=True)

The result is quite impressive: it shows the predicted tag for each word and the dependency tree with the associated dependency labels. For example, ‘I’ is a pronoun and is the subject of the verb ‘like’.

[Figure: spaCy dependency tree rendered by displacy]

Let us detect the numerical modifiers, as we will need them to identify the required memory size:

for token in docMail:
    if token.dep_ == 'nummod':
        print(f"Numerical modifier: {token.text} --> object: {token.head}")

Numerical modifier: 16 --> object: GB
Numerical modifier: 256 --> object: disk
Numerical modifier: 1000 --> object: Francs

This is again quite cool, we can associate quantities to different words in the text.

Identifying the action verbs

spaCy provides all the tagging required to find the action verbs; we want to know, for example, whether the customer wants to order something or is just interested in some information. Let us iterate through all tokens in the text and search for an open clausal complement (see https://spacy.io/api/annotation#pos-tagging for all possible dependency tags).

verbs = set()
for possible_verbs in docMail:
    if possible_verbs.dep == xcomp and possible_verbs.head.pos == VERB :
        verbs.add(possible_verbs)
print(verbs)

{spend, order}

We have now identified ‘spend’ and ‘order’ as possible actions in the text. We can do the same to find the objects or items in the text that are referred to by the client.

Identifying items

Let us find possible items in the text using the dependency tag ‘dobj’ for direct objects of a verb.

items = set()
for possible_item in docMail:
    if possible_item.dep == dobj and possible_item.head.pos == VERB:
        items.add(possible_item)
print(items)

{Francs, notebook}

‘Francs’ and ‘notebook’ have been found. Now we can use word similarities to find out what kind of item the client is referring to. We could also use other techniques, but let us try a simple approach for now: compare the similarity between each identified object and the word ‘laptop’. The word ‘notebook’ is much closer to ‘laptop’ than ‘Francs’ is.

orderobject = nlp("laptop")
for sub in items:
    print(sub.similarity(orderobject))

0.0015887124852857469
0.8021939809276627

Finally, putting it all together, we can automatically detect the required action verb using a simple heuristic. Let us assume that if a verb's similarity to ‘order’ is at least 0.8, we have found the right verb; we then search its children for the direct object. That could look like this:

orderword = nlp("order")
for verb in verbs:
    if verb.similarity(orderword) >= 0.8:
        for v in verb.children:
            if v.dep == dobj:
                print(v.text)

notebook

The tech stack

For this experiment we have used the following:

  • Google Colab executing a Python 3 notebook
  • Python 3.6.9
  • Spacy 2.2.4

Patrick Rotzetter
