Machine Learning and Sitecore – The Sitecore Cortex

Posted 10-17-2018 by Albraa Nabelsi
During Sitecore Symposium, I attended two sessions about Sitecore Cortex: “The Cortex engine: Process at scale” by Alistair Deneys, and “Inside Cortex” by Colin te Kempel. Sitecore Cortex is a new machine learning toolset that collects and processes large amounts of data quickly. It enhances the customer experience by giving marketing teams tools for segmentation, personalization, automation, and attribution, which together improve the relevance of the content delivered to users. Cortex will be released as part of Sitecore 9.1.

To use Cortex effectively, we must understand how it works. It is made up of three parts: framing, data, and the model. 

Framing

Essentially, framing means translating a business goal into measurable indicators, or “features”, that a model can use to make predictions. The user defines this goal in order to identify opportunities. Framing must be done in the context of machine learning, which includes specifying a model and the data used to train it. Framing is a fundamental topic in machine learning and outside the scope of this post, but you can learn more about it from Google's material here: Machine Learning - Framing

Data 

Data is collected automatically by Sitecore via xConnect. It can include contacts, interactions, and custom types. The data might need to be retrieved and formatted before it can be used, which is where data scientists come in.

The Model 

The model is the most complex piece of Cortex. It defines how features and labels are related to each other. Cortex uses the statistical programming language R to define the machine learning model. Developers will need to work with data scientists and Sitecore analysts to develop and verify this model.
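To make "how features and labels are related" a little more concrete, here is a minimal sketch of the simplest possible model: a straight line fitted to one feature and one label by least squares. It is written in Python purely for illustration (Cortex itself uses R, as noted above), and the function name `fit_line` and the example data are my own assumptions, not anything from Cortex.

```python
from typing import List, Tuple


def fit_line(features: List[float], labels: List[float]) -> Tuple[float, float]:
    """Fit label = slope * feature + intercept by ordinary least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    # Spread of the feature, and how feature and label vary together.
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: pages viewed (feature) vs. minutes on site (label).
pages_viewed = [1.0, 2.0, 3.0, 4.0]
minutes_on_site = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(pages_viewed, minutes_on_site)
predicted = slope * 5.0 + intercept  # prediction for a visitor who viewed 5 pages
```

A real model would use many features and far more data, but the idea is the same: training finds the parameters that best connect features to labels, and those parameters are then used to predict labels for new visitors.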

The Sitecore Cortex Processing Engine 

The Sitecore Cortex processing engine drives the process. Essentially, it is made up of a Task Agent and storage. The Task Agent is composed of:
  • The Task Executor 
  • The Worker Process 
  • The Model 
  • The Data source 
The output of a task is sent to storage and can then be used for various purposes. The tasks fed into the Task Agent come in two types:

  • Distributed tasks: composed of a data source (batched to improve performance), a worker that will process the batched data, and the model. 
  • Deferred actions: tasks that are to be cancelled. 
The Task Agent’s processing cycle consists of three steps: scanning for running tasks, scanning for pending tasks and executing them if possible, and checking for deferred actions. For performance, the engine can scale vertically, where one Processing Engine runs multiple Task Agents together, or horizontally, where multiple Processing Engines each run a different set of Task Agents.
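The cycle described above can be sketched roughly as follows. This is a toy model in Python, not Sitecore's API: the class names, the `max_running` limit, and the one-batch-per-cycle behaviour are all my assumptions, and deferred actions are modelled as plain callables because the sessions did not go into their internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DistributedTask:
    name: str
    data: List[int]                      # the data source
    worker: Callable[[List[int]], int]   # processes one batch of data
    batch_size: int = 2
    status: str = "pending"              # pending -> running -> completed
    position: int = 0                    # index of the next unprocessed item
    partials: List[int] = field(default_factory=list)


class TaskAgent:
    def __init__(self, max_running: int = 1) -> None:
        self.max_running = max_running   # hypothetical capacity limit
        self.tasks: List[DistributedTask] = []
        self.deferred: List[Callable[[], None]] = []
        self.storage: Dict[str, int] = {}

    def run_cycle(self) -> None:
        # Step 1: scan running tasks and process one batch for each.
        for task in [t for t in self.tasks if t.status == "running"]:
            batch = task.data[task.position:task.position + task.batch_size]
            task.partials.append(task.worker(batch))
            task.position += task.batch_size
            if task.position >= len(task.data):
                task.status = "completed"
                # Combine the per-batch results and send them to storage.
                self.storage[task.name] = sum(task.partials)

        # Step 2: scan pending tasks and start them if capacity allows.
        running = sum(t.status == "running" for t in self.tasks)
        for task in self.tasks:
            if task.status == "pending" and running < self.max_running:
                task.status = "running"
                running += 1

        # Step 3: check for deferred actions and handle any that are queued.
        while self.deferred:
            self.deferred.pop(0)()


agent = TaskAgent()
agent.tasks.append(DistributedTask("click_total", [1, 2, 3, 4, 5], worker=sum))
for _ in range(5):
    agent.run_cycle()
```

After these cycles the task has been started, processed in three batches, and its combined output written to `agent.storage`. Running several `TaskAgent` instances side by side would correspond to the scaling options mentioned above.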

Sitecore Cortex uses these tasks to automate many of the steps that marketers currently perform by hand. The tasks automate the creation of market segments, personalization rules, and content tags by finding patterns and traits that might not be easy for a human to spot. For example, the Cortex engine will create tasks based on user searches and the content users click on. The tasks will then be executed in real time and their output stored. Over time, using this data, the search engine will learn which results are the most relevant for each search term.
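As a rough illustration of the search example, the sketch below ranks results for each search term by how often users clicked them. Simple click counting is my own stand-in here, not the model Cortex actually uses, and the function name `rank_results` and the sample log are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def rank_results(click_log: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Order the clicked results for each search term by click count."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for term, clicked_page in click_log:
        # Normalise the term so "Pricing" and "pricing" are counted together.
        counts[term.strip().lower()][clicked_page] += 1
    return {
        term: [page for page, _ in page_counts.most_common()]
        for term, page_counts in counts.items()
    }


# Hypothetical log of (search term, page the user clicked).
click_log = [
    ("pricing", "/plans"),
    ("Pricing", "/plans"),
    ("pricing", "/contact"),
    ("support", "/help"),
]

ranking = rank_results(click_log)
```

With this log, `/plans` ranks above `/contact` for "pricing" because it was clicked more often. As more searches and clicks are recorded, the ranking shifts toward the results users actually choose.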

I still have much to learn about Sitecore Cortex, but it is clear to me that this technology will have a large impact on how data is processed and used in Sitecore. I look forward to exploring it in more detail once Sitecore 9.1 is released!
