Search functionality is critical in delivering a seamless user experience on digital platforms. Implementing a robust spell-checking system can significantly reduce user frustration by correcting typos and misspellings in real time. In this blog post, we'll explore how to integrate Solr's DefaultSpellChecker with Sitecore using a dynamic field merging approach. This method not only enhances the spell correction capabilities but also simplifies the management of search schemas.
Setting Up Solr for Spell Checking
Merging Fields for Comprehensive Spell Checking
To enhance the spell-checking functionality, we merge data from three different fields into a single field designated for spell checking. These fields are:
- title_t (single-value text field)
- description_t (single-value text field)
- tags_sm (multi-value field containing keywords)
This merging is achieved using Solr's copyField directive, which consolidates the values into one field (search_suggestion in this post) that the Solr spell checker can use efficiently.
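For orientation, here is roughly what the merged field and its copy rules would look like if they were written by hand in the schema. This is only an illustrative sketch: the search_suggestion name and text_general type are taken from the helper code later in this post, the source fields are the three listed above, and your own schema may need different attributes.

```xml
<!-- Target field that receives the merged values (multiValued because several sources feed it) -->
<field name="search_suggestion" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- One copyField rule per source field -->
<copyField source="title_t" dest="search_suggestion"/>
<copyField source="description_t" dest="search_suggestion"/>
<copyField source="tags_sm" dest="search_suggestion"/>
```

The rest of this post generates equivalent rules from code, so that they do not have to be maintained by hand.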
Automating Schema Modifications with a Sitecore Processor
Modifying the schema.xml file manually is a common practice, but manual changes risk being overwritten during routine updates. To address this, we can write a C# helper class that overrides Sitecore's SchemaPopulateHelper and updates the schema dynamically, without manual intervention. The helper returns all the default fields, appends the new search_suggestion field, and adds three copyField rules.
Implementing the Sitecore C# Helper Class
Override SchemaPopulateHelper as shown below. The helper is applied to both indexes (master and web): it adds the search_suggestion field if it doesn't already exist, updates it if necessary, and adds copyField rules to merge the specified fields.
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using Sitecore.ContentSearch.SolrProvider.Pipelines.PopulateSolrSchema;
using SolrNet.Schema;

// "namespace" is a placeholder; replace it with your project's root namespace.
namespace namespace.Extension.Indexing
{
    public class CustomPopulateHelper : SchemaPopulateHelper
    {
        public CustomPopulateHelper(SolrSchema schema) : base(schema)
        {
        }

        // Default Sitecore fields plus the custom spell-check field and its copy rules.
        public override IEnumerable<XElement> GetAllFields()
        {
            return base.GetAllFields().Union(GetAddCustomFields());
        }

        public override IEnumerable<XElement> GetAllFieldTypes()
        {
            return base.GetAllFieldTypes().Union(GetAddCustomFieldTypes());
        }

        private IEnumerable<XElement> GetAddCustomFields()
        {
            // Target field that receives the merged values.
            yield return CreateField("search_suggestion",
                "text_general",
                isDynamic: false,
                required: false,
                indexed: true,
                stored: true,
                multiValued: true,
                omitNorms: false,
                termOffsets: false,
                termPositions: false,
                termVectors: false);

            // Source fields to merge; substitute the fields you want to spell-check in your solution.
            yield return CreateCopyField("product_name_from_multilist_t", "search_suggestion");
            yield return CreateCopyField("title_t", "search_suggestion");
            yield return CreateCopyField("filename_t", "search_suggestion");
        }

        // No custom field types are needed; search_suggestion reuses the existing text_general type.
        private IEnumerable<XElement> GetAddCustomFieldTypes()
        {
            yield break;
        }
    }
}
using Sitecore.ContentSearch.SolrProvider.Abstractions;
using Sitecore.ContentSearch.SolrProvider.Pipelines.PopulateSolrSchema;
using SolrNet.Schema;

// Placeholder namespace; it must match the type name registered in the patch file.
namespace namespace.Extension.Indexing
{
    // Factory that makes Sitecore use CustomPopulateHelper when populating the schema.
    public class CustomPopulateHelperFactory : IPopulateHelperFactory
    {
        public ISchemaPopulateHelper GetPopulateHelper(SolrSchema solrSchema)
        {
            return new CustomPopulateHelper(solrSchema);
        }
    }
}
Patching the Processor in Sitecore Pipeline
Add a patch file under App_Config/Include to register the custom factory in the contentSearch.PopulateSolrSchema pipeline:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <contentSearch.PopulateSolrSchema>
        <processor type="Sitecore.ContentSearch.SolrProvider.Pipelines.PopulateSolrSchema.PopulateFields, Sitecore.ContentSearch.SolrProvider">
          <param type="namespace.Extension.Indexing.CustomPopulateHelperFactory, namespace" patch:instead="*[@type='Sitecore.ContentSearch.SolrProvider.Factories.DefaultPopulateHelperFactory']"/>
        </processor>
      </contentSearch.PopulateSolrSchema>
    </pipelines>
  </sitecore>
</configuration>
Configuring Solr DefaultSpellChecker
To enable Solr SpellChecker, update solrconfig.xml to include:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">combined_spellchecker</str>
    <str name="field">search_suggestion</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">10</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.1</float>
    <float name="thresholdTokenFrequency">0.0001</float>
  </lst>
</searchComponent>
Configuring the Request Handler
Add the following requestHandler to solrconfig.xml to enable spell checking through a Solr request:
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">search_suggestion</str>
    <!-- Use the 'combined_spellchecker' dictionary defined in the
         spellcheck search component; its suggestions come from the
         merged search_suggestion field -->
    <str name="spellcheck.dictionary">combined_spellchecker</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
This ensures that spell suggestions are generated from the search_suggestion field, which combines values from all three fields.
How to use the above configuration in your code
// Namespaces this method relies on:
using System;
using System.Configuration;
using System.IO;
using System.Linq;
using System.Net;
using System.Web.Script.Serialization;
using Sitecore.Diagnostics;

/// <summary>
/// Gets a spelling suggestion for the search text from the Solr /spell handler.
/// </summary>
/// <param name="searchText">The text entered by the user.</param>
/// <returns>The collated (corrected) query, or an empty string if none is available.</returns>
public string GetSpellSuggestion(string searchText)
{
    string correctedText = string.Empty;
    try
    {
        string solrURL = ConfigurationManager.ConnectionStrings["solr.search"].ConnectionString;
        //string solrURL = Sitecore.Configuration.Settings.GetSetting("ContentSearch.Solr.ServiceBaseAddress");
        string indexName = Helper.GetSiteIndex(); // project-specific helper that returns the index (core) name
        solrURL = solrURL.EndsWith("/") ? solrURL + indexName : solrURL + "/" + indexName;

        // Escape the user input so spaces and special characters cannot break the query string.
        // wt=json requests JSON explicitly, because older Solr versions default to XML.
        string spellURL = string.Format(
            "{0}/spell?spellcheck.q={1}&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true&wt=json",
            solrURL, Uri.EscapeDataString(searchText ?? string.Empty));
        Log.Debug("Search SpellChecker Url: " + spellURL, this);

        string spellcheckJsonResult;
        WebRequest request = WebRequest.Create(spellURL);
        using (WebResponse response = request.GetResponse())
        using (var sr = new StreamReader(response.GetResponseStream()))
        {
            spellcheckJsonResult = sr.ReadToEnd();
        }
        Log.Debug("Get Response", this);

        JavaScriptSerializer serializer = new JavaScriptSerializer();
        SolrSpellModel jsonObject = serializer.Deserialize<SolrSpellModel>(spellcheckJsonResult);
        Log.Debug("Deserialize", this);

        // collations is a flat array: ["collation", "<corrected query>"], so index 1 holds the correction.
        if (jsonObject != null && jsonObject.spellcheck != null && jsonObject.spellcheck.collations != null && jsonObject.spellcheck.collations.Count() > 1)
        {
            correctedText = jsonObject.spellcheck.collations[1].ToString().Trim('"');
        }
    }
    catch (Exception ex)
    {
        Log.Error("SolrEngine:Error in getting spell suggestion:" + ex.Message, ex, this);
    }
    return correctedText.Trim();
}
Return the corrected text from your controller and display it in your .cshtml view.
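The GetSpellSuggestion method deserializes the response into a SolrSpellModel class that is not shown in this post. The sketch below is one possible shape for it, not necessarily the one used in the original project: the SolrSpellcheck class name is made up for illustration, and the property names are assumed to mirror the keys of Solr's JSON response (see the sample response in the Testing and Debugging section).

```csharp
using System.Collections.Generic;

// Hypothetical model; property names mirror the keys of Solr's JSON response.
public class SolrSpellModel
{
    public SolrSpellcheck spellcheck { get; set; }
}

// Hypothetical class name for the "spellcheck" section of the response.
public class SolrSpellcheck
{
    // Mixed array: each misspelled term is followed by an object with its suggestions.
    public List<object> suggestions { get; set; }

    public bool correctlySpelled { get; set; }

    // Flat array such as ["collation", "motion mobile"]; index 1 holds the rewritten query.
    public List<object> collations { get; set; }
}
```

The collections are typed as List<object> so that the model tolerates the mixed suggestions array and matches the ToString() call in GetSpellSuggestion. Response keys that have no matching property, such as responseHeader and response, are simply skipped during deserialization.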
Testing and Debugging
To verify that spell correction works, use the following Solr query:
https://localhost:8988/solr/core_name/spell?q=Moton%20moble&spellcheck=true&spellcheck.collate=true
If configured correctly, this should return a corrected suggestion:
{
  "responseHeader": {
    "status": 0,
    "QTime": 77
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "numFoundExact": true,
    "docs": []
  },
  "spellcheck": {
    "suggestions": [
      "moton",
      {
        "numFound": 2,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 0,
        "suggestion": [
          { "word": "motion", "freq": 193 },
          { "word": "motor", "freq": 28 }
        ]
      },
      "moble",
      {
        "numFound": 7,
        "startOffset": 6,
        "endOffset": 11,
        "origFreq": 0,
        "suggestion": [
          { "word": "mobile", "freq": 813 },
          { "word": "module", "freq": 48 },
          { "word": "model", "freq": 30 },
          { "word": "mobiles", "freq": 5 },
          { "word": "more", "freq": 34 },
          { "word": "mode", "freq": 10 },
          { "word": "mile", "freq": 6 }
        ]
      }
    ],
    "correctlySpelled": false,
    "collations": [
      "collation",
      "motion mobile"
    ]
  }
}
Debugging Tips
- Ensure the search_suggestion field is populated in the Solr index.
- Rebuild the spell checker dictionary if needed: https://localhost:8988/solr/sitecore_master_index/spell?spellcheck.build=true (DirectSolrSpellChecker reads terms directly from the main index, so it usually does not need a separate build.)
- Check the Sitecore logs for processor execution errors.
- Check the Solr logs for any schema-related errors.
Conclusion
By adding copyField rules dynamically through a Sitecore helper class, we merged multiple fields into one spell-checking field in Solr. The helper keeps the schema updated without manual edits, and the Solr spell checker (DirectSolrSpellChecker in the configuration above) provides spelling suggestions for user queries. This setup improves search accuracy and enhances the user experience.