Krotos Modules 3
krotos::NER Class Reference

NER is a Named Entity Recognition (NER) class designed to identify and extract named entities from unstructured text data.

#include <NER.h>

Classes

struct  Entity
 

Public Member Functions

std::unordered_map< String, StringArray > findEntity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold=0.9f)
 Search text for named entities held in dictionary.
 

Private Member Functions

StringArray ngrams (const StringArray &tokens, int n=1)
 Compute ngrams for the given StringArray.
 
int levenshteinDistance (const String &str1, const String &str2)
 Compute the Levenshtein distance between strings.
 
float stringSimilarity (const String &str1, const String &str2)
 Compute the string similarity.
 
std::tuple< float, String, String > getFuzzySimilarity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold)
 Search for matching named entities using fuzzy string matching.
 
std::vector< Entity > removeOverlapping (std::vector< Entity > entities)
 Remove overlapping entities, keeping the longest.
 

Detailed Description

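NER is a Named Entity Recognition class that identifies and extracts named entities from unstructured text data. It searches the text for entries held in a caller-supplied dictionary of named entities and keywords, using fuzzy string matching against a similarity threshold.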

Member Function Documentation

◆ findEntity()

std::unordered_map< String, StringArray > krotos::NER::findEntity ( String text,
const std::unordered_map< String, StringArray > & dictionary,
float similarityThreshold = 0.9f )

Search text for named entities held in dictionary.

Parameters
text                 The text to search for named entities
dictionary           The dictionary of named entities and keywords
similarityThreshold  The threshold for fuzzy string matching
Returns
A map containing detected named entities and keywords

◆ getFuzzySimilarity()

std::tuple< float, String, String > krotos::NER::getFuzzySimilarity ( String text,
const std::unordered_map< String, StringArray > & dictionary,
float similarityThreshold )
private

Search for matching named entities using fuzzy string matching.

Parameters
text                 The text to search for matching named entities
dictionary           The dictionary of named entities and keywords
similarityThreshold  The threshold for fuzzy string matching
Returns
A tuple containing the similarity score (float) together with the matched named entity and its category (both String)
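The search described above can be sketched as a best-match scan over every keyword in the dictionary. This is a minimal illustration, not the library's implementation: it uses std::string in place of String, the names bestFuzzyMatchSketch, DictionarySketch and SimilarityFn are hypothetical, the similarity measure is passed in by the caller, and both the order of the two strings in the tuple and the "no match" result (a score of 0 with empty strings) are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical aliases; the real class uses String and StringArray.
using DictionarySketch = std::unordered_map<std::string, std::vector<std::string>>;
using SimilarityFn = std::function<float(const std::string&, const std::string&)>;

// Hypothetical stand-in for NER::getFuzzySimilarity.
// Returns (best score, matched keyword, dictionary key). The order of the
// two strings is an assumption, as is the (0, "", "") result for no match.
std::tuple<float, std::string, std::string> bestFuzzyMatchSketch(
    const std::string& text,
    const DictionarySketch& dictionary,
    float similarityThreshold,
    const SimilarityFn& similarity)
{
    auto best = std::make_tuple(0.0f, std::string(), std::string());

    for (const auto& entry : dictionary)
    {
        for (const auto& keyword : entry.second)
        {
            const float score = similarity(text, keyword);

            // Accept only scores at or above the threshold, and keep the highest.
            if (score >= similarityThreshold && score > std::get<0>(best))
                best = std::make_tuple(score, keyword, entry.first);
        }
    }
    return best;
}
```

Passing the similarity function as a parameter keeps the sketch independent of how stringSimilarity() is actually computed.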

◆ levenshteinDistance()

int krotos::NER::levenshteinDistance ( const String & str1,
const String & str2 )
private

Compute the Levenshtein distance between strings.

Returns
The Levenshtein (edit) distance between str1 and str2
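For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The following is a minimal sketch of the standard two-row dynamic-programming formulation, written against std::string rather than String; the name levenshteinSketch is hypothetical and the library's own implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NER::levenshteinDistance, using std::string.
int levenshteinSketch(const std::string& a, const std::string& b)
{
    // While row i is being filled, previous[j] holds the distance between
    // the first i-1 characters of a and the first j characters of b.
    std::vector<int> previous(b.size() + 1);
    std::vector<int> current(b.size() + 1);

    for (std::size_t j = 0; j <= b.size(); ++j)
        previous[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        current[0] = static_cast<int>(i);

        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const int deletion     = previous[j] + 1;
            const int insertion    = current[j - 1] + 1;
            current[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(previous, current);
    }
    return previous[b.size()];
}
```

For example, "kitten" and "sitting" are three edits apart: two substitutions and one insertion.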

◆ ngrams()

StringArray krotos::NER::ngrams ( const StringArray & tokens,
int n = 1 )
private

Compute ngrams for the given StringArray.

Parameters
tokens  The StringArray to organise into ngrams
n       The ngram order
Returns
The resulting ngrams
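As an illustration of what an ngram of order n looks like, the sketch below slides a window of n tokens across the input and joins each window into one string. It uses std::vector< std::string > in place of StringArray; the name ngramsSketch and the single-space separator are assumptions, not confirmed behaviour of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for NER::ngrams. Each run of n consecutive tokens
// is joined with a single space (the separator is an assumption).
std::vector<std::string> ngramsSketch(const std::vector<std::string>& tokens, int n = 1)
{
    std::vector<std::string> result;
    if (n < 1)
        return result; // no valid order, so no ngrams

    const std::size_t order = static_cast<std::size_t>(n);

    // The loop body never runs when there are fewer than n tokens.
    for (std::size_t start = 0; start + order <= tokens.size(); ++start)
    {
        std::string gram = tokens[start];
        for (std::size_t k = 1; k < order; ++k)
            gram += " " + tokens[start + k];
        result.push_back(gram);
    }
    return result;
}
```

With the tokens "new", "york", "city" and n = 2, this sketch yields "new york" and "york city"; with the default n = 1 it returns the tokens unchanged.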

◆ removeOverlapping()

std::vector< NER::Entity > krotos::NER::removeOverlapping ( std::vector< Entity > entities)
private

Remove overlapping entities, keeping the longest.

Parameters
entities  The vector of entities to filter
Returns
The filtered entities
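One way to implement a "keep longest" filter is a greedy pass: consider the longest spans first and drop any candidate that overlaps a span already kept. The sketch below illustrates that idea only. The members of NER::Entity are not shown in this reference, so EntitySketch and its text, start and end fields are hypothetical, as is the choice to return the survivors in text order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical layout; the real NER::Entity members are not documented here.
struct EntitySketch
{
    std::string text;
    int start; // index of the first character (inclusive)
    int end;   // index one past the last character (exclusive)
};

// Hypothetical stand-in for NER::removeOverlapping.
std::vector<EntitySketch> removeOverlappingSketch(std::vector<EntitySketch> entities)
{
    // Longest spans first, so they win any conflict.
    std::stable_sort(entities.begin(), entities.end(),
                     [](const EntitySketch& a, const EntitySketch& b)
                     { return (a.end - a.start) > (b.end - b.start); });

    std::vector<EntitySketch> kept;
    for (const auto& candidate : entities)
    {
        // Two half-open ranges overlap when each starts before the other ends.
        const bool overlaps = std::any_of(kept.begin(), kept.end(),
                                          [&](const EntitySketch& k)
                                          { return candidate.start < k.end && k.start < candidate.end; });
        if (!overlaps)
            kept.push_back(candidate);
    }

    // Return the survivors in the order they appear in the text.
    std::sort(kept.begin(), kept.end(),
              [](const EntitySketch& a, const EntitySketch& b) { return a.start < b.start; });
    return kept;
}
```

Taking the vector by value, as the documented signature does, lets the function sort its own copy without modifying the caller's data.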

◆ stringSimilarity()

float krotos::NER::stringSimilarity ( const String & str1,
const String & str2 )
private

Compute the string similarity.

Returns
The similarity score, in the range [0.0, 1.0]
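This reference does not state how the score is derived. A common choice, shown here purely as an assumption, is to normalise an edit distance by the length of the longer string so that identical strings score 1.0 and completely different strings score 0.0. The name similaritySketch is hypothetical, and the sketch takes a precomputed distance so that it stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical normalisation: 1 - distance / max(lengthA, lengthB).
// The formula used by NER::stringSimilarity is not documented here.
float similaritySketch(int distance, std::size_t lengthA, std::size_t lengthB)
{
    const std::size_t longest = std::max(lengthA, lengthB);
    if (longest == 0)
        return 1.0f; // two empty strings are treated as identical

    return 1.0f - static_cast<float>(distance) / static_cast<float>(longest);
}
```

Under this assumption, "kitten" and "sitting" (distance 3, longer length 7) score roughly 0.57, well below the default similarityThreshold of 0.9f used by findEntity().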

The documentation for this class was generated from the following files: