NER is a class for Named Entity Recognition (NER): it identifies and extracts named entities from unstructured text.
#include <NER.h>
| std::unordered_map< String, StringArray > | findEntity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold=0.9f) |
| | Search text for named entities held in dictionary. |
| StringArray | ngrams (const StringArray &tokens, int n=1) |
| | Compute ngrams for the given StringArray. |
| int | levenshteinDistance (const String &str1, const String &str2) |
| | Compute the Levenshtein distance between two strings. |
| float | stringSimilarity (const String &str1, const String &str2) |
| | Compute the similarity between two strings. |
| std::tuple< float, String, String > | getFuzzySimilarity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold) |
| | Search for matching named entities using fuzzy string matching. |
| std::vector< Entity > | removeOverlapping (std::vector< Entity > entities) |
| | Remove overlapping entities, keeping the longest. |
◆ findEntity()
std::unordered_map< String, StringArray > krotos::NER::findEntity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold = 0.9f)
Search text for named entities held in dictionary.
- Parameters
| text | The text to search for named entities |
| dictionary | The dictionary of named entities and keywords |
| similarityThreshold | The threshold for fuzzy string matching |
- Returns
- A map containing detected named entities and keywords
◆ getFuzzySimilarity()
std::tuple< float, String, String > krotos::NER::getFuzzySimilarity (String text, const std::unordered_map< String, StringArray > &dictionary, float similarityThreshold)
private
Search for matching named entities using fuzzy string matching.
- Parameters
| text | The text to search for matching named entities |
| dictionary | The dictionary of named entities and keywords |
| similarityThreshold | The threshold for fuzzy string matching |
- Returns
- A tuple containing the similarity score (the float element) and two strings: the named entity and its category
◆ levenshteinDistance()
int krotos::NER::levenshteinDistance (const String &str1, const String &str2)
private
Compute the Levenshtein distance between two strings.
- Returns
- The distance
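For orientation, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The following is a minimal sketch of the standard dynamic-programming formulation, not the library's own code: it uses `std::string` in place of `String`, compares bytes rather than Unicode code points, and the name `levenshteinSketch` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NER::levenshteinDistance (std::string, byte-wise).
int levenshteinSketch(const std::string& a, const std::string& b)
{
    // prev[j] is the distance between the first i-1 characters of a
    // and the first j characters of b; curr is the row being filled.
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j); // empty prefix of a: j insertions

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        curr[0] = static_cast<int>(i); // empty prefix of b: i deletions
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const int deletion     = prev[j] + 1;
            const int insertion    = curr[j - 1] + 1;
            curr[j] = std::min({ substitution, deletion, insertion });
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Keeping only two rows bounds memory at O(length of the second string) while the running time stays O(n·m).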
◆ ngrams()
StringArray krotos::NER::ngrams (const StringArray &tokens, int n = 1)
private
Compute ngrams for the given StringArray.
- Parameters
| tokens | The StringArray to organise into ngrams |
| n | The ngram order |
- Returns
- The resulting ngrams
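As an illustration of what "organising tokens into ngrams" usually means, the sketch below slides a window of `n` consecutive tokens over the input. It is written under assumptions, not taken from the library: `std::vector<std::string>` stands in for `StringArray`, tokens inside an ngram are joined with a single space, and the name `ngramsSketch` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for NER::ngrams using standard containers.
std::vector<std::string> ngramsSketch(const std::vector<std::string>& tokens, int n = 1)
{
    std::vector<std::string> result;
    // No ngram of order n exists if n is not positive or there are too few tokens.
    if (n < 1 || tokens.size() < static_cast<std::size_t>(n))
        return result;

    const std::size_t order = static_cast<std::size_t>(n);
    for (std::size_t start = 0; start + order <= tokens.size(); ++start)
    {
        // Join tokens[start .. start + order - 1] with single spaces (assumed separator).
        std::string gram = tokens[start];
        for (std::size_t k = 1; k < order; ++k)
            gram += " " + tokens[start + k];
        result.push_back(gram);
    }
    return result;
}
```

With the default `n = 1` the sketch returns the tokens unchanged; for `n = 2` and the tokens `the`, `quick`, `brown` it yields `the quick` and `quick brown`.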
◆ removeOverlapping()
std::vector< NER::Entity > krotos::NER::removeOverlapping (std::vector< Entity > entities)
private
Remove overlapping entities, keeping the longest.
- Parameters
| entities | The vector of entities to filter |
- Returns
- The filtered entities
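The layout of `NER::Entity` is not shown on this page, so the following is only a sketch of one common way to implement a "keep longest" overlap filter. It assumes a hypothetical `EntitySketch` struct holding the matched text and a half-open character span `[start, end)`; the real `Entity` type may carry different fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entity record: matched text plus its half-open span [start, end).
struct EntitySketch
{
    std::string text;
    std::size_t start;
    std::size_t end;
};

// Greedy filter: visit entities from longest to shortest and keep each one
// that does not overlap anything already kept.
std::vector<EntitySketch> removeOverlappingSketch(std::vector<EntitySketch> entities)
{
    std::stable_sort(entities.begin(), entities.end(),
                     [](const EntitySketch& a, const EntitySketch& b)
                     { return (a.end - a.start) > (b.end - b.start); });

    std::vector<EntitySketch> kept;
    for (const auto& candidate : entities)
    {
        const bool overlaps = std::any_of(
            kept.begin(), kept.end(),
            [&](const EntitySketch& k)
            { return candidate.start < k.end && k.start < candidate.end; });
        if (!overlaps)
            kept.push_back(candidate);
    }

    // Restore reading order for the caller.
    std::sort(kept.begin(), kept.end(),
              [](const EntitySketch& a, const EntitySketch& b)
              { return a.start < b.start; });
    return kept;
}
```

Given spans for `New York` (0 to 8), `York` (4 to 8) and `rain` (20 to 24), this sketch keeps `New York` and `rain` and drops the shorter, overlapping `York`.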
◆ stringSimilarity()
float krotos::NER::stringSimilarity (const String &str1, const String &str2)
private
Compute the similarity between two strings.
- Returns
- The similarity score [0.0, 1.0]