
Text Classification / Categorizer

Automatically categorize documents according to knowledge classifiers.

Pangea automatic text classification and categorization consists of a collection of modules that implement common classification and categorization tasks. The modules can be used together as part of Text Classification or operate separately, and at a high level the framework also defines a set of relationships among them.

The various details are flexible – for example, you can choose what categorization algorithm to use, what features (words or otherwise) of the documents should be used (or how to automatically choose these features), what format the documents are in, and so on.


Customizing this module typically involves obtaining a collection of pre-categorized documents from the organization. Pangea trains its deep neural networks to recognize the features of each document and how it differs from other documents. This creates a “knowledge graph” representation, training a categorizer to recognize a particular knowledge set. The trained set is saved, and queries can then be run against it.
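The train, save and query cycle described above can be illustrated with a minimal sketch. It does not reproduce Pangea's deep neural networks or its API: plain word counts stand in for the learned representation, and all names here (`train`, `save_model`, `load_model`, `query`, `SAMPLE`) are hypothetical.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical pre-categorized documents supplied by an organization.
SAMPLE = [
    ("Legal", "contract clause liability agreement"),
    ("Finance", "invoice payment balance account"),
    ("Legal", "court ruling on contract breach"),
]

def tokens(text):
    """Lowercase alphabetic tokens; a stand-in for real feature extraction."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_docs):
    """Accumulate word counts per category from (category, text) pairs."""
    model = {}
    for category, text in labelled_docs:
        model.setdefault(category, Counter()).update(tokens(text))
    return model

def save_model(model, path):
    """Persist the trained set as JSON so it can be queried later."""
    Path(path).write_text(json.dumps(model), encoding="utf-8")

def load_model(path):
    """Reload a saved model, restoring the per-category counters."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {category: Counter(counts) for category, counts in raw.items()}

def query(model, text):
    """Return the category whose stored counts overlap most with the text."""
    words = tokens(text)
    scores = {c: sum(counts[w] for w in words) for c, counts in model.items()}
    return max(scores, key=scores.get)

# Train once, save, then query the reloaded model.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "categorizer.json"
    save_model(train(SAMPLE), path)
    restored = load_model(path)

result = query(restored, "unpaid invoice and payment schedule")
print(result)
```

Separating training from querying in this way means the saved model can be reused for many queries without retraining.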

There are several ways to carry out the queries. The top-level Text Classification and Categorizer module provides an umbrella class for top-level category classifier operations, but you may also use the interfaces of the individual classes directly.

Our semantic tool automatically classifies documents by content and organizes them within general categories such as Eurovoc, or it can be customized to your organization’s structure, terminology and processes. The categories can be Legal, Compliance, Human Resources, Research and Development, Accounts and Finance, Reports (Sales, Management, etc.), Customer Feedback, Newsletters, and many more. The definition of categories is entirely up to the user and is not restricted by the categorization algorithms.



Text Classification / Categorizer accuracy

Text classification and categorization of documents is often a difficult task even for humans well trained in the particular domain of knowledge, and there are many things a human would consider that none of these algorithms consider. One document, for example, may belong to more than one category. Our use cases include previous applications in fintech that achieved over 90% accuracy in defined domains. Some human supervision may still be needed for unexpected or new types of documents.
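As an illustration of the two points above (a document may belong to several categories, and unfamiliar documents may need human supervision), here is a minimal sketch of threshold-based assignment. The function name `assign_categories` and the threshold values are assumptions chosen for the example, not Pangea's actual settings.

```python
def assign_categories(scores, accept=0.5, review=0.2):
    """Pick every category scoring at or above `accept` (multi-label).

    `scores` maps category name -> relevance in [0, 1].
    The document is flagged for human review when no category is accepted,
    or when some score falls in the uncertain band [review, accept).
    Both thresholds are illustrative assumptions.
    """
    accepted = sorted(c for c, s in scores.items() if s >= accept)
    borderline = any(review <= s < accept for s in scores.values())
    needs_review = not accepted or borderline
    return accepted, needs_review

# A document that clearly belongs to two categories.
print(assign_categories({"Legal": 0.8, "Compliance": 0.6, "Human Resources": 0.05}))

# An unfamiliar document: nothing is accepted, so it goes to a human reviewer.
print(assign_categories({"Legal": 0.1, "Human Resources": 0.05}))
```

In practice the thresholds would be tuned on held-out documents from the target domain.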

The Pangea Text Classification / Categorizer is an ideal solution for:

  • Enterprise content / Knowledge management;
  • Financial documentation categorization;
  • Insurance document pre-classification;
  • Evaluation of new trends in business, science and technology;
  • Business information management;
  • Patent prior art search and analysis;
  • Automated helpdesk systems.

Categorization technology

The Pangea Categorizer algorithms are based on deep machine learning techniques. Our approach to document categorization runs in two stages: the training stage and the prediction stage.

At the training stage, the Pangea Categorizer builds a classifier by learning from a set of model documents for each category. Its learning algorithm uses a wide range of semantic features extracted from document texts:

  • Words with part of speech tags;
  • Noun phrases and syntactic dependency between them;
  • Complex semantic relations detected by our Linguistic Processor.

This training process creates models which, at the prediction stage, use the vector space model to categorize documents. Each input text is compared with the semantic features of each category model, and the degree of proximity between them is calculated. The document is assigned to the category with the maximum relevance value.
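The prediction step can be sketched as follows. This is a simplified illustration under stated assumptions, not the Pangea implementation: plain word counts replace the richer semantic features listed above, cosine similarity is assumed as the proximity measure, and the names `features`, `cosine`, `train` and `predict` are hypothetical.

```python
import math
import re
from collections import Counter

def features(text):
    """Bag-of-words term counts; a stand-in for richer semantic features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(labelled_docs):
    """Build one summed term vector per category from (category, text) pairs."""
    model = {}
    for category, text in labelled_docs:
        model.setdefault(category, Counter()).update(features(text))
    return model

def predict(model, text):
    """Score the text against every category and return the best match."""
    vec = features(text)
    scores = {category: cosine(vec, cat_vec) for category, cat_vec in model.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical model documents for two categories.
model = train([
    ("Legal", "contract clause liability agreement"),
    ("Finance", "invoice payment balance account"),
])

best, scores = predict(model, "payment due on invoice")
print(best, scores)
```

Each category is represented by a single vector, so prediction cost grows with the number of categories, not the number of training documents.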


Contact us or give us a call on

Please ask our sales team for any particular configuration
