Products and services


Classora Knowledge Base

Classora Technologies has developed an innovative technology: a knowledge base oriented towards data analysis. Its objective is to offer a different perspective on the information available on the Internet. It integrates data from many open sources, enriching the result. Read more


Classora Media Support

The semantic services of the Classora Media Support suite provide media outlets with an original database to increase their traffic indicators (pages-per-visit ratio and number of visits), encourage content reuse, maximize their presence in search engines and improve the user experience. Read more


Classora Augmented TV

A service aimed at enriching EPGs (Electronic Program Guides) and, in general, any time-shifted program (movies, TV series) using the text of the subtitles. The information can be displayed on the device itself (the TV) or on an auxiliary second screen (tablet, smartphone). Read more

More about Classora's technology...

The current technology used by Classora Technologies comes from its product Classora Knowledge Base, the first Spanish-language knowledge base on the Internet.

In order to provide up-to-date and genuinely useful information, the knowledge base needs to constantly incorporate public data from the available sources. Given the huge amount of data on the Internet, these sources range from fully structured official platforms (such as Eurostat, the National Statistics Institute or FIFA) to unofficial public sources written in plain text or with very little structure (such as blogs, e-commerce stores, or even Wikipedia). To that end, Classora has developed three types of robots for data management, sketched below:

1) ETL robots: responsible for the massive uploading of reports from official public sources. They are used for either full or incremental data loads.

2) Data scanner robots: responsible for locating and updating the data of a given unit of knowledge, using specific sources for this task.

3) Content aggregators: they do not connect to external sources. Instead, they generate new information using Classora's internal database.
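
A minimal sketch of these three roles in Python. The class names and the dictionary-based knowledge base are hypothetical, chosen only to illustrate the division of labour described above; this is not Classora's actual code:

    from abc import ABC, abstractmethod

    class Robot(ABC):
        """Common interface shared by the three robot types."""

        @abstractmethod
        def run(self, knowledge_base: dict) -> None:
            """Update the knowledge base in place."""

    class EtlRobot(Robot):
        """Bulk-loads reports from a structured official source (e.g. Eurostat)."""

        def __init__(self, fetch_report):
            self.fetch_report = fetch_report  # callable returning {key: value}

        def run(self, knowledge_base: dict) -> None:
            knowledge_base.update(self.fetch_report())

    class DataScannerRobot(Robot):
        """Seeks and refreshes the data of a single unit of knowledge."""

        def __init__(self, entity, lookup):
            self.entity = entity
            self.lookup = lookup  # callable that queries a specific source

        def run(self, knowledge_base: dict) -> None:
            knowledge_base[self.entity] = self.lookup(self.entity)

    class ContentAggregator(Robot):
        """Connects to no external source; derives new facts from stored data."""

        def run(self, knowledge_base: dict) -> None:
            # Example derivation: record how many entries are currently stored.
            knowledge_base["entity_count"] = len(knowledge_base)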

Classora's ETL robots perform the following procedures (a toy pipeline is sketched after the list):

  • Extraction: parsing the information in the different data sources.
  • Transformation: filtering, cleaning and structuring the data.
  • Loading and enrichment: new data are linked to old information.
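
The three steps can be pictured as a toy pipeline over one semicolon-delimited record. The input format, field names and metadata keys are our own assumptions for illustration, not Classora's internals:

    import re
    from datetime import date

    def extract(raw):
        """Extraction: parse one line such as 'Spain;population;47400000'."""
        return re.match(r"(?P<entity>[^;]+);(?P<attribute>[^;]+);(?P<value>\d+)", raw)

    def transform(match):
        """Transformation: filter, clean and structure the parsed fields."""
        record = {k: v.strip() for k, v in match.groupdict().items()}
        record["value"] = int(record["value"])  # normalize the numeric type
        return record

    def load(record, knowledge_base):
        """Loading and enrichment: link the new datum to old information."""
        key = (record["entity"], record["attribute"])
        record["loaded_on"] = date.today().isoformat()  # enrichment metadata
        record["previous"] = knowledge_base.get(key)    # link to the old value
        knowledge_base[key] = record

    kb = {}
    match = extract("Spain;population;47400000")
    if match:
        load(transform(match), kb)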

In absolute terms, however, Classora Knowledge Base handles only a small fraction of the information available on the Internet. Moreover, each new source of information adds to the complexity of integrating it with the previously loaded data, because the number of variables grows. Without manual monitoring (increasingly expensive and impractical), data quality is bound to diminish as the data volume increases.

This can be avoided, however, by investing in R&D and innovation; our company is therefore constantly improving its loading robots in order to incorporate new data sources with lower levels of structure, in more languages, and with better integration with the previously loaded data. The main problem we face is one imposed by technological evolution: the transformation of unstructured data into structured information.


ETL: Extraction, Transformation and Load

ETL processes are the most important components, and the ones offering the greatest added value, in a Business Intelligence infrastructure. Although these processes may seem transparent to platform users, they gather data from every necessary source and prepare the information to be presented through the reporting and analysis tools. Thus, the accuracy of any platform that manages data integration depends entirely on its ETL processes. In Classora's case, ETL robots complete and enrich every piece of data with the corresponding metadata (loading date, source, reliability of the data, refresh rate, meaning, connections, etc.) for its subsequent automatic processing.
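
That metadata can be pictured as a small record attached to each datum. A minimal sketch, with hypothetical field names taken from the list above:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class DatumMetadata:
        loading_date: str                  # when the value was loaded
        source: str                        # where the value came from
        reliability: float                 # e.g. 0.0 (untrusted) to 1.0 (official)
        refresh_rate_days: int             # how often the value is re-checked
        meaning: Optional[str] = None      # semantic annotation of the value
        connections: list = field(default_factory=list)  # links to related data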

Implementing effective and reliable ETL processes poses several challenges:

1) Data volume grows exponentially, and ETL processes have to handle huge amounts of information. Some systems update incrementally, while others require a complete reload on each iteration (see the sketch after this list).

2) As information systems become increasingly complex, the disparity of the sources also increases, and with it the difficulty of integrating them. ETL processes therefore need ample connectivity and greater flexibility.

3) The transformations involved in ETL processes can be very complex: data may need to be aggregated, analyzed, computed, statistically processed, and so on. Ad-hoc transformations that are computationally costly are sometimes required.
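
The incremental-versus-full distinction from point 1 can be sketched as follows. The watermark technique shown is a common approach to incremental loading, not necessarily the one Classora uses:

    def full_reload(fetch_all, knowledge_base):
        """Discard everything and re-ingest the whole source."""
        knowledge_base.clear()
        knowledge_base.update(fetch_all())

    def incremental_load(fetch_since, knowledge_base, state):
        """Ingest only the records modified after the last successful run."""
        watermark = state.get("last_run", "1970-01-01")
        for key, (value, modified_on) in fetch_since(watermark).items():
            knowledge_base[key] = value
            if modified_on > state.get("last_run", watermark):
                state["last_run"] = modified_on  # advance the watermark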

Currently, there are commercial tools, and even free software, with great capacity for data extraction. In fact, speed and performance do not pose a big technical problem in extraction and loading. Data transformation is where the bottleneck actually lies: at this point, unstructured information needs to be converted into structured information so that it can be integrated with the data already existing in the target system.
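
As a toy illustration of that bottleneck, consider turning one English sentence into a structured record. The pattern below is handcrafted for this single sentence shape; real sources require far more robust techniques:

    import re

    SENTENCE = "The population of Spain reached 47.4 million in 2022."
    PATTERN = re.compile(
        r"The (?P<attribute>\w+) of (?P<entity>\w+) reached "
        r"(?P<value>[\d.]+) million in (?P<year>\d{4})\."
    )

    match = PATTERN.search(SENTENCE)
    if match:
        record = {
            "entity": match["entity"],                   # 'Spain'
            "attribute": match["attribute"],             # 'population'
            "value": float(match["value"]) * 1_000_000,  # 47400000.0
            "year": int(match["year"]),                  # 2022
        }
        print(record)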


Semantic Processes

NLP (Natural Language Processing) is one of the early cornerstones of Artificial Intelligence (AI). Automated translation, for instance, was born at the end of the 1940s, before the expression "Artificial Intelligence" was even coined. In general terms, NLP deals with the formulation and investigation of computationally effective mechanisms for communication between people and machines through natural language.

At this stage, however, natural language interpretation algorithms are still far from fully developed. The main problem is the ambiguity of human language, which becomes apparent at several levels:

1) At the lexical level, a single word can have several meanings, and the suitable one has to be deduced from the context. Much research in the field of natural language processing has studied methods of resolving lexical ambiguities through dictionaries, grammars, knowledge bases and statistical correlations (a toy example follows this list), but the solutions still need further development.

2) At the referential level, resolving anaphora and cataphora means determining which preceding or following linguistic items they refer to.

3) At the structural level, semantics is required to determine the hierarchy of the prepositional phrases that form the different syntactic trees, e.g. "He is a student of Philosophy of education".

4) At the pragmatic level, a sentence often does not mean exactly what is literally said. Elements such as irony and sarcasm play an important role in the interpretation of the message.
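
A toy example of the dictionary-and-overlap approach mentioned in point 1, in the spirit of the classic Lesk algorithm. The miniature sense inventory is invented for illustration:

    SENSES = {
        "bank": {
            "finance": {"money", "deposit", "loan", "account"},
            "river": {"water", "shore", "fishing", "flow"},
        }
    }

    def disambiguate(word, context):
        """Pick the sense whose signature words overlap most with the context."""
        context_words = set(context.lower().split())
        best_sense, best_overlap = "unknown", -1
        for sense, signature in SENSES.get(word, {}).items():
            overlap = len(signature & context_words)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    print(disambiguate("bank", "she opened a deposit account at the bank"))   # finance
    print(disambiguate("bank", "they went fishing on the bank of the river")) # river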

To resolve these and other types of ambiguity, the central problem in NLP is the translation of natural language input into an unambiguous internal representation, such as parse trees. This is precisely the approach chosen by most public knowledge bases available on the Internet, including Classora's initial approach with CQL (Classora Query Language).
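
The structural ambiguity from point 3 can be made concrete with parse trees. A minimal sketch using the NLTK library with a toy grammar of our own (unrelated to CQL); it produces the two competing trees for the example sentence:

    import nltk

    GRAMMAR = nltk.CFG.fromstring("""
        S  -> NP VP
        VP -> V NP
        NP -> Pron | Det N | N | NP PP
        PP -> P NP
        Pron -> 'He'
        V -> 'is'
        Det -> 'a'
        N -> 'student' | 'Philosophy' | 'education'
        P -> 'of'
    """)

    parser = nltk.ChartParser(GRAMMAR)
    sentence = "He is a student of Philosophy of education".split()

    # Two trees come out: 'a student of (Philosophy of education)' versus
    # '(a student of Philosophy) of education'.
    for tree in parser.parse(sentence):
        tree.pretty_print()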