Classora Knowledge Base

Classora's objective is to integrate information from public sources (World Bank, International Monetary Fund, CIA, etc.), private sources and Internet users, and to enrich the result with added value. Most notable are Classora's capacity to convert unstructured data into fully structured information and its tools for presenting the gathered information in different formats (rankings, tables, graphs, maps, etc.). As a result, Classora is a globally pioneering platform.

Internally, Classora is organized into «knowledge units» and «reports». A knowledge unit is an element of the world about which information can be stored and presented in the form of a data sheet (a person, a company, a country, etc.). A report is a set of knowledge units: a ranking of companies, a sports standings table, a poll, a user's question, etc. For example, «Real Madrid» is represented in Classora as a knowledge unit, and «Spanish Soccer League» (LFP) is one of the many reports in which Real Madrid is listed.
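The relationship between knowledge units and reports can be sketched in a few lines of Python. This is only an illustration of the two concepts described above, not Classora's actual schema: the class names `KnowledgeUnit` and `Report`, their fields, and the `reports_listing` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """An element of the world with a data sheet (hypothetical structure)."""
    name: str
    kind: str                                        # e.g. "person", "company", "country"
    attributes: dict = field(default_factory=dict)   # structured data sheet


@dataclass
class Report:
    """A set of knowledge units: a ranking, a standings table, a poll, etc."""
    title: str
    units: list = field(default_factory=list)        # KnowledgeUnit objects

    def add(self, unit: KnowledgeUnit) -> None:
        self.units.append(unit)


def reports_listing(unit_name, reports):
    """Return the titles of all reports in which the named unit is listed."""
    return [r.title for r in reports if any(u.name == unit_name for u in r.units)]


# The example from the text: a knowledge unit listed in a report.
real_madrid = KnowledgeUnit("Real Madrid", "sports club")
league = Report("Spanish Soccer League (LFP)")
league.add(real_madrid)

print(reports_listing("Real Madrid", [league]))
```

Keeping the report as a container of knowledge units, rather than copying their data, is what lets one unit appear in many reports while keeping a single data sheet.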

Basic definitions of the Classora model

This is our primary model: each knowledge unit is displayed like a Wikipedia article. Instead of plain text, however, Classora's knowledge units hold structured information, enriched with additional data from the reports in which they are listed.

All information in Classora's Knowledge Base comes from either an automated loading process (through ETL robots) or a manual loading process (through user collaboration). Through this infrastructure, Classora aims to provide users with the specific piece of data or the exact data set that they need. Furthermore, Classora's tools enable users to combine data from knowledge units in order to extract new information using data mining and OLAP technology. Classora's aim is to be a Business Intelligence platform applied to all the human knowledge available in public sources on the Internet.

Classora is, however, a project still under construction. At this early stage we have focused on public reports, that is, all the lists, sports standings tables and polls in which an element can be present. In fact, polls, also known as participatory rankings (because they are ordered according to users' votes), have become one of the most prominent parts of this early Classora. There is, however, much more information than can be seen at first glance: cross-checking each knowledge unit against these reports and polls provides a new perspective, and you can easily see the position each knowledge unit holds in a report. For instance, you can see all the information Classora already stores about a country like Spain.
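The cross-checking described above amounts to looking up one knowledge unit in every report and recording its position. A minimal sketch follows, assuming each report is stored as an ordered list of unit names; the function name `positions_of` and all report titles and entries are placeholders invented for the example, not real Classora data.

```python
def positions_of(unit_name, reports):
    """Map each report title to the 1-based position of unit_name.

    Reports that do not list the unit are skipped.
    """
    found = {}
    for title, ranked_names in reports.items():
        if unit_name in ranked_names:
            found[title] = ranked_names.index(unit_name) + 1
    return found


# Placeholder reports: each value is the ordered list of units in that report.
reports = {
    "Placeholder ranking 1": ["Country A", "Country B", "Country C"],
    "Placeholder poll 2": ["Country C", "Country A"],
    "Placeholder table 3": ["Country B"],
}

print(positions_of("Country A", reports))
# {'Placeholder ranking 1': 1, 'Placeholder poll 2': 2}
```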

Classora allows in-depth analysis of all this information using comparative graphs and several types of customized searches.

In making Classora possible, our team has had to overcome two main technical difficulties:

Extraction, transformation and loading of data (ETL): Classora gathers information from highly heterogeneous sources, whether structured or unstructured, integrating it and adding explanatory metadata. Developing these robots is one of our greatest technological efforts.

Understanding user queries: another big challenge. We are working on the definition of a semiformal query language called CQL (Classora Query Language), which allows sophisticated questions to be put to the data center and which, within a narrow context and with a controlled grammar, serves as a basis for tackling the problem of natural language comprehension. For the time being, however, using our report creation wizard is more intuitive.
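To make the first difficulty more concrete, the sketch below shows the general shape of an ETL step that takes one structured source and one unstructured source, normalizes both to a common record layout, and tags each record with its origin. It is a toy illustration under stated assumptions, not Classora's robots: the function names, the record fields (`name`, `value`, `source`) and the sample inputs are all hypothetical.

```python
import csv
import io
import re


def extract_csv(text):
    """Structured source: rows already come with named columns."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_free_text(text):
    """Unstructured source: keep only lines shaped like 'name: number'."""
    pattern = re.compile(r"^\s*(?P<name>[^:]+):\s*(?P<value>\d+(?:\.\d+)?)\s*$")
    rows = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def transform(rows, source):
    """Normalize to one schema and attach the source as explanatory metadata."""
    return [
        {"name": row["name"].strip(), "value": float(row["value"]), "source": source}
        for row in rows
    ]


def load(store, records):
    """Append records to an in-memory store keyed by unit name."""
    for record in records:
        store.setdefault(record["name"], []).append(record)


# Placeholder inputs standing in for two heterogeneous sources.
csv_text = "name,value\nUnit A,10\nUnit B,20\n"
free_text = "Unit A: 12.5\nthis line has no data\nUnit C: 7\n"

store = {}
load(store, transform(extract_csv(csv_text), source="placeholder_csv"))
load(store, transform(extract_free_text(free_text), source="placeholder_text"))

print(sorted(store))         # ['Unit A', 'Unit B', 'Unit C']
print(len(store["Unit A"]))  # 2
```

Recording the source on every record is what allows values for the same unit from different origins to sit side by side without losing track of where each one came from.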

In short, Classora aims to organize information on the Internet in a new way. Based on Business Intelligence techniques and on the concept of the Semantic Web, Classora serves to create, share, and analyze all kinds of reports and lists, but also to display the data sheets of each person and element present in them.