The architecture of KYOTO is shown in the Flash animation below. In the schema, the modules of the Kyoto system are drawn as rectangular boxes and the data structures as blue repositories. The animation shows two processing cycles in Kyoto, which are explained in more detail below.
The cycles start with documents and websites provided by the users in the project, ECNC and WWF. We have collected a first set of documents and websites in four languages that focus on a series of environmental themes. If you click on the users' logos, you will get a baseline retrieval system for these documents.
The Kyoto Modules
The Kyoto system has the following modules:
- Syntactic processors: they produce a syntactic and morphological analysis of the text
- Semantic processors: they determine the meaning of the words in the text
- Tybots: they learn the terms that are used in the documents and organize these as a hierarchy. If you click on the term extractor module, you can access a demo that gives access to term databases that have been extracted from the environment documents.
- Term editor (part of the Wikyoto platform): users can edit the terms, give definitions and agree on what they mean. These users are called concept users because they are domain experts who maintain the terminology. You can click on the module to go to a demo of the term editor and try it out yourself.
- Kybots: small programs that use the knowledge built up about the terms to extract facts from any set of documents. If you click on this module, you can access a demo where you can design or submit a Kybot to extract facts from a sample database.
- NL Query: search module with which any end user (people from the domain, government, companies, students, children, etc.) can access the database of facts that is produced. If you click on this module, you can access a demo on semantic search on the English data.
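To make the idea of a Kybot more concrete, the sketch below shows a rule that works on text the semantic processors have already annotated with ontology types and turns matching patterns into facts. It is a minimal, hypothetical illustration: `Token`, `Fact`, `threat_kybot` and the type labels `Threat` and `Species` are invented for this example, the input is a plain Python list rather than the Kyoto Annotation Format, and it does not reproduce the actual Kybot rule format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    # Hypothetical, simplified token; NOT the real KAF schema.
    word: str
    lemma: str
    onto_type: Optional[str] = None  # ontology type from the semantic processors, if any


@dataclass
class Fact:
    event: str
    patient: str


def threat_kybot(sentence: List[Token], event_type: str = "Threat",
                 patient_type: str = "Species") -> List[Fact]:
    """Hypothetical Kybot: pair each token typed as `event_type` with the
    first later token in the same sentence typed as `patient_type`."""
    facts = []
    for i, tok in enumerate(sentence):
        if tok.onto_type != event_type:
            continue
        for later in sentence[i + 1:]:
            if later.onto_type == patient_type:
                facts.append(Fact(event=tok.lemma, patient=later.lemma))
                break
    return facts


sentence = [
    Token("Pollution", "pollution", "Threat"),
    Token("endangers", "endanger"),
    Token("the", "the"),
    Token("otters", "otter", "Species"),
]
print(threat_kybot(sentence))  # [Fact(event='pollution', patient='otter')]
```

The rule matches on ontology types rather than on surface words, which is the point of building up term knowledge first: any word the semantic layer maps to `Species` would trigger the same rule.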
The Kyoto Repositories
The architecture also contains four databases:
- Document base: a database that holds all the documents after they have been processed by the syntactic and semantic processors. The text is represented in a special XML format called the Kyoto Annotation Format (KAF). If you click on the database, you can see examples of this format in different languages and get access to the DTD.
- Term database: this database holds the output of the term extraction. The terms can be exported into XML in a special format that is called Kyoto-TMF. If you click on the database, you get more details on the term structure in TMF.
- Multilingual Knowledge Base: this database holds the wordnets for all the languages and the ontologies that are already available. It also holds any domain wordnet and ontology built by editing the term database. If you click on the database, you can view its contents, which are represented in special XML formats: Wordnet-LMF for wordnets and OWL for ontologies.
- Fact database: this is the database in which all the extracted facts are stored. Its design is still to be worked out in the project; further details will follow. Note that the database can also hold changing realities: it can hold a fact extracted at one point in time and another fact about the same things and the same place extracted at another point in time.
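Since the fact database has not been designed yet, the following is only a toy illustration of how "changing realities" could be kept: every fact carries a place and a date, facts are never overwritten, so two facts about the same thing at the same place coexist and can be read back as a timeline or grouped per region. The names `Fact`, `FactStore`, `observed`, `timeline` and `by_region`, and the sample facts, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Fact:
    subject: str    # the thing the fact is about
    relation: str   # hypothetical relation label, e.g. "status"
    value: str
    place: str
    observed: date  # when the source document reported it


class FactStore:
    """Toy in-memory fact store; facts are only appended, so older and
    newer statements about the same subject and place coexist."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self._facts.append(fact)

    def timeline(self, subject: str, place: str) -> List[Fact]:
        """Facts about `subject` at `place`, oldest first."""
        hits = [f for f in self._facts if f.subject == subject and f.place == place]
        return sorted(hits, key=lambda f: f.observed)

    def by_region(self) -> Dict[str, List[Fact]]:
        """Group all facts per place."""
        groups: Dict[str, List[Fact]] = defaultdict(list)
        for f in self._facts:
            groups[f.place].append(f)
        return dict(groups)


# Made-up sample facts, added out of chronological order on purpose.
store = FactStore()
store.add(Fact("otter", "status", "declining", "Region A", date(2009, 6, 1)))
store.add(Fact("otter", "status", "stable", "Region A", date(2005, 3, 1)))
store.add(Fact("wetland", "area", "shrinking", "Region B", date(2008, 1, 1)))
print([f.value for f in store.timeline("otter", "Region A")])  # ['stable', 'declining']
print(sorted(store.by_region()))  # ['Region A', 'Region B']
```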
The Kyoto Cycles
Documents and websites are then processed in two cycles, as shown in the animation:
- First cycle, in which concept users upload specialized documents and sources to acquire a good term database that enables them to build a good domain wordnet and ontology:
  - sources are processed syntactically and semantically and the output is stored in the document base as KAF-XML
  - the Tybots extract the terms and put them in the term database
  - the domain specialists review and modify the terms, define their meanings and, through the ontology, agree on how they relate to the meanings of terms in other languages
- Second cycle, in which the same documents, or any other set of documents, are sent to Kyoto to extract facts:
  - sources are processed syntactically and semantically and the output is stored in the document base as KAF-XML (as in the first cycle)
  - the Kybots extract the facts that the end users are interested in and store them in the fact database
  - end users get alerts on new facts or can search the database to get comprehensive and precise information that can be organized in many different ways, e.g. per region or along timelines, to reveal trends and changes.
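The two cycles can be sketched end to end with deliberately naive stand-ins for each module, just to show how the repositories connect them. Everything here is hypothetical: `process` only tokenizes instead of producing KAF-XML, `tybot` uses a crude heuristic (frequent two-word candidates grouped under their last word) in place of real term extraction, `review` stands in for the concept users working in the term editor, and `kybot` merely records which sentences mention an approved term.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}  # tiny illustrative list
Doc = List[List[str]]  # a processed document: sentences of lowercase tokens


def process(text: str) -> Doc:
    """Stand-in for the syntactic and semantic processors (the real ones
    write KAF-XML to the document base): naive sentence split + tokenizing."""
    return [s.lower().split() for s in text.split(".") if s.strip()]


def tybot(document_base: List[Doc], min_count: int = 2) -> Dict[str, Set[str]]:
    """Stand-in Tybot: frequent two-word candidates without stopwords,
    grouped under their last word (the head) as a one-level hierarchy."""
    counts: Counter = Counter()
    for doc in document_base:
        for sent in doc:
            counts.update((a, b) for a, b in zip(sent, sent[1:])
                          if a not in STOPWORDS and b not in STOPWORDS)
    hierarchy: Dict[str, Set[str]] = {}
    for (modifier, head), n in counts.items():
        if n >= min_count:
            hierarchy.setdefault(head, set()).add(f"{modifier} {head}")
    return hierarchy


def review(hierarchy: Dict[str, Set[str]], approved: Set[str]) -> Set[str]:
    """Stand-in for the concept users: keep only the terms they approve."""
    return {t for terms in hierarchy.values() for t in terms if t in approved}


def kybot(document_base: List[Doc], terms: Set[str]) -> List[Tuple[str, int, int]]:
    """Stand-in Kybot: a 'fact' here is just (term, document, sentence)."""
    facts = []
    for d, doc in enumerate(document_base):
        for s, sent in enumerate(doc):
            bigrams = {" ".join(p) for p in zip(sent, sent[1:])}
            facts.extend((term, d, s) for term in sorted(terms & bigrams))
    return facts


# Cycle 1: specialized sources -> term database -> reviewed terms
cycle1_base = [process("Water pollution harms fish. Water pollution is rising. "
                       "Air pollution spreads. Air pollution kills.")]
term_db = tybot(cycle1_base)
approved_terms = review(term_db, approved={"water pollution"})

# Cycle 2: any documents -> document base -> fact database
cycle2_base = [process("Water pollution reached the river. Fish died.")]
fact_db = kybot(cycle2_base, approved_terms)
print(sorted(term_db["pollution"]))  # ['air pollution', 'water pollution']
print(fact_db)                       # [('water pollution', 0, 0)]
```

Note that cycle 2 only finds what cycle 1 approved: "air pollution" was extracted but not approved, so it never reaches the fact database.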