WordNet-LMF overall design
WordNet-LMF is a lexical data format defined in the KYOTO project. Its aim is to give the WordNets, the lexical resources of the project, a common representation format that allows easier access, integration and interoperability among them.
The WordNet-LMF format builds on the representational devices made available by the ISO Lexical Mark-up Framework (LMF) and tailors them to the specific content requirements of the WordNet model of lexical knowledge representation. WordNet-LMF is an ISO-LMF dialect and represents a first real attempt to fully apply LMF to wordnet-like lexicons.
Starting from the core meta-model provided by LMF (rev.16), WordNet-LMF additionally uses the semantics and the multilingual extension packages.
WordNet-LMF fully complies with standard LMF as far as its general framework is concerned.
The LMF library provides the hierarchy of lexical objects and the structural relations among them. The Data Category library provides the elementary descriptors that are combined with the structural elements to represent lexical information.
Expression of WordNet-related types of information (such as synset relations or external sources linked to wordnets) falls into the realm of LMF Data Categories, which are by definition either selected from the pre-defined standard registry or custom-defined. Accordingly, the WordNet-LMF format defines a number of Data Categories needed to fully represent the various wordnets to be integrated in KYOTO. Examples of custom Data Categories are values for describing synset relations and inter-lingual relations, for identifying external resources and their associated nodes, etc.
For better parsing efficiency, WordNet-LMF represents Data Categories by means of XML attributes and values instead of nested lexical objects. By explicitly naming the attributes, we also make a stronger claim about the features and properties of the structure of a wordnet. This enforces better compatibility and interoperability across the many wordnets that are available for different languages. In this respect, the WordNet-LMF DTD implementation has to be seen as a dialectal variant of the LMF DTD.
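The difference between the two encodings can be sketched in Python. This is only an illustration: the `feat` element with `att`/`val` pairs stands in for the nested representation, and the attribute names `target` and `relType`, the relation value and the identifiers are assumptions, not taken from the WordNet-LMF DTD.

```python
import xml.etree.ElementTree as ET

# Nested style (assumed layout): each Data Category is a child <feat> element.
NESTED = """
<SynsetRelation>
  <feat att="target" val="syn_canine"/>
  <feat att="relType" val="has_hyperonym"/>
</SynsetRelation>
"""

# Attribute style: each Data Category is a named XML attribute.
FLAT = '<SynsetRelation target="syn_canine" relType="has_hyperonym"/>'


def categories_from_nested(xml_text):
    """Collect Data Categories stored as nested <feat att=... val=...> elements."""
    element = ET.fromstring(xml_text)
    return {feat.get("att"): feat.get("val") for feat in element.findall("feat")}


def categories_from_attributes(xml_text):
    """Collect Data Categories stored directly as XML attributes."""
    return dict(ET.fromstring(xml_text).attrib)


nested = categories_from_nested(NESTED)
flat = categories_from_attributes(FLAT)
print(nested == flat)
```

Both encodings carry the same information; the attribute form needs no traversal of child elements and makes the expected descriptors explicit in the element declaration.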
The WordNet-LMF Core Component
The WordNet-LMF core package component provides the structural skeleton to represent the basic hierarchies of the lexicon.
KYOTO WordNets are represented as a grid of lexicons: LexicalResource is the container for all of them. A specific lexical object, GlobalInformation, is devoted to recording general information about the lexical resource.
Besides the monolingual lexicons, the lexical resource contains the interlingual correspondences, which are grouped in a section (SenseAxes) that is separate from the lexicons proper and contains only inter-lexicon correspondences.
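A minimal Python sketch of this top-level layout follows. The element names come from the description above; the attributes (`label`, `language`) and their values are hypothetical and serve only to make the fragment readable.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: one container, general information, two lexicons,
# and a separate section for inter-lexicon correspondences.
SKELETON = """
<LexicalResource>
  <GlobalInformation label="Hypothetical wordnet grid"/>
  <Lexicon language="eng" label="Example English wordnet"/>
  <Lexicon language="nld" label="Example Dutch wordnet"/>
  <SenseAxes/>
</LexicalResource>
"""

root = ET.fromstring(SKELETON)
lexicons = root.findall("Lexicon")
languages = [lexicon.get("language") for lexicon in lexicons]
has_global_info = root.find("GlobalInformation") is not None
has_sense_axes = root.find("SenseAxes") is not None
# SenseAxes is a sibling of the lexicons, never a child of one of them.
axes_inside_lexicon = any(lexicon.find("SenseAxes") is not None for lexicon in lexicons)

print(languages, has_global_info, has_sense_axes, axes_inside_lexicon)
```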
Lexicon contains a monolingual resource, instantiated as a set of LexicalEntry instances.
LexicalEntry is a container for representing a lexeme in a lexicon. A LexicalEntry element contains the basic building blocks: Lemma and Sense. The element Meta is used to encode administrative information.
Lemma represents a word form chosen by convention to designate the lexical entry, whereas Sense represents one meaning of a lexical entry. For wordnet representation, the triplet of LexicalEntry, Lemma and Sense is used to represent the variant(s), or literal(s), of a synset.
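How entries, lemmas and senses together yield the variants of a synset can be sketched as follows. The attribute names (`id`, `writtenForm`, `partOfSpeech`, `synset`) and all identifiers are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical lexicon fragment: two entries whose senses point to one synset.
LEXICON = """
<Lexicon language="eng">
  <LexicalEntry id="entry_dog">
    <Lemma writtenForm="dog" partOfSpeech="n"/>
    <Sense id="dog_1" synset="syn_dog"/>
  </LexicalEntry>
  <LexicalEntry id="entry_domestic_dog">
    <Lemma writtenForm="domestic dog" partOfSpeech="n"/>
    <Sense id="domestic_dog_1" synset="syn_dog"/>
  </LexicalEntry>
</Lexicon>
"""


def variants_by_synset(lexicon):
    """Map each synset id to the lemmas of the entries that have a sense in it."""
    variants = defaultdict(list)
    for entry in lexicon.findall("LexicalEntry"):
        lemma = entry.find("Lemma").get("writtenForm")
        for sense in entry.findall("Sense"):
            variants[sense.get("synset")].append(lemma)
    return dict(variants)


variants = variants_by_synset(ET.fromstring(LEXICON))
print(variants)
```

In this sketch the synset membership is recovered from the senses, so the list of variants never has to be stored twice.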
MonolingualExternalRef represents a link between a Sense or Synset and another resource, be it a knowledge organisation system, a database, or another lexical resource. It can handle mapping among different versions of the same resource as well as reference to external information, such as mapping onto entries of another lexical database and/or referencing additional sources.
When linked to a Sense element, it can be used to express the mapping between the sense and its counterpart in another lexical resource (such as in the Dutch Cornetto database). In the particular case of the English Princeton WordNet, MonolingualExternalRef serves as a representational device to express the SenseKey. When linked to the Synset element, MonolingualExternalRef allows encoding a reference to the domain and/or one or more links to an ontological system.
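The two attachment points can be illustrated with a short Python sketch. The attribute names `externalSystem` and `externalReference`, the flat nesting, and every value shown (including the sense key placeholder and the ontology name) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one external reference on a Sense, two on a Synset.
FRAGMENT = """
<Lexicon language="eng">
  <LexicalEntry id="entry_dog">
    <Lemma writtenForm="dog"/>
    <Sense id="dog_1" synset="syn_dog">
      <MonolingualExternalRef externalSystem="SenseKey"
                              externalReference="hypothetical-sense-key"/>
    </Sense>
  </LexicalEntry>
  <Synset id="syn_dog">
    <MonolingualExternalRef externalSystem="Domain" externalReference="zoology"/>
    <MonolingualExternalRef externalSystem="ExampleOntology" externalReference="Canine"/>
  </Synset>
</Lexicon>
"""


def external_refs(element):
    """Return (system, reference) pairs attached directly to the given element."""
    return [
        (ref.get("externalSystem"), ref.get("externalReference"))
        for ref in element.findall("MonolingualExternalRef")
    ]


lexicon = ET.fromstring(FRAGMENT)
sense_refs = external_refs(lexicon.find("LexicalEntry/Sense"))
synset_refs = external_refs(lexicon.find("Synset"))
print(sense_refs)
print(synset_refs)
```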
The WordNet-LMF Semantic Component
The Semantic component is in charge of describing information about a wordnet synset by means of the Synset element.
A Synset clusters senses of different LexicalEntry instances within the same part of speech. The Definition element represents the gloss associated with each synset.
Relations between synsets are codified by means of SynsetRelation elements (represented by means of XML attributes), one per relation.
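As a rough illustration, the Python sketch below reads synsets, their glosses and their relations, and reports relation targets that are not defined in the same lexicon. The attribute names (`id`, `gloss`, `target`, `relType`), the relation values and the identifiers are assumptions, not part of the specification text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one SynsetRelation element per relation.
SYNSETS = """
<Lexicon language="eng">
  <Synset id="syn_dog">
    <Definition gloss="a domesticated canine"/>
    <SynsetRelation target="syn_canine" relType="has_hyperonym"/>
    <SynsetRelation target="syn_puppy" relType="has_hyponym"/>
  </Synset>
  <Synset id="syn_canine">
    <Definition gloss="a member of the dog family"/>
  </Synset>
</Lexicon>
"""


def relation_triples(lexicon):
    """Return (source synset, relation type, target synset) triples."""
    return [
        (synset.get("id"), rel.get("relType"), rel.get("target"))
        for synset in lexicon.findall("Synset")
        for rel in synset.findall("SynsetRelation")
    ]


def dangling_targets(lexicon):
    """Return relation targets that no Synset in this lexicon defines."""
    known = {synset.get("id") for synset in lexicon.findall("Synset")}
    return sorted({target for _, _, target in relation_triples(lexicon) if target not in known})


lexicon = ET.fromstring(SYNSETS)
triples = relation_triples(lexicon)
glosses = {s.get("id"): s.find("Definition").get("gloss") for s in lexicon.findall("Synset")}
missing = dangling_targets(lexicon)
print(triples)
print(missing)
```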
A set of harmonized KYOTO Data Categories has been defined. It is to be used in conjunction with the SynsetRelation elements for representing the various relations holding between synsets. For the sake of coherence, this Data Category library (Appendix B) is maintained as a centralized repository. This option has been followed in order to enforce better compatibility and interoperability across the many monolingual wordnets.
MonolingualExternalRef, which is used to represent a link between the lexical resource and another resource, allows encoding a reference to the domain and/or one or more links to an ontological system when it is linked to the Synset element.
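One practical use of a centralized repository is checking that every wordnet uses only the shared relation values. The sketch below assumes a `relType` attribute and uses a made-up set of values as a stand-in; the actual Data Categories are those of Appendix B and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the shared Data Category library (not Appendix B).
SHARED_REL_TYPES = frozenset({"has_hyperonym", "has_hyponym", "near_synonym"})

# Hypothetical synset with one shared and one non-shared relation value.
SYNSET = """
<Synset id="syn_dog">
  <SynsetRelation target="syn_canine" relType="has_hyperonym"/>
  <SynsetRelation target="syn_animal" relType="is_a"/>
</Synset>
"""


def unknown_relation_types(synset, allowed):
    """Return the relation types used by the synset that are not in the shared set."""
    used = {rel.get("relType") for rel in synset.findall("SynsetRelation")}
    return sorted(used - set(allowed))


unknown = unknown_relation_types(ET.fromstring(SYNSET), SHARED_REL_TYPES)
print(unknown)
```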
The WordNet-LMF Multilingual Component
The Multilingual notation component is used in KYOTO for expressing interlingual correspondences. This component is designed as an independent package in order not to overload the representation of monolingual lexicons. The model is based on the notion of “Axes” that link synsets pertaining to different languages. For the purposes of creating a grid of WordNets linked via an Interlingual Index, the SenseAxis device is specifically suited to implement approaches based on an interlingual pivot. Any SenseAxis element groups together monolingual synsets that correspond one to another by means of a particular type of relation.
The SenseAxis element is a means for grouping together synsets belonging to different monolingual wordnets that correspond one to another and share the same equivalence relation (e.g. a synonymy or near_synonymy relation) to a pivot synset, which by convention is an English one. This is a compact way of encoding correspondences among wordnets, since it avoids the need for several single LanguageX-English correspondences.
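The grouping can be sketched in Python as follows. The `Target` child element, the `ID` and `relType` attributes, and the synset identifiers are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical axis: one group replaces separate Dutch-English and
# Italian-English correspondences.
AXES = """
<SenseAxes>
  <SenseAxis id="axis_1" relType="synonymy">
    <Target ID="eng_syn_dog"/>
    <Target ID="nld_syn_hond"/>
    <Target ID="ita_syn_cane"/>
  </SenseAxis>
</SenseAxes>
"""


def equivalents(axes, synset_id):
    """Return (relation type, other members) for each axis containing synset_id."""
    result = []
    for axis in axes.findall("SenseAxis"):
        members = [target.get("ID") for target in axis.findall("Target")]
        if synset_id in members:
            others = [member for member in members if member != synset_id]
            result.append((axis.get("relType"), others))
    return result


axes = ET.fromstring(AXES)
dutch_equivalents = equivalents(axes, "nld_syn_hond")
print(dutch_equivalents)
```

A lookup from any member reaches all other members of the axis, including the English pivot, without a dedicated pairwise link.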
InterlingualExternalRef is used in KYOTO-LMF to express a link between a SenseAxis instance and an external system such as an ontology, and represents the means to anchor a multilingual group of synsets to an ontological node. Its intended use, thus, is to provide a representational device that links a group of synsets from different wordnets to the same ontological concept.
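A minimal Python sketch of this anchoring is given below. The attributes `externalSystem` and `externalReference`, the `Target` element, the ontology name and all identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical axis anchored to a single ontological node.
AXIS = """
<SenseAxis id="axis_1" relType="synonymy">
  <Target ID="eng_syn_dog"/>
  <Target ID="nld_syn_hond"/>
  <InterlingualExternalRef externalSystem="ExampleOntology"
                           externalReference="Canine"/>
</SenseAxis>
"""


def synsets_by_ontology_node(axis):
    """Map each (system, node) reference of the axis to the synsets it groups."""
    members = [target.get("ID") for target in axis.findall("Target")]
    return {
        (ref.get("externalSystem"), ref.get("externalReference")): list(members)
        for ref in axis.findall("InterlingualExternalRef")
    }


anchored = synsets_by_ontology_node(ET.fromstring(AXIS))
print(anchored)
```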