A fruit-bearing explanation about apples and pears

This blog is about taxonomies, but besides knowing what the word taxonomy means, it is equally important to know what it is not. Below I present a collection of related concepts from the same jargon, terms that are sometimes used in IT.

  • Taxonomy
  • Typology
  • Folksonomy
  • Thesaurus
  • Lemma
  • Ontology
  • Canonical model


Taxonomy

The word taxonomy originates from Greek. It combines taxis (ordering, arrangement) with nómos (usage, rule, law). Taxonomy is the science of arranging individuals or objects into groups (taxa; singular taxon).
The term taxonomy refers both to the method of arranging concepts and to the hierarchical ordering that results from that process. Such a hierarchical ordering, and the activity of arriving at it, is called classification. Almost anything can be organized in a taxonomy: life and living organisms, tools, goods, all kinds of things, books, topography, administrative structures, events and so on.
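Because the result of classification is a hierarchy, a taxonomy can be stored as a tree. The sketch below assumes a simple child-to-parent map; the terms and the helper `path_to_root` are hypothetical illustrations. The classification of any taxon is then its path up to the root.

```python
# Hypothetical taxonomy: each taxon maps to its parent (broader) taxon.
PARENT = {
    "fruit": "food",
    "pome fruit": "fruit",
    "apple": "pome fruit",
    "pear": "pome fruit",
    "citrus": "fruit",
    "orange": "citrus",
}

def path_to_root(taxon: str) -> list[str]:
    """Return the chain of taxa from `taxon` up to the root of the hierarchy."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

print(path_to_root("apple"))  # ['apple', 'pome fruit', 'fruit', 'food']
```

A parent map keeps every taxon in exactly one place in the hierarchy, which is what distinguishes a strict taxonomy from looser groupings.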

Taxonomy in technology

In computer science there is a growing need for common terminology in systems and databases, for example to integrate data from different systems and to exchange product data unambiguously, as in e-business systems and knowledge-driven designs. To enable this, standardized definitions of concepts are used, in which the terms are arranged in a subtype-supertype hierarchy, or taxonomy. Among other advantages, this structure lets subtypes inherit the properties of their supertypes.
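Class inheritance in an object-oriented language is one familiar way to express such a subtype-supertype hierarchy. In this minimal sketch the classes and attributes (`Product`, `Fruit`, `Apple`, `has_price`, `unit`) are hypothetical; it only shows that a subtype gets its supertypes' properties for free and may refine them.

```python
class Product:
    """Supertype: properties defined here are inherited by every subtype."""
    has_price = True
    unit = "piece"

class Fruit(Product):
    perishable = True

class Apple(Fruit):
    unit = "kg"  # a subtype may refine an inherited property

apple = Apple()
print(apple.has_price, apple.perishable, apple.unit)  # True True kg
```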

In recent years, computer science and artificial intelligence have seen attempts to create and maintain taxonomies automatically from a set of concepts. An example is the automatic classification of a collection of documents, for instance in digital libraries. Notably, this field distinguishes between a taxonomy and a typology. The difference lies mainly in the way the classification is established. In a taxonomy, you arrange a group of sample objects by dividing them. The next step is to observe which characteristics each concept has and to place it in a hierarchy on the basis of overarching features. This process shapes the taxonomy.

In a typology, one starts from the concept: one first considers which distinguishing characteristics objects might have, and then classifies the actual objects according to these rules. One could say that taxonomies are established empirically (inductively) and typologies conceptually (deductively).
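The contrast between the two approaches can be sketched in a few lines. The sample objects, their features and the rules in `TYPES` are hypothetical. The inductive part starts from observed objects and groups those whose features coincide; the deductive part defines the types and their rules first and only then sorts the objects.

```python
from collections import defaultdict

# Hypothetical sample objects and their observed characteristics.
SAMPLES = {
    "apple": {"edible", "grows on tree", "has core"},
    "pear": {"edible", "grows on tree", "has core"},
    "orange": {"edible", "grows on tree", "has segments"},
    "tulip": {"grows from bulb", "flowers"},
}

# Taxonomy-style (inductive): group objects whose observed features coincide.
def induce_groups(samples):
    groups = defaultdict(list)
    for name, features in samples.items():
        groups[frozenset(features)].append(name)
    return sorted(sorted(names) for names in groups.values())

# Overarching features shared by several objects can justify a broader taxon.
def shared_features(samples, names):
    return set.intersection(*(samples[name] for name in names))

# Typology-style (deductive): types and rules exist before any object is seen.
TYPES = {
    "edible": lambda features: "edible" in features,
    "decorative": lambda features: "flowers" in features,
}

def classify_by_rules(samples, types):
    return {name: [t for t, rule in types.items() if rule(features)]
            for name, features in samples.items()}

print(induce_groups(SAMPLES))  # [['apple', 'pear'], ['orange'], ['tulip']]
print(classify_by_rules(SAMPLES, TYPES)["tulip"])  # ['decorative']
```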


Typology

A typology (in general) is a subdivision of a group of persons, descriptions or objects based on a number of characteristics. For example, Dutch cities can be divided by province (cities in Limburg, Holland, Noord-Brabant…) or by population: cities with over 500,000 inhabitants, cities with 250,000–500,000 inhabitants, or other combinations.
Most groups of objects can be classified in many ways. Some typologies, however, are considered better than others. A typology with empty categories (e.g. cities in Limburg with more than 500,000 inhabitants) can be considered a weak typology. On the other hand, too many objects in a single category also makes for a poor typology.
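The quality check described above can be made mechanical by counting objects per category and listing the empty cells. The cities and population figures below are hypothetical placeholders, not real data; the population bands follow the text.

```python
from collections import Counter

# Hypothetical cities with illustrative (not real) population figures.
CITIES = [
    ("City A", "Noord-Holland", 800_000),
    ("City B", "Zuid-Holland", 600_000),
    ("City C", "Noord-Brabant", 230_000),
    ("City D", "Limburg", 120_000),
    ("City E", "Limburg", 90_000),
]
SIZE_CLASSES = ["> 500,000", "250,000-500,000", "< 250,000"]

def size_class(population):
    """Assign a city to one of the population bands used in the text."""
    if population > 500_000:
        return "> 500,000"
    if population >= 250_000:
        return "250,000-500,000"
    return "< 250,000"

def typology_cells(cities, provinces):
    """Count cities per (province, size class) cell and list the empty cells."""
    counts = Counter((province, size_class(pop)) for _, province, pop in cities)
    empty = [(p, s) for p in provinces for s in SIZE_CLASSES if counts[(p, s)] == 0]
    return counts, empty

counts, empty = typology_cells(CITIES, ["Limburg"])
print(empty)  # [('Limburg', '> 500,000'), ('Limburg', '250,000-500,000')]
```

Many empty cells, or one cell holding most of the objects, are both signals that the chosen characteristics do not divide the group well.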

In everyday usage, the terms typology, classification system and taxonomy can be considered synonymous. In psychology and in computer science/artificial intelligence, however, a distinction is made between them. The difference lies in the way they are created: empirically (taxonomy) or conceptually (typology).

Concepts that are related in a typology may have no relation at all in a taxonomy. If, say, you define a typology of things to bring as a gift when visiting a sick colleague, you would expect to find concepts such as apples, pears, flowers and a crossword puzzle magazine.

It is unlikely that you would find those concepts grouped together in a taxonomy.


Folksonomy

A folksonomy is a system in which users apply public tags to online items, typically to help them find those items again. This practice is also known as collaborative or social tagging, social classification or social indexing.

When the term was coined, folksonomy originally meant “the result of personal free tagging of information for one’s own retrieval”. The boundary between folksonomy and social tagging (tagging in an open online environment, where one user’s tags are visible to others) is becoming blurred. Folksonomies are commonly used in cooperative and collaborative projects such as research, content repositories and social bookmarking.

The term folksonomy is a mix of the words folk and taxonomy.

If you define taxonomy as a form of managed metadata, a folksonomy is its opposite: just a container of terms without any ordering. Still, by analyzing how each term is used you can discover terms that are meaningful to an organization, and by monitoring the folksonomy you can promote such terms into the taxonomy.
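Monitoring a folksonomy can start as simply as counting tags. This sketch assumes a flat list of tag events and a set of existing taxonomy terms; the function name, the normalization (trim and lowercase) and the threshold `min_uses` are hypothetical choices, not a prescribed method.

```python
from collections import Counter

def promotion_candidates(tag_events, taxonomy_terms, min_uses=3):
    """Return free tags used at least `min_uses` times that the taxonomy lacks."""
    counts = Counter(tag.strip().lower() for tag in tag_events)
    return sorted(
        (tag for tag, n in counts.items() if n >= min_uses and tag not in taxonomy_terms),
        key=lambda tag: (-counts[tag], tag),  # most used first, then alphabetical
    )

# Hypothetical tags applied by users, and a hypothetical managed taxonomy.
tags = ["Apple", "apple", "pears", "apple ", "pears", "pears", "fruit", "todo"]
taxonomy = {"fruit"}
print(promotion_candidates(tags, taxonomy))  # ['apple', 'pears']
```

The candidates are only suggestions; deciding whether a frequent tag deserves a place in the taxonomy remains an editorial decision.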


Examples of folksonomies:
  • Twitter hashtags
  • Instagram
  • WordPress

In many products, such as Blue Kiwi and SharePoint, folksonomies can be presented as tag clouds, in which the display size of a tag reflects its significance and how often it is used.
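How a tag cloud turns usage into display size can be sketched with a linear mapping from counts to font sizes. The pixel range is a hypothetical choice, and linear scaling is only one option (logarithmic scaling is also used); this is not how any particular product computes it.

```python
def tag_cloud_sizes(counts, min_px=10, max_px=32):
    """Map each tag's usage count linearly onto a font size in pixels.

    Assumes `counts` contains at least one tag.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {tag: round(min_px + (n - lo) * (max_px - min_px) / span)
            for tag, n in counts.items()}

print(tag_cloud_sizes({"apple": 12, "pear": 4, "flowers": 1}))
# {'apple': 32, 'pear': 16, 'flowers': 10}
```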


Thesaurus

In the classical sense, a thesaurus is a kind of reference work. It is used to find the exact word for an object, a specific technical term, or a word with the desired connotation (for stylistic reasons).

In modern usage, a thesaurus is a tool in which unique concepts are linked by hierarchical, equivalence and associative relationships. The term comes from Greek and means treasure. It was first established in linguistics as a logically systematic (and alphabetical, but not explanatory) dictionary, in which the concepts of a language were categorized and related to other concepts:

  • Synonyms: words with a similar meaning. Some people use the term data dictionary as a synonym for thesaurus.
  • Hypernyms: words that describe a broader concept. Lexicon has a broader meaning than thesaurus.
  • Hyponyms: words with a narrower meaning. Synonym list has a narrower meaning than thesaurus.
  • Antonyms: words with the opposite meaning.
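The relationship types in the list above map naturally onto a small data structure. This is a minimal sketch: the `Concept` class and `lookup` helper are hypothetical, the thesaurus entries reuse the examples from the list, and the "large/small" entry is added purely to illustrate antonyms. Standards such as W3C SKOS model similar relations more formally.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    term: str                                           # preferred term
    synonyms: set[str] = field(default_factory=set)     # equivalence relation
    broader: set[str] = field(default_factory=set)      # hypernyms
    narrower: set[str] = field(default_factory=set)     # hyponyms
    antonyms: set[str] = field(default_factory=set)

THESAURUS = {
    "thesaurus": Concept("thesaurus", synonyms={"data dictionary"},
                         broader={"lexicon"}, narrower={"synonym list"}),
    "large": Concept("large", synonyms={"big"}, antonyms={"small"}),
}

def lookup(word):
    """Find a concept by its preferred term or by one of its synonyms."""
    for concept in THESAURUS.values():
        if word == concept.term or word in concept.synonyms:
            return concept
    return None

print(lookup("data dictionary").broader)  # {'lexicon'}
```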

The term “thesaurus” is also used for a reference book containing the specialized vocabulary of a particular field of interest or profession, such as medicine or music. A thesaurus can, for example, make a library catalog more accessible than an arrangement alone, which is ultimately arbitrary.

When categorizing and referencing with a thesaurus, one is not bound to the terms (or the language) of a book, and the same approach works for media such as video or sound that contain no text or metadata.

A thesaurus even allows multiple terms to be assigned to each publication or item of information.
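In a catalog, this means each item carries several thesaurus terms and a search first maps the user's word to the preferred term. The mappings, item IDs and the `search` function below are hypothetical; note that the video has no text of its own and is found purely through its assigned terms.

```python
# Hypothetical mapping from non-preferred terms to the preferred thesaurus term.
PREFERRED = {"cancer": "oncology", "tumour": "oncology", "tune": "melody"}

# Each item gets several thesaurus terms, even items without text (e.g. a video).
CATALOG = {
    "book-1": {"oncology", "medicine"},
    "video-7": {"melody", "music"},
    "book-2": {"music", "music theory"},
}

def search(query):
    """Normalize the query to its preferred term and return matching items."""
    word = query.lower()
    term = PREFERRED.get(word, word)
    return sorted(item for item, terms in CATALOG.items() if term in terms)

print(search("Tumour"))  # ['book-1']
print(search("music"))   # ['book-2', 'video-7']
```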

Lemma (reference works)

A lemma (plural: lemmas), or headword, is the word under which a concept is described in a dictionary or encyclopedia and which serves as its search word. A lemma is usually the first word of a dictionary or encyclopedia entry, often printed in bold. To find a concept, you need to know its lemma.

In electronic search the lemma is typed in the search field.

The word “searched”, for example, is found as a variant under the lemma “search”. An entry usually begins with an introduction or a short description that puts the word into context. Passages of text can contain multiple hyperlinks that lead to a lemma.
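Finding “searched” under “search” amounts to a lookup from word forms to headwords. In this sketch the entries are a hypothetical mini-dictionary and `find_entry` is a hypothetical helper; a real lemmatizer needs a full lexicon (or morphological rules), but the inverted index is the core idea.

```python
# Hypothetical dictionary: each lemma (headword) lists the word forms it covers.
ENTRIES = {
    "search": ["search", "searches", "searched", "searching"],
    "apple": ["apple", "apples"],
    "go": ["go", "goes", "went", "gone", "going"],
}

# Invert the entries into a form -> lemma index, as a search field might use.
FORM_TO_LEMMA = {form: lemma for lemma, forms in ENTRIES.items() for form in forms}

def find_entry(word):
    """Return the lemma under which `word` is listed, or None if unknown."""
    return FORM_TO_LEMMA.get(word.strip().lower())

print(find_entry("Searched"))  # search
print(find_entry("went"))      # go
```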


Ontology

In computer science and logic, an ontology is the result of an attempt to define a complete and strictly conceptual scheme for a certain topic or domain. The word ontology is borrowed from philosophy.

An ontology is typically a data structure that describes all relevant entities and their relations within the rules of the domain. In artificial intelligence, ontologies are used to describe the ‘real world’ in a way that a computer can process. Put differently, an ontology is a form of knowledge representation.

In the semantic web, a computer derives the meaning of text or metadata from such a model and, based on that information, can reason about effects and draw conclusions.

An ontology serves as a strict and complete model of a certain domain, usually with a hierarchical structure, containing all relevant units, their relations, and the rules that these units and relations must comply with.
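The reasoning step can be illustrated with facts stored as (subject, predicate, object) triples and two inference rules, loosely modeled on the RDF Schema semantics of subClassOf and type. The triples and the predicate name `instanceOf` are hypothetical; real reasoners support far more rules than this sketch.

```python
# Hypothetical ontology as (subject, predicate, object) triples.
TRIPLES = {
    ("Apple", "subClassOf", "Fruit"),
    ("Fruit", "subClassOf", "Food"),
    ("Granny Smith", "instanceOf", "Apple"),
}

def infer(triples):
    """Derive new facts until nothing changes, using two rules:
    subClassOf is transitive, and an instance of a class is also
    an instance of that class's superclasses."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b == c and p2 == "subClassOf" and p1 in ("subClassOf", "instanceOf"):
                    new.add((a, p1, d))
        if new <= facts:
            return facts
        facts |= new

print(sorted(infer(TRIPLES) - TRIPLES))
# [('Apple', 'subClassOf', 'Food'), ('Granny Smith', 'instanceOf', 'Food'), ('Granny Smith', 'instanceOf', 'Fruit')]
```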

Canonical models

Canonical is a term used in data modeling that is itself difficult to define.

Words that approximate the concept:

  • Typical
  • Normally, normalized
  • Unique, unambiguous
  • Standardized way of displaying
  • According to acknowledged, accepted rules

Canonical is also an adjective meaning that something is in accordance with the canon, that is, the rules (originally ecclesiastical law). Canonical matters are therefore considered credible, and so is a canonical model.

Canonical in information architecture

Information architects often talk about canonical models: models that divide reality into concepts and relationships. A model makes reality visible. A canonical model is a clear conceptual model, designed on the basis of a standardized and shared approach to something in a particular context (a piece of reality), resulting in:

  • Clarity
  • Standardization
  • Common look
  • Context

A canonical model is unambiguous and can therefore be interpreted in only one way. The meanings of the concepts in the model are based on a commonly agreed standard. Think of a typical description of a car. A car is a very complex thing, but the following model of a car is fairly universal.

The model reduces the complexity of a car to a few key concepts that are related to each other. A typical car has a body, an engine, a steering wheel, a front axle with two wheels and a rear axle with two wheels. The steering wheel is connected to the front axle, and the engine drives one of the axles, or both at the same time. This model typifies a car: every car meets it. Three-wheelers do not, so the model is not universal, but it is within the context of a car maker that produces only four-wheeled vehicles.
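The car model above can be written down as a set of types plus the rules every instance must satisfy. This is a hypothetical sketch (the class and field names are invented, and body and engine are left implicit); the point is that a model shared by everyone in the context rejects anything that does not fit it, such as a three-wheeler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Axle:
    wheels: int
    steered: bool = False  # connected to the steering wheel
    driven: bool = False   # driven by the engine

@dataclass(frozen=True)
class Car:
    """Hypothetical canonical model for a maker of four-wheeled vehicles."""
    front: Axle
    rear: Axle

    def __post_init__(self):
        # The rules of the model, checked whenever a Car is created.
        if self.front.wheels != 2 or self.rear.wheels != 2:
            raise ValueError("a car has two wheels on each axle")
        if not self.front.steered:
            raise ValueError("the steering wheel is connected to the front axle")
        if not (self.front.driven or self.rear.driven):
            raise ValueError("the engine drives one axle or both")

car = Car(front=Axle(2, steered=True, driven=True), rear=Axle(2))  # accepted
try:
    Car(front=Axle(1, steered=True), rear=Axle(2, driven=True))  # a three-wheeler
except ValueError as err:
    print(err)  # a car has two wheels on each axle
```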

A canonical model simplifies communication about things within a particular context (e.g. a company). Anyone in that context who knows the model knows what is meant when its concepts are discussed. Put simply, it prevents misunderstandings, because the model is unambiguous.

How to arrive at these instruments

Obviously, it is very ambitious to create models that everyone accepts as credible, since people have different views on the same issues, each from their own perception. Where the need exists, a single model can form a good foundation for information architecture and offers many advantages in IT architecture, for example for interfaces. That is of course great, but how do you develop a model that everyone in a company agrees on? That is not easy.

Not a common IT discipline: involve professionals

Wherever more than one person is involved, people turn out to have different views of things. Their mental “images” often conflict: one person wants a lot of detail, another very little.
If an organization commits to instruments such as those described above, it is important to consult experts such as:

  • Library scientists
  • Linguists
  • Information technology experts with an affinity for the conceptual field of classification; they are preferable to professionals who have moved to the building side of IT (e.g. developers).

Applying these techniques requires subject matter experts from within the organization. With them, you can set up a first version and present it as a starting point. Do not let everyone debate everything together, or nothing will ever come of it. Accept their expertise as you would accept the authority of, for example, an oncologist. Sometimes you have to keep something from proliferating in order to keep it valuable.

What about SharePoint?

Term store

Taxonomy, thesaurus and folksonomy are directly reflected in SharePoint. The term store is the tool in which term sets (taxonomies) are housed and which supports thesaurus functionality. SharePoint offers a central term store at the farm or tenant level. Site collections within the scope of this term store automatically inherit the term sets of this central facility. Before you set up a term set or a series of term sets, it is wise to sleep on it for more than one night.
Think first, then act.
Moving taxa from the specific (local) term store to the generic (farm or tenant level) term store is difficult.

Central (generic) or local (specific)

Each site collection contains a term store, so in each site collection it is possible to create specific (local) term sets. Realize, however, that without customization these local term sets are not accessible from other site collections.
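The visibility rule described above can be summarized in a small model. This is explicitly not the SharePoint API: the term set names, the site collection URLs and the `visible_term_sets` function are all hypothetical, and the sketch only captures the rule that central term sets are inherited everywhere while local ones stay within their own site collection.

```python
# Hypothetical model of term set visibility; this is NOT the SharePoint API.
CENTRAL_TERM_SETS = {"Products", "Departments"}  # farm or tenant level
LOCAL_TERM_SETS = {                              # per site collection
    "/sites/sales": {"Sales regions"},
    "/sites/hr": {"Job titles"},
}

def visible_term_sets(site_collection):
    """Central term sets are inherited everywhere; local ones only locally."""
    return CENTRAL_TERM_SETS | LOCAL_TERM_SETS.get(site_collection, set())

print(sorted(visible_term_sets("/sites/sales")))
# ['Departments', 'Products', 'Sales regions']
```

Seen this way, it is clear why moving a local term set to the central level later is disruptive: every site collection's view of the available terms changes.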

Try to work from the central term store as much as possible, but do not kill the dynamics by trying to arrange everything immediately. A useful bridging tool is the “Enterprise Metadata and Keywords Settings” option on a library: it gives users the opportunity to add unsorted metadata (keywords). The results appear in the term store and thereby provide a source for maintaining the managed metadata.
