Not Just Search

The Importance of Metadata to Website Strategy
By: Aniel Sud

It’s commonly accepted that metadata is “data about data,” and the definition is purposefully vague. However, the use of metadata as a fundamental component of a good website strategy is not. Metadata’s purpose is clearly defined for each aspect of a site with which it interacts, from keywords and search to taxonomy and more advanced applications such as microformats.

There are two basic forms of metadata important to website strategy: descriptive and structural. Descriptive metadata is used to characterize the information contained in one piece of content. Structural metadata creates relationships between multiple pieces of data and builds larger defined objects out of many smaller ones.

Descriptive metadata is typically used for bibliographic or tagging data. For instance, all the tagging done when using Flickr’s online photo organization and sharing service (http://www.flickr.com) is considered descriptive metadata. Site keywords are the original web-based descriptive metadata.

Descriptive metadata is really only concerned with one particular piece of content: what it is, where it was generated, who created it, what it depicts, etc. The Dublin Core[1] metadata standard is a widely accepted standard for creating descriptive metadata. Utilizing it to describe digital assets, such as video, images, audio and text, categorizes content and lets it be understood more accurately and easily by online services, including search engines.
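
As a minimal sketch of what such a description can look like on a web page, the Python snippet below renders a handful of Dublin Core elements as HTML meta tags using the common “DC.” name prefix. The sample record and the function name are invented for illustration; only the element names (title, creator, type, format, date) come from Dublin Core.

```python
from html import escape

# Hypothetical record; the keys are Dublin Core element names.
record = {
    "title": "Spring Catalog Cover",
    "creator": "J. Public",
    "type": "Image",
    "format": "image/jpeg",
    "date": "2006-12-18",
}


def dublin_core_meta(record):
    """Render each element as an HTML <meta> tag with a "DC." name prefix."""
    return [
        '<meta name="DC.{}" content="{}">'.format(name, escape(value, quote=True))
        for name, value in record.items()
    ]


meta_tags = dublin_core_meta(record)
for tag in meta_tags:
    print(tag)
```

Each asset gets its own record, so the description stays attached to one piece of content, which is the defining trait of descriptive metadata.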

Descriptive Metadata and Microformats

An emerging standard for descriptive metadata is microformats. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and adopted standards.[2] They add structure to common data types, such as events/calendars, reviews, recipes, directions and contact information, on web pages, so that the information in them can be extracted by software and indexed, searched for, saved, cross-referenced or combined. With microformats, data is both structured and web friendly at once.

For example, the ‘hCard’ microformat for contact information contains elements for name, organization and URL. By utilizing the microformat, the data retains its structure as it is parsed by search engines and other web-based automata. While most search engines are passable at inferring information structure from document layout, the precision of searches increases dramatically when structure is known.

 

<div class="vcard">
  <a class="url fn" href="http://www.JPEngineering.com/">John Public</a>
  <div class="org">JP Engineering</div>
  <div class="adr">
    <span class="type">work</span> address
    <div class="street-address">432 Electric Avenue</div>
    <span class="locality">Amherst</span>
    <span class="region">NH</span>
    <span class="postal-code">03031</span>
    <div class="country-name">U.S.A.</div>
  </div>
  <div class="tel">
    <span class="value">+1-555-555-1212</span>
  </div>
  <a class="email" href="mailto:J.Public@JPEngineering.com">email</a>
</div>
Figure 1: Microformat hCard example

 

The idea behind a microformat is to supply descriptive metadata without interfering with the content’s presentation layer or markup. Instead of throwing away what works with average Internet users, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns. This results in code that lay persons can follow, yet all the data is structurally comprehensible to machines, making it far more likely to succeed where other standards have failed.
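
To illustrate how software might read that structure back out, here is a rough Python sketch that pulls a few hCard properties from a trimmed-down copy of the Figure 1 markup. It uses only the standard library’s html.parser module. The class name HCardParser and the short list of properties it watches for are assumptions made for this example, and a real hCard parser handles many more cases (nested properties, void elements, multiple cards).

```python
from html.parser import HTMLParser

# A trimmed-down version of the hCard in Figure 1.
HCARD = """
<div class="vcard">
  <a class="url fn" href="http://www.JPEngineering.com/">John Public</a>
  <div class="org">JP Engineering</div>
  <div class="adr">
    <span class="locality">Amherst</span>
    <span class="region">NH</span>
  </div>
</div>
"""

# hCard class names whose text content this sketch collects.
TEXT_PROPERTIES = {"fn", "org", "locality", "region"}


class HCardParser(HTMLParser):
    """Hypothetical helper: collects a few hCard properties into a dict."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # one list of class names per open element
        self.card = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.open_classes.append(classes)
        # The "url" property lives in the href attribute, not in the text.
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]

    def handle_endtag(self, tag):
        if self.open_classes:
            self.open_classes.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.open_classes:
            return
        # Text belongs to the properties named on the innermost open element.
        for name in self.open_classes[-1]:
            if name in TEXT_PROPERTIES:
                self.card[name] = text


parser = HCardParser()
parser.feed(HCARD)
card = parser.card
print(card)
```

The page still renders as ordinary links and text for visitors, while the class names give the parser everything it needs to rebuild a contact record.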

The relationship between this structural integrity and an accurate description of the data encapsulated on a website is key. Since its inception, the Web has been about interconnectedness. From hyperlinking on, it has been focused on bringing disparate information sources together in more and more unified ways. This is why metadata is so important. The metadata embedded in web content makes it much more understandable to all types of applications.

Structural Metadata and Taxonomy

Structural metadata is closely related to descriptive metadata, but its purpose is to convey the relationships between pieces of data. The concept of taxonomy falls solidly within this arena, but let’s back up for a moment.

The need to organize data has been a staple of the computer science industry since its inception. In fact, since Aristotle in ancient Greece, thinkers have been searching for ever more convenient and accurate methodologies for understanding the world by understanding the relationships between the objects in it. This is the concept of ontology.

For more than a decade, metadata has been a way of typifying objects and creating an ontology around them. In most web content management systems (CMS), metadata establishes a relationship between a piece of content and the categories into which it falls. For example, some data in a CMS can be public and some private. Based on the website’s information architecture, a specific property may be needed to set whether data is public or private. Often that property is set in metadata associated with each data object in the system.

A website’s information architecture is designed to provide useful ways to navigate its content, and structural metadata can enhance the architecture through granular content organization. Adding something as simple as a Boolean metadata value that marks each piece of content in a CMS as public or private allows for secondary methods of navigation. For a single directory-based tree structure, there is now not just one way of navigating the data, but three:

 

  • The tree containing only public data
  • The tree containing only private data
  • The tree containing both
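
The three views above can be sketched in a few lines of Python. The content paths and the view function are hypothetical; the only idea taken from the text is a single Boolean value stored with each item.

```python
# Hypothetical content items in a single directory tree.
# "public" is the Boolean metadata value described above.
items = [
    {"path": "/products/catalog.html", "public": True},
    {"path": "/products/price-list.html", "public": False},
    {"path": "/about/history.html", "public": True},
    {"path": "/about/org-chart.html", "public": False},
]


def view(items, public=None):
    """Return paths for one view: True = public, False = private, None = both."""
    return [
        item["path"]
        for item in items
        if public is None or item["public"] == public
    ]


public_view = view(items, public=True)
private_view = view(items, public=False)
full_view = view(items)

print(public_view)
print(private_view)
print(full_view)
```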

 

This becomes more interesting to good website strategy when taken to the next level with taxonomy.

Taxonomy evolves structural metadata and removes the limitations exemplified by the three views above. Instead, it offers its own tree-type layout: a taxonomy can effectively create an entirely separate organizational structure for data, as defined by a website administrator. A single piece of content can exist in multiple locations on the taxonomy tree, and data object hierarchies can be completely unrelated to each other.

A classic example of how it works uses former president and movie actor Ronald Reagan. In a single directory-based tree structure, content about Reagan might appear only in a ‘Presidents’ folder. To include the information in an ‘Actors’ folder as well, you would have to create a separate entry. Using a CMS’s built-in taxonomy, a content author can mark Reagan-related information for both categories. The taxonomy structure might look something like this:

 

Figure 2: Sample Taxonomy

 

The same Reagan data can now be useful for different areas of the website since its taxonomy ensures it’s relevant to both categories.
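
A small Python sketch of the same idea follows. It is not a reproduction of Figure 2: the ‘People’ root node, the content identifiers and both helper functions are assumptions chosen only to show one item living under two taxonomy nodes without being duplicated.

```python
from collections import defaultdict

# Hypothetical taxonomy: each node names its parent (None marks a root).
taxonomy_parent = {
    "People": None,
    "Presidents": "People",
    "Actors": "People",
}

# One content item can be assigned to several taxonomy nodes.
assignments = {
    "reagan-biography": ["Presidents", "Actors"],
    "lincoln-biography": ["Presidents"],
}


def content_by_category(assignments):
    """Invert the assignments so each category lists its content items."""
    index = defaultdict(list)
    for content_id, categories in assignments.items():
        for category in categories:
            index[category].append(content_id)
    return dict(index)


def path_to(category, parents):
    """Walk up the tree to build a breadcrumb such as 'People > Actors'."""
    trail = []
    while category is not None:
        trail.append(category)
        category = parents[category]
    return " > ".join(reversed(trail))


index = content_by_category(assignments)
print(index["Presidents"])
print(index["Actors"])
print(path_to("Actors", taxonomy_parent))
```

Because the content is stored once and only referenced from each category, an update to the Reagan entry shows up in both places.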

Taxonomy is a powerful tool for any sort of knowledge base. Its primary function is to make it easier to find data quickly, either through search or by browsing. Once that type of tagging is in a web CMS, site search can be narrowed from a wide area down to the right taxonomy vocabulary and then to specific content.

Taxonomy can also be employed on the front-end of a website in building a menu system. Visitors can come to a site and intuitively navigate to the desired level of detail by clicking on the right vocabulary to view all applicable content in that location.

Folksonomy and User-Generated Content

As part of the Web 2.0 explosion of community-oriented collaboration tools and services, the concept of folksonomy is gaining acceptance. A folksonomy is essentially a build-up of free-form structural metadata. Rather than using a vocabulary predefined by a system administrator, folksonomy is unregulated. It operates best with user-generated content in which the users themselves tag content with whatever keywords or categories they deem related. YouTube (http://www.youtube.com), Google’s online video sharing site, and Del.icio.us (http://del.icio.us), a social bookmarking site, rely on folksonomy to create a usable navigation alternative to basic content indexing.

When a user tags a piece of content, he creates a category that contains a pointer to that data. Other users may also tag their data with the same category, and all those items become searchable within that category. This differs from a traditional taxonomy in that there is no single group or person crafting the information architecture or a controlled vocabulary for the data. The folksonomy is instead generated on the fly by what the users feel is important in describing the content. This system is especially responsive in creating metadata that describes content in the users’ common vernacular. They don’t have to learn a specific vocabulary, but the tradeoff is information may become inaccessible as certain keywords fall out of favor. Content may no longer be returned in any meaningful query, and there is little to no inherent knowledge in the system about relationships between tags.
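
The tag-as-pointer idea can be sketched as a simple index in Python. The users, content identifiers and tags below are made up, and lowercasing the tags is an assumption of this example rather than something every tagging service does.

```python
from collections import defaultdict

# Hypothetical tagging events: (user, content id, free-form tag).
tag_events = [
    ("ann", "video-17", "Kittens"),
    ("bob", "video-17", "funny"),
    ("bob", "video-42", "kittens"),
    ("cho", "video-42", "cats"),
]


def build_tag_index(events):
    """Map each tag to the set of content ids it points to."""
    index = defaultdict(set)
    for _user, content_id, tag in events:
        index[tag.strip().lower()].add(content_id)
    return index


tag_index = build_tag_index(tag_events)
print(sorted(tag_index["kittens"]))
print(sorted(tag_index["cats"]))
```

Note that nothing in the index connects “kittens” to “cats”: a search on one tag misses content filed only under the other, which is the weakness described above.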

Tag clustering[3] helps overcome this weakness. The concept behind tag clustering is that by analyzing the tags related to a piece of content, one can discern how closely the tags themselves are related. This analysis cannot generate a rigid hierarchy of relationships, but it is well suited to creating a cloud space whose nodes are tags rather than individual pieces of content.

 

Figure 3: Clustering in a Tag Space

The outcome is a dynamic navigation system that responds to users’ needs directly, rather than through human intervention. This cloud space would be completely fluid and allow for complex behaviors in the system. Through trend analysis, the system could interpret which pages are relevant both to an individual user and to more general groups. Items that many users find interesting could be displayed more prominently for all site visitors, or custom navigation paths could be developed based on a visitor’s browsing habits.

For example, a site visitor is navigating a paper manufacturing company’s product catalog. The user consistently looks at products that are tagged “Card Stock.” A website using tag clustering would show a navigation tree that displays items tagged with “Card Stock” more prominently. The system might also show products tagged with “Glossy Finish” if earlier visitors tagged certain paper products with both tags, as this dual tagging would inform the system of the relationship between the two.
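
A much-simplified Python sketch of this behavior is shown below. It only counts how often two tags appear on the same product, which is a stand-in for the relatedness measure and not the clustering algorithm from the cited paper. The product identifiers, the extra tags and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product tags left by earlier visitors.
product_tags = {
    "paper-001": {"Card Stock", "Glossy Finish"},
    "paper-002": {"Card Stock", "Glossy Finish", "Letter Size"},
    "paper-003": {"Card Stock"},
    "paper-004": {"Newsprint"},
}


def co_occurrence(product_tags):
    """Count how often each pair of tags appears on the same product."""
    counts = Counter()
    for tags in product_tags.values():
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


def related_tags(tag, counts):
    """Return tags seen alongside `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    return [name for name, _ in related.most_common()]


counts = co_occurrence(product_tags)
suggestions = related_tags("Card Stock", counts)
print(suggestions)
```

A navigation component could use such a list to decide which additional tags to surface for a visitor who keeps browsing “Card Stock” items.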

The practical benefit of a system like this would be enhanced navigation capabilities, as well as increased accuracy in searches. It translates into an improved user experience with higher satisfaction levels and an increased opportunity for upselling the company’s paper products.

Practical Usage of Metadata

Using a combination of structural and descriptive metadata means web content can be easily discovered and understood by both internal and external search engines, as well as services like workflow managers. It also means site navigation needs are more easily accommodated, and the end-user experience is largely improved. With the introduction of microformats, the bar for implementing and deploying descriptive metadata is significantly lowered.

Through the use of structural metadata, there are large gains to be realized in internal content management as well. As most organizations with large websites discover, the simple directory structure is not an effective mechanism for finding and managing assets. While structural metadata is not a suitable replacement for traditional search using indexing, coupling traditional search with structural tools, such as taxonomy, can significantly ease the difficulty of finding and maintaining a website’s data.

As wiki and syndication applications grow in popularity, metadata structures such as tagging and taxonomy can also provide for a more efficient development cycle. Since the additional metadata is usable by new applications, the content itself becomes more portable. In emerging mash-up applications, the metadata can easily be used in novel ways, giving more insight into the meaning behind the data. A mapping tool, for example, could use address metadata to find latitude and longitude coordinates for a retail chain’s store locations. The coordinates can then be plotted on a map and displayed to a site visitor looking for the nearest outlet.

In the here and now, the emergence of these new metadata tools and standards means the content on a given website is more readily usable. Search engines will rank it higher, customers and employees will find what they need more easily through enhanced navigation, and visitors will interact more efficiently with the site, for an overall improved user experience.

 

[1] Dublin Core Metadata Initiative; 2006/12/18; DCMI: http://dublincore.org
[2] About microformats; 2006/12/29; Microformats.org: http://microformats.org/about
[3] Automated Tag Clustering: Improving search and exploration in the tag space; 2006/5/26; Begelman, Keller, and Smadja: http://www.rawsugar.com/www2006/20.pdf