By
Albert Ip [1] (contact author), Mike Currie [2], Iain Morrison [3], Jon Mason [4]
[1] EdNA (Higher Education) Technical Specialist, email: albert@dls.au.com
[2] EdNA Project Officer (Higher Education), email: m.currie@dis.unimelb.edu.au
[3] Head, Department of Information Systems, The University of Melbourne, email: i.morrison@dis.unimelb.edu.au
[4] Senior Consultant, Education.Au Limited, email: jmason@educationau.edu.au
This paper takes a fresh look at resource discovery on the Web using a new information and data model.
After classifying web sites into Resource sites and Search sites, the authors review some of the issues surrounding conventional Web searching and outline three data types which embrace, in broad terms: content, metadata and indexes, and classification rules and standards. This Data Model is then applied to provide new insights into the resource discovery process.
The paper articulates the role of the elements of the Data Model in helping optimise the search process especially in the case of Subject Gateways.
The relationship between metadata and the data model is explored, with particular attention to the Dublin Core metadata standard, and a two-tier searching strategy is proposed which aims to preserve the knowledge-specific specialties of different Subject Gateways.
The invention of the World Wide Web and the proliferation of website creation tools have led to an exponential growth of web-enabled resources. We have moved from a printed environment governed by a scarcity of resources to a situation in which the quality of online resources is submerged by the sheer quantity of information available. The task of accessing, organising and making available data on web resources has been handled with varying success by commercial search engines and indexing services. However, the challenges involved in indexing new sites and maintaining existing links, together with the increasing abuse of keyword searching, have rendered these services increasingly inadequate for research and learning.
A better understanding of the nature of web resources and their relationship with ‘Search Sites’ can provide new perspectives on search technology and lead to possible solutions to the challenge of online information discovery faced by researchers.
The following theoretical model describes how online resource databases operate and illustrates how current best practices of 'Search Sites' (key web sites established for resource discovery purposes) fit this model.
In broad terms, there are two kinds of websites which support resource discovery and which operate interdependently: Search Sites and Resource Sites. This distinction can be amplified by drawing an analogy with a physical library.
Resource Sites are like the collections of a library. They host material which users are interested in such as HTML documents, software objects, course material, resource materials and so on. Unlike a library, each Resource Site may host a limited range of resources as dictated by the interest(s) of the site owners. There is no central body coordinating the hosting of resources and there is no mechanism of quality control that validates the resources. This free 'democratic' activity on the Web is enabled through the easy creation of HTML pages. The Web acts as a huge library with distributed holdings scattered throughout the Internet and managed by individuals without central coordination. However, very much unlike a library, there is no central catalogue!
Another analogy can be drawn here. In a large physical library, a particular book is typically found by first locating it in the library catalogue or by knowing its Dewey number. It can also be found through serendipity, because related books are shelved together. On the Web, however, this concept of "nearness" in terms of information proximity has not been adequately captured.
Hence, Search Sites exist to meet this challenge. In many ways, Search Sites can be likened to a library's catalogue. However, there are major differences:
Search Sites require mechanisms for locating resources. Some depend on manual discovery of resources and sometimes encourage users to suggest resources. Others send out robots, 'spiders' or 'crawlers', to collect resources by following hyperlinks on documents. These robots retrieve Web resources (referred to below as Type 1 data) for further processing.
While there are projects which endeavour to store a snapshot of currently available Web resources, it is generally impossible for Search Sites to actually host all Type 1 data. Instead, Search Sites create some form of data (referred to below as Type 2 data) from the gathered Type 1 data in order to support the last and most visible function of the Search Site: answering queries. When a user submits search criteria, instead of going out onto the Web to locate the material (methodically exploring all the Type 1 data options), a Search Site scans the Type 2 data created by its robot(s) to find the matching resource(s) and then sends a report to the user. Type 2 data thus conserves the computation and/or communication costs of Search Sites.

Figure 1: Logical structure of Search Sites
One particular type of Search Site is the Subject Gateway. This is typically concerned with resources that fit within a set of defined criteria such as subject area, originating organisation, geographic focus and language. These resources are then manually evaluated and referenced by the Gateway. This paper focuses particularly on the search strategies employed by Subject Gateways; however, most of the findings should be valid for other types of Search Sites.
In this theoretical framework, the focus is on the data which supports the provision of services by Search Sites. These data types are grouped in the following schema:
Type 1 Data: These are resources that end users are interested in. (In certain circumstances, Type 2 and Type 3 data are themselves Type 1 data). Type 1 Data may include lesson plans, teaching strategies, curriculum ideas, web pages, software, information data sources, dynamic data (such as stock price quotes, weather conditions), media (sound and video), software modules, or Computer Assisted Learning packages, etc.
Type 2 Data: These data are derived directly from Type 1 data and sometimes function as surrogate data for it (especially for non-text based Type 1 data) in order to conserve computation or cognitive load while performing the resource discovery function of the Search Sites. Examples include metadata and indexes of websites. The comprehensiveness of Type 2 data and the relevancy of the collection to the Sites' users are part of the primary asset of Search Sites.
Type 3 Data: This is data that is not captured by Type 1 and Type 2 data, and which typically cannot be derived directly from any single item of Type 1 or Type 2 data. A little more explanation may help make this concept clear.
A book is a resource and hence it is Type 1 data. The catalogue card describing the book is Type 2 data. Using the book's bibliography, we can locate catalogue cards for the referenced works. The grouping of this set of catalogue cards carries a special value: the fact that the works are related. This is Type 3 data. Sometimes this Type 3 data is represented in a "relatedTo" metadata element.
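The book example can be made concrete with a small data model. This is an illustrative sketch only: the class and function names (`Book`, `CatalogueCard`, `related_cards`) and the sample call numbers are hypothetical. What it shows is that the grouping returned by `related_cards` is derived from the book's bibliography, so no single catalogue card holds it.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Type 1 data: the resource itself."""
    title: str
    bibliography: list = field(default_factory=list)  # titles of cited works


@dataclass
class CatalogueCard:
    """Type 2 data: a surrogate derived directly from one Type 1 item."""
    title: str
    call_number: str


def related_cards(book, catalogue):
    """Type 3 data: the group of cards for the works a book cites.

    No single card records this relationship; it only exists once the
    bibliography is used to collect several cards together.
    """
    return [catalogue[title] for title in book.bibliography if title in catalogue]


catalogue = {  # hypothetical catalogue with made-up call numbers
    "Work A": CatalogueCard("Work A", "530.1"),
    "Work B": CatalogueCard("Work B", "621.48"),
}
book = Book("Introductory Physics", bibliography=["Work A", "Work C"])
group = related_cards(book, catalogue)  # only Work A is held in this catalogue
```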
Some Type 3 data describes "information proximity" between Type 1 data. Typical data falling into this class of Type 3 data are classification schemes such as the Library of Congress Subject Headings or the Dewey Decimal Classification System.
Some Type 3 data relates to the "perceived value" of Type 1 data by its users such as a "citation index" of academic papers, "popularity ratings" of web pages as recorded by usage logs in proxy servers, or the number of hyperlinks pointing to the resource.
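The last kind of "perceived value" data, the number of hyperlinks pointing to a resource, can be sketched as a simple in-link count over a made-up link graph. Plain in-link counting is an assumption made for brevity; more refined link-analysis methods exist (see, for example, Kleinberg, 1997). The function name `inlink_counts` and the example URLs are hypothetical.

```python
from collections import Counter


def inlink_counts(links):
    """Count the distinct pages linking to each page.

    links maps a page URL to the URLs it links to. Self-links and repeated
    links from the same page are ignored.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "c.example"],  # duplicate link counted once
    "c.example": ["c.example"],               # self-link ignored
}
print(inlink_counts(links).most_common())
# [('c.example', 2), ('b.example', 1)]
```

As with the telephone analogy, a single page's count means little on its own; it only becomes useful relative to the counts of the rest of the collection.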
Another class of Type 3 data relates to "standards", e.g., how Type 2 data should be expressed. Typical data of this class are metadata standards such as Dublin Core, IEEE LOM, etc.
One interesting characteristic of Type 3 data is that a single datum is of little value, just like a telephone: if there were only one telephone in the world, it would be useless. The power of Type 3 data lies in the collection.
The initial discovery of a resource by a Search Site is referred to as "gathering". The gathering mechanisms of Search Sites can be classified into two main approaches: automatic and human-initiated. Automatic gathering mechanisms depend on the fact that Web pages are hyperlinked resources (the linkage is Type 3 data). By following the hyperlinks of Type 1 data, a software robot is able to discover more Type 1 data, thereby bringing it into the Search Site, where some other software process can be applied to extract the Type 2 data used by that site. Alta Vista (www.altavista.com) follows this strategy. On the other hand, some Search Sites depend on human suggestion of sites coupled with some form of automatic strategy to expand their collection. EdNA (www.edna.edu.au), Yahoo! (www.yahoo.com) and About.com (formerly the Mining Company) (www.about.com) belong to this group.
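Automatic gathering can be sketched as a breadth-first traversal of hyperlinks. This is a minimal, hypothetical robot: `gather` and `fetch_links` are illustrative names, and an in-memory dictionary stands in for the HTTP requests and HTML link extraction a real robot would perform.

```python
from collections import deque


def gather(seeds, fetch_links, max_pages=100):
    """Breadth-first robot: discover Type 1 resources by following hyperlinks.

    fetch_links(url) is a stand-in for downloading a page and extracting its
    links; it must return an iterable of URLs.
    """
    seen = set(seeds)
    queue = deque(seeds)
    gathered = []
    while queue and len(gathered) < max_pages:
        url = queue.popleft()
        gathered.append(url)  # handed on for Type 2 extraction
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return gathered


# A toy, in-memory "Web" replacing real HTTP requests.
toy_web = {
    "/home": ["/courses", "/about"],
    "/courses": ["/courses/physics", "/home"],
    "/about": [],
    "/courses/physics": [],
    "/video.mpg": [],  # nothing links here, so the robot never finds it
}
print(gather(["/home"], lambda url: toy_web.get(url, [])))
# ['/home', '/courses', '/about', '/courses/physics']
```

The page `/video.mpg` is never reached because nothing links to it, which illustrates why non-text resources without embedded hyperlinks are hard to gather automatically.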
However, many resources of interest to Subject Gateways are not text-based or may not have embedded hyperlinks, making the gathering process a real challenge. This reinforces the notion that the primary asset of such Search Sites is the Type 2 data collected.
This theoretical model enables an analysis of the relative value of Search Sites. It has been applied to suggest a model upon which an Australian national strategy for developing an improved type of Search Site, namely 'Teaching and Learning Resource Databases', can take place. [EIP report to DETYA, to be published]. In a later section, this theoretical model will be applied to pinpoint the difficulties faced by the metadata community and propose a partial solution to the problem of resource discovery. However, before pursuing this further it is worth looking at the functionality of some commercial Search Sites and EdNA Online.
For Subject Gateways, including EdNA Online, the Type 2 data collected can be viewed as the major asset.
There are two main classes of Type 2 data: indexes and metadata. (EdNA uses both!)
Technically speaking, indexes are inverted text indexes of HTML (or text) pages. Search Sites typically produce a list of all the words found in a resource, excluding common or stop words, and store them in a computer-readable format. When a user submits a keyword query, the keyword is matched against these word lists and the resources that contain it are returned as the search result. This is the method most widely adopted by commercial Search Sites because, as discussed above, Resource Site owners do not normally add their own descriptive keywords. However, some commercial Search Sites do respect keywords supplied by resource owners, and these have served as first-generation metadata for Search Sites. This has also created a small industry which exploits this crude implementation by abusing keywords to make pages appear at the top of search results.
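A minimal inverted index along the lines just described might look like this. The stop-word list is a small illustrative assumption, the function names are hypothetical, and treating a multi-word query as "every word must appear" is one common choice among several; real engines add ranking, stemming and phrase matching.

```python
import re
from collections import defaultdict

# Illustrative stop-word list only.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every non-stop query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "p1": "Introduction to nuclear energy for physics teachers",
    "p2": "The history of energy policy in Australia",
    "p3": "Lesson plans for chemistry",
}
index = build_index(docs)
print(sorted(search(index, "nuclear energy")))  # ['p1']
print(sorted(search(index, "energy")))          # ['p1', 'p2']
```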
Metadata is simply defined as data about data. "It describes the attributes and contents of an original document or work" (Milstead & Feldman, 1999). The DESIRE project describes metadata as "data associated with objects which relieves their potential users of having to have full advance knowledge of their existence and characteristics".
Metadata needs to conform to two constraints: semantics and syntax. It is generally agreed that any statement about a resource will have two parts: the nature of the statement and the value associated with the statement. For example, if we want to state that the "creator" of the resource is "John Smith", the nature of the statement is the notion of "creator" and the value is "John Smith". Unfortunately, natural language usage typically involves wide variations in semantics, and thus some people may prefer to use "author" or "writer" instead of "creator". To achieve interoperability, such variations must be controlled. The Dublin Core standard, for example, defines elements (or 'containers' for descriptive content about a primary data source) which constrain the semantics of the allowed content; some of these elements take their values from a 'controlled vocabulary'.
With the semantics well defined, it is necessary to link the metadata to the Type 1 data, either by embedding it in the resource (if the resource is text-based) or by associating it with the resource from a separate repository as detached metadata (usable for both text and non-text resources). The syntax is also important to enable exchange. Well-known approaches include the use of the <META> tag in HTML 3.2 and HTML 4.0 documents. More recently, the W3C has formally recommended the Resource Description Framework (RDF) as the model and syntax for metadata for machine interoperability.
As an internationally recognised standard developed for online resource discovery, the Dublin Core metadata standard has largely been driven by experts in the library and information technology communities. Dublin Core metadata consists of a set of 15 elements, some of which are ‘qualified’ by ‘schemes’ or qualifiers. The semantic meanings of these 15 elements are well defined, and some of the elements accept values only from a ‘controlled vocabulary’. Each element is optional and repeatable. Furthermore, metadata elements may appear in any order, with no significance attached to that order. An element’s meaning is unaffected by whether or not it is embedded in the resource that it describes.
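The need to control semantic variation can be illustrated by mapping free-form statement names onto Dublin Core element names. The synonym table and the `normalise` function are hypothetical; a real Subject Gateway would maintain its own agreed list, which is itself Type 3 data.

```python
# Hypothetical synonym table: free-form statement names -> DC element names.
ELEMENT_SYNONYMS = {
    "creator": "Creator",
    "author": "Creator",
    "writer": "Creator",
    "title": "Title",
    "name": "Title",
    "subject": "Subject",
    "keywords": "Subject",
}


def normalise(statements):
    """Turn (nature, value) statements into DC-style (element, value) pairs.

    Statements whose nature has no agreed mapping are returned separately
    rather than guessed at.
    """
    mapped, unmapped = [], []
    for nature, value in statements:
        element = ELEMENT_SYNONYMS.get(nature.strip().lower())
        if element is None:
            unmapped.append((nature, value))
        else:
            mapped.append((element, value))
    return mapped, unmapped


mapped, unmapped = normalise(
    [("Author", "John Smith"), ("writer", "Jane Doe"), ("mood", "cheerful")]
)
# mapped   -> [('Creator', 'John Smith'), ('Creator', 'Jane Doe')]
# unmapped -> [('mood', 'cheerful')]
```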
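For text-based resources, embedded metadata in HTML is usually written as <META> tags in the document head. The sketch below assumes the widely used `DC.Element` naming convention for the `name` attribute (e.g. `DC.Creator`); the helper names `to_meta_tags` and `DCMetaExtractor` are hypothetical. It writes a record as tags and reads it back with Python's standard `html.parser` module.

```python
from html import escape
from html.parser import HTMLParser


def to_meta_tags(record):
    """Render (element, value) pairs as <meta name="DC.Element"> tags."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{escape(value)}">'
        for element, value in record
    )


class DCMetaExtractor(HTMLParser):
    """Collect embedded DC.* <meta> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.record = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc.") and "content" in attributes:
            # Strip the "DC." prefix; the parser has already unescaped the value.
            self.record.append((name[3:], attributes["content"]))


record = [("Title", "Nuclear Energy & Society"), ("Creator", "John Smith")]
page = f"<html><head>{to_meta_tags(record)}</head><body>...</body></html>"
parser = DCMetaExtractor()
parser.feed(page)
parser.close()
print(parser.record)
# [('Title', 'Nuclear Energy & Society'), ('Creator', 'John Smith')]
```

Detached metadata for non-text resources would hold the same pairs in a separate repository instead of in the page itself.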
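The rules that every element is optional and repeatable, and that order carries no meaning, can be captured in a small record builder. This is a sketch under stated assumptions: the function name `make_record` is hypothetical, element names are checked against the unqualified 15-element set only, and qualifiers are not modelled.

```python
from collections import defaultdict

DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher", "Contributor",
    "Date", "Type", "Format", "Identifier", "Source", "Language",
    "Relation", "Coverage", "Rights",
}


def make_record(pairs):
    """Build a DC record in which every element is optional and repeatable.

    Values are stored per element as a sorted tuple, so two records holding
    the same statements compare equal whatever order they were given in.
    """
    grouped = defaultdict(list)
    for element, value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        grouped[element].append(value)
    return {element: tuple(sorted(values)) for element, values in grouped.items()}


a = make_record([("Creator", "John Smith"), ("Title", "Atoms"), ("Creator", "Jane Doe")])
b = make_record([("Title", "Atoms"), ("Creator", "Jane Doe"), ("Creator", "John Smith")])
assert a == b  # order carries no meaning
```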
The DC metadata elements fall into three groups which roughly indicate the class or scope of information stored in them:
| Content | Intellectual Property | Instantiation |
| Title | Creator | Date |
| Subject | Publisher | Type |
| Description | Contributor | Format |
| Source | Rights | Identifier |
| Language | | |
| Relation | | |
| Coverage | | |
Table 3: Dublin Core Metadata Elements
Apart from DC.Subject and DC.Relation, the Dublin Core elements each express an aspect of the resource itself, without any relationship to other resources. That is, this reduced Dublin Core set is pure Type 2 data and is referred to below as the "pure core elements".
DC.Subject and DC.Relation express information about the resource which cannot be viewed alone (or has little value when viewed alone). The assignment of a value to DC.Subject implies the existence of other values in addition to the assigned one, and its usefulness improves greatly when the list of subjects is well defined, e.g., by applying LCSH to the DC.Subject element. The usefulness of the value assigned to DC.Relation will suffer when the online resource it points to changes, e.g., through the aging or relocation of resources. These and other elements that depend on Type 3 data relationships can be referred to as extended elements. Typical extended elements are those whose values are constrained by controlled vocabularies (in this case, the controlled vocabulary list is Type 3 data). However, some extended elements, such as annotations or commentary, can use free text as values.
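The distinction between pure core and extended elements can be sketched as a simple partition of a record. The vocabulary and the `Pedagogy` element below are hypothetical examples: the tiny subject list stands in for a real controlled vocabulary such as LCSH, and any element outside the pure core (including gateway-specific ones) is treated as extended.

```python
# The 15 Dublin Core elements minus Subject and Relation.
PURE_CORE = {
    "Title", "Creator", "Description", "Publisher", "Contributor", "Date",
    "Type", "Format", "Identifier", "Source", "Language", "Coverage", "Rights",
}

# Hypothetical controlled vocabulary (Type 3 data); a real gateway might use LCSH.
SUBJECT_VOCABULARY = {"Physics", "Chemistry", "Mathematics"}


def split_record(pairs):
    """Sort (element, value) pairs into pure core, extended and rejected lists."""
    pure, extended, rejected = [], [], []
    for element, value in pairs:
        if element in PURE_CORE:
            pure.append((element, value))
        elif element == "Subject" and value not in SUBJECT_VOCABULARY:
            rejected.append((element, value))  # outside the agreed vocabulary
        else:
            # Vetted Subject values, Relation, and any gateway-specific element.
            extended.append((element, value))
    return pure, extended, rejected


pure, extended, rejected = split_record([
    ("Title", "Atoms and Nuclei"),
    ("Subject", "Physics"),
    ("Subject", "Stuff about atoms"),
    ("Relation", "http://example.org/part-2"),
    ("Pedagogy", "inquiry-based"),  # hypothetical gateway-specific element
])
```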
For Subject Gateways (such as EdNA Online or MetaChem) there is a need to define further, finer grained elements which are relevant to the communities they support. Educational Subject Gateways will typically describe (or provide statements about) the pedagogy recommended by the resource owner, the expected level of the student using the resource and so on. Such information is not simply a description of the original resource but rather an evaluation of it.
To aid inter-operability and enable machine understanding, the value assigned to such elements may need to be constrained, and defined on a different semantic level.
The specifications for a metadata standard (e.g., Dublin Core), the controlled keyword list (e.g., a thesaurus), and the categorisation or classification rules (in EdNA Online) are Type 3 data which, when applied appropriately to Type 2 data, can provide a better search service. However, while Type 3 data is useful in its own right in enhancing the service of Search Sites, agreement on it is required to enable both the creation of Type 2 data and interoperability, exchange of information and megasearching.
An increasing number of Search Gateways are designed to search across multiple Search Sites. A good example is MetaCrawler, which simultaneously runs parallel queries on nine Search Sites (including Yahoo!, Alta Vista, Excite, Infoseek and Lycos). Unfortunately, because it called itself MetaCrawler, the term 'metasearch' has commonly come to be understood as online searching across multiple search engines.
This paper proposes that the term ‘metasearching’ refer to search engines that utilise metadata. Search engines such as MetaCrawler are better referred to as 'megasearch' engines. We propose that the term 'megasearching' be used to describe searching across multiple search engines, including, for example, HotOil from the Distributed Systems Technology Centre (DSTC), which searches across multiple engines as well as other protocols such as Z39.50. Other similar megasearch sites include Dogpile, Copernic, Webferret, Mamma, All-In-One Search Page, MetaFind, Inference Find, OneSeek, SavvySearch and FindIt.
Megasearch Sites, or Search Gateways, submit the search request to different underlying search engines. They then produce a single combined result list by equalising the scores and eliminating duplicates. However, because each Search Site may use a different search algorithm, the current generation of Search Gateways is generally unable to make meaningful comparisons between the rankings returned by different search engines, and the final result set is generally too large to be useful.
The concept of ‘megasearch’ still has great potential for Subject Gateways and EdNA Online. As Subject Gateway owners specialise in their particular areas of interest (subject domain, pedagogical paradigm, instructional design methodologies), a novice user will need a megasearch site to find information.
The main enabling conditions for useful megasearching will be standardised ratings for returned result sets, interoperable Type 3 data for the classification of content, and a standard syntax for exchanging result sets. The W3C held a Query Language Workshop in late 1998 to address these issues; the participants' position statements can be found at http://www.w3.org/TandS/QL/QL98/pp.html. Until these conditions are met, it is still possible for Megasearch Sites to provide an improved service by mapping between multiple metadata standards and schemas, using keyword thesauri, and mapping between natural languages in order to pass queries to relevant Search Sites.
Until metadata is widely provided by Resource Sites, the primary asset of a Search Site has been the indexes (a form of Type 2 data) gathered over the operation of the site. The quality of the results returned to end users depends largely on the characteristics of the Type 2 data owned by the Search Site, e.g., how it has been created and how it is used. As discussed before, many commercial sites exploit some form of Type 3 data to improve the rating, relevance or authority of their results, in order to help the user make the final selection of a resource. This is done mainly because metadata has not been widely adopted; Yahoo!, for example, has to create surrogate records manually instead of depending on Resource Sites to provide the metadata.
There is no doubt that different Subject Gateways meet different user requirements, and hence there is an obvious need for diversity and niche services. It has already been established above that reasonably good, reliable Type 2 data for use by Subject Gateways should be metadata (in contrast to the inverted index), and that metadata is sometimes the only form of Type 2 data available if non-text-based material is included in a Subject Gateway's repository.
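The merging step can be sketched as follows. Min-max rescaling of each engine's scores is an assumed, deliberately naive way of "equalising the score", duplicates are detected by exact URL match, and the function names, engine results and scores are all made up for illustration.

```python
def normalise_scores(results):
    """Min-max rescale one engine's (url, score) list onto the range 0..1."""
    if not results:
        return []
    scores = [score for _url, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(url, 1.0 if span == 0 else (score - lo) / span) for url, score in results]


def merge(result_lists):
    """Combine several engines' results: equalise scores, drop duplicate URLs.

    When a URL appears more than once, its best rescaled score is kept.
    """
    best = {}
    for results in result_lists:
        for url, score in normalise_scores(results):
            if score > best.get(url, -1.0):
                best[url] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


engine_a = [("x.example", 90), ("y.example", 50), ("z.example", 10)]
engine_b = [("y.example", 0.8), ("w.example", 0.2)]
merged = merge([engine_a, engine_b])
print(merged)
# [('x.example', 1.0), ('y.example', 1.0), ('w.example', 0.0), ('z.example', 0.0)]
```

Here `y.example` ends up tied for first only because engine B happened to rank it top, although engine A placed it mid-list: rescaled scores from different algorithms are not genuinely comparable, which is the weakness described above.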
On the other hand, the owner of each Subject Gateway is able to attach extended Type 2 data to a resource based upon their expert knowledge of material in the subject domain, or possibly based upon Type 3 data specific to that subject domain. The owner's intimate relationship with the subject matter also results in better selection of resources, and hence higher quality, than a general-purpose resource site can offer.
As metadata is adopted by Resource Site owners, pure core elements will be provided by the resource owner, in addition to some extended metadata which reflects the interests and characteristics of the Resource Site. While the pure core metadata (pure Type 2 data), either embedded within the resource or detached, will be provided by the resource owner, the ability to provide additional extended metadata that is especially relevant to a Subject Gateway's users becomes the key asset driving users to that Gateway. The assets of a Subject Gateway depend on the size of its collection of Type 2 data, the amount of additional extended metadata attached (linked) to each resource, and the rating service that helps users find relevant resources.
This notion of a division of labour (in terms of creating metadata) between the resource creator and the cataloguer has also been suggested by James Weinheimer of Princeton University (Weinheimer, 1999).
In ‘A Strategic Framework for the Information Economy’, released by the Australian National Office for the Information Economy in December 1998, it is stated that:
The private sector is driving, and will continue to drive, the transition to the information economy. ... [ in the context of] the role of governments to provide an environment conducive to investment in new technology…
The underlying principle is valid for a national strategy enabling both collaboration and competition in the provision of Subject Gateways and promoting EdNA Online.
One of the difficulties in enabling collaboration among owners of resource databases (Search Sites) is the apparent tension between collaboration and competition. Because serving Australian education (and Higher Education in particular) is the common goal, there is genuine goodwill to collaborate. However, there is also an element of competition between Subject Gateway owners. This is particularly true of collaboration between funded activities: the very fact that these are funded through competitive bidding (even though the funding may all come from DETYA) reduces the opportunities for collaboration.
The main value of diversity (i.e., the existence of different Subject Gateways) is the provision of the best possible search results for "educational" purposes in each particular subject domain. Any behaviour aimed at blocking competition (in educational effectiveness) by locking users into proprietary software, standards and/or practices should be discouraged. The goal of competition is to improve the service by allowing specialisation in different subject domains. As argued above, the main asset of Subject Gateways is the extended metadata, which is based on a sound, proven knowledge model of the subject area. Collaboration is needed to create a national interoperability standard for Type 3 data, so that information is categorised in interoperable ways that enable megasearching.
The Web is aptly named, given that it is an ever-expanding web of interconnectedness, of networks and nodes, rich in information resources and communications opportunities. Every day a new online initiative is launched, and with increasing frequency such initiatives are associated with government policies. The ‘information society’ has arrived and is setting the scene for social, economic and cultural reconfiguration for the new millennium. The proposed strategy of establishing a framework for collaboration needs to leverage both the creative use of information resources and the creative use of communication opportunities.
As a network, EdNA is much more than a database initiative: it aggregates value-add. Using a commercial metaphor, it is a directory with wide coverage which assists buyers, sellers and brokers by providing information, options and linkages.
As a process of collaboration and co-operation across all educational sectors and systems EdNA is geared toward leveraging and maximising the benefits of IT for education.
As a nationally focused product, EdNA Online acts as a primary point of entry for the online searching of educational resources in Australia. EdNA Online is a website which is established to deliver this while also facilitating collaboration. In its initial years it has focused upon developing resource discovery opportunities which are guided by principles of returning quality resources and minimising duplication of effort. Development of the EdNA metadata standard, launched in August 1998 and based on the international Dublin Core standard, demonstrates this approach. Furthermore, the EdNA metadata standard is in step with the Australian Government Locator Service (AGLS) metadata standard, one that was launched in 1998 as a whole-of-government initiative. In this example, the principle of interoperability is preeminent.
It has often been said that 'the beauty of standards is there's so many to choose from!' This apparent irony is a real challenge, because the so-called 'global information infrastructure' has many facets and will require a range of standards in order to consolidate. Thus, there are other significant 'standards' that are affecting the online education scene. The most significant of these is the IEEE LOM standard, which forms the basis of the Instructional Management Systems (IMS) project. IMS is a US-based initiative in its third year of development. In the last twelve months its metadata foundation has become more closely aligned with the Dublin Core standard, although technically it goes much further in the 'granularity' of metadata, in interoperability and in complexity. Further, it is focused on the flexible management of online courseware.
The overall stated goal of the IMS project is to ‘Enable an Open Architecture for Learning’. IMS stakeholders are identified as learners, teachers, co-ordinators and providers. Key design considerations for online learning are identified as: granular content; scalable systems; interoperability; customisability and extensibility; and facilitation of and support for collaboration. Importantly, the IMS specification involves not just metadata standards but also standards relating to user profiling, and technical standards such as XML and CORBA. However, while it is a significant international initiative largely driven from the USA (with EDUCAUSE support), it is still very much in the prototype testbedding stage. Recognising its importance, DETYA is facilitating the development of testbed activities and an Australian IMS Centre for information dissemination. From an Australian point of view, participation in this process allows both access to testing the software and an opportunity to influence the development of IMS standards. IMS metadata is currently considerably more complex in conception than EdNA’s and involves three ‘schemas’: Categories (9 elements), Data Elements (57) and Abstract Data Types (17).
As evident from the experience of the library community, all subject domains have their specific needs. While many are using controlled vocabularies such as LCSH, precise semantic standardisation presents a real problem as common terms can have different meanings for different user groups. In other words, domain specific efforts in standardising metadata must remain within the domain specific community, hence the encouragement of extended metadata elements.
While many Subject Gateways are interested in providing services to education, and hence may have elements related to 'education' and 'pedagogy', other extended elements will be included with different emphases by different Subject Gateways and communities. The previously fuzzy distinction between "pure" and "extended" metadata has not been very useful in advancing the debate. Based on the data model presented above, the notions of "pure" and "extended" can be used to inform the debate between metadata "extension" and metadata "qualification".
Let's take the notion of "Pedagogy.IMS" or "Pedagogy.EdNA" versus "IMS.Pedagogy" or "EdNA.Pedagogy". The former assumes common global acceptance of the term "pedagogy". By using qualifiers, we acknowledge that different communities (IMS or EdNA) may have slightly different interpretations, and hence "qualify" the element using the Type 3 data especially relevant to each community.
If we subscribe to "IMS.Pedagogy", we acknowledge that there is a community (identified as IMS) which has defined a concept of "Pedagogy" to meet its users' needs. Equally, "EdNA.Pedagogy" implies that there is a community called EdNA which also chooses to use the word "Pedagogy" for its needs. However, there may or may not be a formal or equivalent semantic mapping between the two uses. This has been evident in recent mapping exercises involving IMS metadata and DC metadata elements.
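The difference between sharing a word and sharing a meaning can be made explicit in code. In this sketch, community-qualified names such as `EdNA.Pedagogy` are parsed into (community, element) pairs, and two elements are treated as equivalent only if they are identical or listed in an explicitly agreed mapping table. The table, the `Level` element and the function names are hypothetical.

```python
def parse(qualified):
    """Split 'Community.Element' into a (community, element) pair."""
    community, sep, element = qualified.partition(".")
    if not sep:
        raise ValueError(f"expected Community.Element, got {qualified!r}")
    return community, element


# Hypothetical table of mappings the communities have explicitly agreed.
AGREED_MAPPINGS = {("EdNA", "Level"): ("IMS", "Level")}


def equivalent(a, b):
    """Treat two qualified elements as the same only if identical or mapped.

    Sharing the element word (e.g. 'Pedagogy') is deliberately not enough.
    """
    x, y = parse(a), parse(b)
    return x == y or AGREED_MAPPINGS.get(x) == y or AGREED_MAPPINGS.get(y) == x


print(equivalent("EdNA.Pedagogy", "IMS.Pedagogy"))  # False: no agreed mapping
print(equivalent("IMS.Level", "EdNA.Level"))        # True: mapping agreed
```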
It is not appropriate to take a stand based on technical constraints only. The need for interoperability will imply constraints upon semantics assigned to agreed elements. Thus, the proliferation of communities each adding a slightly different flavour to the semantics will make interoperability extremely difficult.
Any collaborative framework must recognise this apparent contradiction and work creatively to enable a solution. One suggestion may be:
The collaborative framework specifies a standard for expressing domain-specific semantics so that cross-domain searching (e.g., mega-metasearch) can be done without the search engine having to understand the semantics. In other words, a megasearch engine does not need to understand domain-specific semantics; it only needs to be able to pass users' queries on in a standard format, relying on the extensive knowledge held in domain-specific Search Sites to perform the task.
The merging of results, the relative ranking of results and associated problems can be dealt with by allowing users to judge the relative relevance of the Search Sites to their specific needs. For an Australian physics teacher searching for information on "nuclear energy", a resource with EdNA extended metadata may be deemed more relevant than a resource returned from a specialised "High Energy Physics" Subject Gateway, but this is not necessarily the case for a research scientist working in the field.
The theoretical Data Model presented in this paper may serve to give a greater understanding of the resource discovery process used by Subject Gateways. The assignment of standard metadata represents a vital link in this process. However, the paper argues that metadata itself is made up of pure and extended elements. Pure elements describe the resource itself and are based on commonly accepted standards such as Dublin Core. These are most appropriately created by the resource owner. Extended metadata, on the other hand, is part of the particular value-adding process of the Subject Gateway and is designed to cater for the specific needs of its users. This extended metadata, and the way it is organised, is a primary asset of the Gateway.
However, as Gateways are set up to optimise searching for the members of their particular communities, there are basic semantic and syntactic differences in their use of extended metadata. These present a major impediment to the development of the collaborative framework needed to capture the synergy of these efforts. Any collaborative framework which compromises ownership of their assets is not likely to be accepted by the owners of these databases. To remain competitive in a rapidly expanding global market, Australian Higher Education cannot afford to stand on the sidelines and wait for the development of standards which will influence how education is delivered and how effective the digital education system will be.
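Letting users weight Search Sites according to their own needs can be sketched in a few lines. The site names, URLs and weights are hypothetical; the function simply orders results by the user's weight for the site that returned them and leaves each site's own ranking intact.

```python
def rank_by_site_preference(results, site_weights):
    """Order merged results by the user's own weighting of each Search Site.

    results: (site, url) pairs in the order each site returned them.
    site_weights: the user's judgement of each site's relevance; unlisted
    sites count as 0. Python's sort is stable, so each site's own ordering
    is preserved among results of equal weight.
    """
    ordered = sorted(results, key=lambda result: -site_weights.get(result[0], 0.0))
    return [url for _site, url in ordered]


results = [
    ("HEP Gateway", "http://hep.example.org/fission-cross-sections"),
    ("EdNA Online", "http://edna.example.org/nuclear-energy-lesson"),
    ("HEP Gateway", "http://hep.example.org/reactor-physics"),
]
teacher = {"EdNA Online": 1.0, "HEP Gateway": 0.3}
scientist = {"HEP Gateway": 1.0, "EdNA Online": 0.2}
print(rank_by_site_preference(results, teacher)[0])    # the EdNA lesson comes first
print(rank_by_site_preference(results, scientist)[0])  # a HEP Gateway result comes first
```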
The concept of mega-metasearching thus provides one possible solution to this dilemma. The proposed search engine would be able to search across the pure metadata using the standards accepted by the subscriber Gateways. However for more advanced searching utilising the full richness of the Gateway’s subject expertise, the enquiry would be directed to the gateway.
Thus an intelligent megasearch site would be able to interpret the query (which might be phrased in natural language terms), search its own database for suitable resources and then refer the query to the most appropriate Subject Gateway. It would then present these results using a ranking that took account of the search criteria allowing the searcher to make informed decisions.
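The two tiers can be sketched as follows, under several assumptions: query interpretation is reduced to lower-casing and splitting on spaces (a stand-in for real natural-language processing), each gateway is described by a hypothetical set of coverage terms and a search callable, and tier 2 results are returned per gateway rather than ranked together. The function names, gateway names and identifiers are all illustrative.

```python
def tier_one(local_records, terms):
    """Tier 1: match query terms against pure core metadata held locally."""
    hits = []
    for record in local_records:
        words = set(" ".join(value for _element, value in record["pure"]).lower().split())
        if all(term in words for term in terms):
            hits.append(record["id"])
    return hits


def two_tier_search(query, local_records, gateways):
    """Search locally, then refer the query to gateways whose coverage matches.

    The megasearch site never interprets a gateway's extended metadata; it
    only passes the query on and collects what comes back.
    """
    terms = query.lower().split()  # stand-in for natural-language parsing
    results = {"local": tier_one(local_records, terms)}
    for name, gateway in gateways.items():
        if gateway["coverage"] & set(terms):
            results[name] = gateway["search"](terms)  # tier 2
    return results


local_records = [
    {"id": "r1", "pure": [("Title", "Nuclear energy basics"), ("Creator", "J. Smith")]},
    {"id": "r2", "pure": [("Title", "Organic chemistry")]},
]
gateways = {  # hypothetical gateways returning canned answers
    "Physics Gateway": {"coverage": {"nuclear", "physics"},
                        "search": lambda terms: ["phys-001"]},
    "Chemistry Gateway": {"coverage": {"chemistry"},
                          "search": lambda terms: ["chem-042"]},
}
print(two_tier_search("Nuclear energy", local_records, gateways))
# {'local': ['r1'], 'Physics Gateway': ['phys-001']}
```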
A two-tier mega-metasearching strategy is thus proposed in this paper, which would preserve the knowledge-specific specialties of different Subject Gateways. EdNA's collaborative framework and its main online service, EdNA Online, provide Australian organisations with a unique opportunity to embrace the mega-metasearching concept.
We have proposed one approach which may enable the creation of such a collaborative framework. It is not pre-emptive and serves only to illustrate that a solution does exist to meet the challenge of the apparent dichotomy between collaboration and competition. An open, continual dialogue between all interested parties will provide an innovative solution to meet the challenges ahead.
Dunn, A. (1999) ‘Net Searchers to Index All 800 Million Pages’, Los Angeles Times, 3 Aug. 99, http://www.latimes.com/CNS_DAYS/990803/T000068951.html (Accessed 6 Aug. 99)
IBM (1998) The DESIRE project, http://www.ukoln.ac.uk/metadata/desire/overview/rev_ti.htm (Accessed 15 Aug. 99)
Kleinberg, J. (1997) Authoritative Sources in a Hyperlinked Environment, http://www.cs.cornell.edu/home/kleinber/auth.ps (Accessed 20 Aug. 99)
Lynch, C. (1997) ‘Searching the Internet’, Scientific American, March, 1997, also online http://www.sciam.com/0397issue/0397lynch.html (Accessed 20 Aug. 99)
Milstead, J. and Feldman, S. (1999) ‘Metadata: Cataloging by Any Other Name’, Online (2)99, http://www.onlineinc.com/onlinemag/OL1999/milstead1.html (Accessed 20 Aug. 99)
Nielsen, Jakob (1998) ‘Why Yahoo is Good (But May Get Worse)’. Alertbox, 1 Nov. 98 http://www.useit.com/alertbox/981101.html (Accessed 20 Aug. 99)
NOIE, (1998) A Strategic Framework for the Information Economy - Identifying Priorities for Action. National Office for the Information Economy, Commonwealth of Australia, Canberra.
Sullivan, Danny (1999) ‘How Search Engines Work’, Search Engine Watch, http://www.searchenginewatch.com/webmasters/work.html (Accessed 20 Aug. 99)
Thomas, C.F. & Griffin, L.S. (1999) Who Will Create Metadata For the Internet. http://www.firstmonday.dk/issues/issue3_12/thomas/index.html (Accessed 20 Aug. 99)
W3C (1999) Dublin Core Metadata Initiative, http://purl.oclc.org/dc/ (Accessed 20 Aug. 99)
Weinheimer, J. (1999), Proposal for Metadata System http://www.princeton.edu/~jamesw/mdata/metadataprop.html (Accessed 20 Aug. 99)
Websites:
All-In-One Search Page http://www.albany.net/allinone
Dogpile http://www.dogpile.com
EdNA http://www.edna.edu.au
FindIt http://www.itools.com/find-it
HotBot http://www.wired.com/home/digital/
Inference Find http://www.infind.com
Mamma http://www.mamma.com
MetaCrawler http://www.metacrawler.com
MetaFind http://www.metafind.com
OneSeek http://www.oneseek.com
SavvySearch http://www.savvysearch.com