Automatic Classification

Classification in Wiki via automatic schemes. -- Thinking Out Loud.Donald Noyes.200908252059


In a wiki system, concepts and subjects are grouped into a document, and the document is given a Wiki Word name. That name becomes a Document Representative, which is of great use in the Mapping of Concepts. This representation is the first level of grouping wikis use to organize concepts, subjects, concerns, or problems. Because Wiki Words are rarely used in general discourse to identify the documents for which they stand, the traditional rigorous techniques for classification will usually ignore them as insignificant. In the frequency distribution of the words of a text that contains Wiki Words, the Wiki Words fall on the lower portion of a hyperbolic curve. Some of the automatic classification schemes simply exclude this area.

The referenced Figure demonstrates a hyperbolic curve and how high and low frequency words are filtered out:

Wiki Words are found in the low-frequency tail of the curve (less frequently used).
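
The filtering described above can be sketched in a few lines. This is only an illustration, not the scheme from the referenced Figure: the function name filter_by_frequency, the toy text, and the two cut-off values are all assumptions chosen to show a rare Wiki Word being dropped along with the very common words.

```python
from collections import Counter

def filter_by_frequency(tokens, low_cutoff, high_cutoff):
    """Keep only words whose count lies within [low_cutoff, high_cutoff].

    Both cut-offs are hypothetical tuning values, not taken from the source.
    """
    counts = Counter(tokens)
    return {w: c for w, c in counts.items() if low_cutoff <= c <= high_cutoff}

# Toy text: "the" is very frequent, "WikiWord" occurs exactly once.
text = ("the wiki is a wiki and the page is a page "
        "the index lists the page and the wiki WikiWord")
tokens = text.split()

kept = filter_by_frequency(tokens, low_cutoff=2, high_cutoff=4)
print(sorted(kept))
```

With these assumed cut-offs, "the" is removed as too frequent and "WikiWord" is removed as too rare, which is the effect the paragraph above is concerned about.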


Wiki Stems, Words, Phrases (Spacified Wiki Words)

The relevance of including such things as word stems and phrases in a classification scheme has been recognized:

"There is no reason why such an analysis should be restricted to just words."

"It could equally well be applied to stems of words (or phrases) and in fact this has often been done."

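A Wiki Word can be turned back into a phrase before it is counted, so that it is analysed as a phrase rather than as one rare token. The sketch below assumes a simple CamelCase pattern for Wiki Words (two or more capitalized runs); the names WIKI_WORD, spacify, and phrase_terms are illustrative, and real wikis may use different rules.

```python
import re

# Assumed pattern: two or more capitalized runs, e.g. "WikiWay".
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")

def spacify(wiki_word):
    """Insert a space at each lower-to-upper case boundary."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", wiki_word)

def phrase_terms(text):
    """Return the spacified, lowercased Wiki Words found in text."""
    return [spacify(m.group(0)).lower() for m in WIKI_WORD.finditer(text)]

print(spacify("AutomaticClassification"))
print(phrase_terms("See AutomaticClassification and the WikiWay page"))
```

The resulting phrases can then be fed to whatever word, stem, or phrase analysis the classification scheme already uses.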

There is a sense in which the use of Wiki Words can enhance the location efficiency of searches made with the popular search engines: a Wiki Word used as a search term will return tens to hundreds of results, where a single ordinary word will return hundreds of thousands to several million results.

Using Wiki Words can be like using a search engine as a powerful magnet to locate the "needle in the haystack".


Other ways the Wiki Way can be used to enhance Automatic Classification:

'on the average the simplest indexing procedures which identify a given document or query by a set of terms, weighted or unweighted, obtained from document or query text are also the most effective'. Its recommendations are clear: automatic text analysis should use weighted terms derived from document excerpts whose length is at least that of a document abstract.
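
As a rough illustration of "weighted terms", the sketch below weights each term by its count in a document times the log of its inverse document frequency. This tf-idf weighting is an assumption chosen for the example; the quoted passage does not say which weighting is meant. The function name tf_idf and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: dict of name -> list of tokens. Returns name -> {term: weight}.

    Weight = term count in the document * log(N / number of docs with the term).
    A term that occurs in every document therefore gets weight 0.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        weights[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return weights

# Toy "excerpts", already tokenized.
docs = {
    "A": ["wiki", "word", "wiki"],
    "B": ["wiki", "cluster"],
}
weights = tf_idf(docs)
print(weights)
```

Here "wiki" appears in both documents and so carries no weight, while "word" and "cluster" each distinguish one document.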

Clusterings

The representation of the single link hierarchy through an MST has proved very useful in connecting single link with other clustering techniques ... Implication of classification methods: It is fairly difficult to talk about the implementation of an automatic classification method without at the same time referring to the file
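
The link between single link clustering and the MST (minimum spanning tree) can be sketched as follows: build the tree over pairwise dissimilarities, then cut every tree edge above a chosen level; the pieces that remain are the single link clusters at that level. The code is a minimal sketch using Kruskal's algorithm; the function names, the toy dissimilarities, and the threshold are assumptions made for illustration.

```python
def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. edges: list of (dissimilarity, i, j)."""
    parent = list(range(n))
    tree = []
    for d, i, j in sorted(edges):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # edge joins two separate components
            parent[ri] = rj
            tree.append((d, i, j))
    return tree

def single_link_clusters(n, tree, threshold):
    """Drop MST edges longer than threshold; return the remaining components."""
    parent = list(range(n))
    for d, i, j in tree:
        if d <= threshold:
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for x in range(n):
        groups.setdefault(_find(parent, x), []).append(x)
    return sorted(groups.values())

# Hypothetical dissimilarities between four documents numbered 0..3.
edges = [(0.1, 0, 1), (0.2, 1, 2), (0.9, 2, 3),
         (0.3, 0, 2), (0.95, 0, 3), (1.0, 1, 3)]
tree = minimum_spanning_tree(4, edges)
clusters = single_link_clusters(4, tree, threshold=0.5)
print(tree)
print(clusters)
```

Raising the threshold merges clusters and lowering it splits them, so the one tree stands for the whole hierarchy.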

Categorizations

"The basic relationship underlying the automatic construction of keyword classes is as follows: If keyword a and b are substitutible for one another in the sense that we are prepared to accept a document containing one in response to a request containing the other, this will be because they have the same meaning or refer to a common subject or topic."

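One way to approximate "substitutable" keywords automatically is to compare the sets of documents in which each keyword occurs. That proxy is an assumption of this sketch and is not stated in the quotation; the Dice coefficient, the function names, the toy postings, and the threshold are likewise illustrative choices.

```python
def dice(a, b):
    """Dice coefficient of two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def keyword_pairs(postings, threshold):
    """postings: keyword -> set of document ids.

    Returns keyword pairs whose document sets overlap at or above threshold.
    """
    keys = sorted(postings)
    return [(k1, k2)
            for i, k1 in enumerate(keys)
            for k2 in keys[i + 1:]
            if dice(postings[k1], postings[k2]) >= threshold]

# Hypothetical postings: which documents mention which keyword.
postings = {
    "car": {1, 2, 3},
    "automobile": {1, 2},
    "wiki": {4},
}
pairs = keyword_pairs(postings, threshold=0.7)
print(pairs)
```

Pairs found this way are only candidates for a keyword class; whether two keywords really mean the same thing is still a judgment about the documents.
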

It might be a great idea, but I suggest that it not create wiki-words automatically. Candidate wiki-words are best vetted by people first.


See original on c2.com