Wiki Word Statistics

A search for each of these words against the search database available on 7 March 2000 gave these results:

"pattern":
1376 hits / 378 page titles (5.0%)

"Wiki":
1408 hits / 279 page titles (3.7%)

"XP":
856 hits / 200 page titles (2.7%)

"Extreme Programming":
10 hits / 83 page titles (1.1%)

The database contained 7546 pages.

It's interesting to compare those last two separately, rather than combine them.

Please append to these figures rather than modifying them, so that we can compare against them at some later date. My guess would be that in, say, a year's time, the XP pages will be a lower proportion of the total, since the Wiki Mind will have drifted elsewhere. --Keith Braithwaite

May 12th, 2001

"pattern":
2130 hits / 479 page titles (3.13%)

"Wiki":
?? hits / 727 page titles (4.76%)

"XP":
?? hits / 406 page titles (2.66%)

"Extreme Programming":
?? hits / 145 page titles (0.94% of total)

The database contained 15,289 pages. Searching for word hits didn't work too well, as the size of the resulting pages caused network dropouts.

The database grew by 103%. Measured in page titles, Extreme Programming grew by 75%, XP by 103%, Wiki by 161%, and pattern by 27%.
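The shares and growth rates can be recomputed from the page-title counts in the two snapshots. The following is a minimal sketch, assuming growth is measured on page titles (the 2001 hit counts are missing); the names `share` and `growth` are introduced here for illustration only.

```python
# Page-title counts and database sizes copied from the two snapshots above.
titles_2000 = {"pattern": 378, "Wiki": 279, "XP": 200, "Extreme Programming": 83}
titles_2001 = {"pattern": 479, "Wiki": 727, "XP": 406, "Extreme Programming": 145}
pages_2000 = 7546
pages_2001 = 15289


def share(count, total):
    """Percentage of all pages whose title contains the word."""
    return 100.0 * count / total


def growth(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


print(f"database: {growth(pages_2000, pages_2001):.1f}% growth")
for word, old in titles_2000.items():
    new = titles_2001[word]
    print(
        f"{word}: {share(old, pages_2000):.2f}% -> {share(new, pages_2001):.2f}% "
        f"of pages, {growth(old, new):.1f}% growth in titles"
    )
```

A word whose growth is below that of the database as a whole has lost share, even if its absolute count went up.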


It is a part of usual Language Oriented Programming practice to look at the words that are used in a system. I had postponed this for a while (being new to the Wiki), but now I have done it.

The list at the end of this page is the first part of the output of processing wikiList.
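The tool that produced the list is not shown here, so the following is only a sketch of one way to get a similar ranking. It assumes the input is a list of page names, that each name is split at its capital letters, and that the resulting words are ranked by how often they occur. The sample names and the helper names `split_wiki_name` and `word_counts` are illustrative, not part of the original processing.

```python
import re
from collections import Counter


def split_wiki_name(name):
    """Split a CamelCase page name into its capitalised component words."""
    # Each word is one capital letter followed by any lowercase letters,
    # so digits and punctuation are ignored in this sketch.
    return re.findall(r"[A-Z][a-z]*", name)


def word_counts(page_names):
    """Count how often each component word occurs across all page names."""
    counts = Counter()
    for name in page_names:
        counts.update(split_wiki_name(name))
    return counts


# Illustrative sample only; the real input would be the full list of page names.
sample = [
    "WikiWordStatistics",
    "WikiMines",
    "WikiMind",
    "ExtremeProgramming",
    "LanguageOrientedProgramming",
    "UnitTest",
    "PowerLaw",
]

for word, count in word_counts(sample).most_common():
    print(count, word)
```

Splitting at capitals is what would turn an abbreviation written as a Wiki Word into entries such as "Xp" or "Uml".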

If you do this on a software API you usually find something interesting: special words, redundant words, wrong words. Here, at first sight, I didn't find anything of significance.

Of course, if you look at the first few lines (after stripping some simple words) you find what this Wiki is about: Wiki Programming Patterns Extreme.

But when I read on, I felt like a shaman priest having thrown a bag of bones to read the present and the future:

Is * Xp * In * Pattern * Software ?

Software * For * Java * Language

Design * Object * Test

Do * Group * Project

Why * Your * Model ?

First * Good * Process

Refactor * All * Com * Data

Just * More * Net * Server

Meta * Source * Challenge

Challenge * Programmer * Books

Open * Pages * Change * Engineering

Can * Music * Need * Public ?

Thousand * Uml * Based * Exception

Quality * Science * Talk * Who ?

Rule * Rules

Abstract * Before * Common * Interfaces

We * Writing * Browser * Classes

Document * Factory * Implementation

Plan * Programmers * Reuse

Components * Considered * Dead

Its * Joe !

Know * Leadership

Null * Person * Reading * Requirements

Clear * Culture

Human * Ideal * Inheritance

Where * Zen * Applications

Evolving * Experiment * External * Failure

Fast * Forth * Groups

Linux * Load * Never

to name just a few. Just try it! Perhaps some expert can read and interpret this. I'm unable to. -- Helmut Leitner

If you play a little loose, the very first few words sum up Wiki pretty well:

The * Wiki * Of * Programming * And * Patterns


See also Wiki Mines.


On a similar note, I am trying to devise a way to determine the "centres" of a wiki (or things similar to wikis). My best attempt so far has been usemod.com . -- Sunir Shah


As one might expect, the number of occurrences of a given Wiki Word per Wiki Page obeys a Power Law. This hypothesis was tested in March 2003 with 724 pages containing the Wiki Word "UnitTest". A Log Log plot of the count of the pages with a given number of occurrences of "UnitTest" was created. The values are linear for the first two orders of magnitude, though they diverge from the ideal value as the number of occurrences of "UnitTest" per page increases:

www.thorgolucky.com

Linear regression yields r-squared = 0.936.
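For readers who want to repeat the test, here is a minimal sketch of a log-log fit. It assumes the input is one occurrence count per page (each at least 1) and that an ordinary least-squares line is fitted to log10(pages) against log10(occurrences). The data below is synthetic and the function name `loglog_fit` is an invention for this example; the original measurements are not reproduced.

```python
import math
from collections import Counter


def loglog_fit(occurrences_per_page):
    """Least-squares fit of log10(pages) against log10(occurrences).

    Returns (slope, intercept, r_squared). Needs at least two distinct
    occurrence values and two distinct page counts.
    """
    # histogram[k] = number of pages on which the word occurs exactly k times
    histogram = Counter(occurrences_per_page)
    ks = sorted(histogram)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(histogram[k]) for k in ks]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic data: roughly 400 / k**2 pages contain the word k times (k = 1..10).
sample = [k for k in range(1, 11) for _ in range(round(400 / k**2))]

slope, intercept, r2 = loglog_fit(sample)
print(f"slope = {slope:.2f}, r-squared = {r2:.3f}")
```

A Power Law shows up as a straight line in this plot, and the slope of the line is the exponent.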

A second test with "ExtremeProgramming" and 1,189 Back Linked pages gave a similar result, with r-squared = 0.950:

www.thorgolucky.com

Binning the data increases r-squared to 0.99+.
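How the data was binned is not stated. One common choice, assumed here, is logarithmic binning: occurrence counts are grouped into bins whose width doubles (1, 2-3, 4-7, ...), and the number of pages in each bin is divided by the bin width, which smooths the sparse tail. The sketch below uses a tiny made-up input, and the names `log_binned` and `r_squared` are illustrative.

```python
import math
from collections import Counter


def log_binned(occurrences_per_page):
    """Return (bin_centre, pages_per_unit_width) for bins [1,2), [2,4), [4,8), ..."""
    histogram = Counter(occurrences_per_page)
    largest = max(histogram)
    points = []
    low = 1
    while low <= largest:
        high = low * 2
        pages = sum(histogram.get(k, 0) for k in range(low, high))
        if pages:  # empty bins cannot be shown on a log scale
            centre = math.sqrt(low * (high - 1))  # geometric mean of the bin's ends
            points.append((centre, pages / (high - low)))
        low = high
    return points


def r_squared(points):
    """Squared correlation of the points after taking log10 of both axes."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Made-up input: occurrences of a word on eleven pages.
sample = [1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 7]
points = log_binned(sample)
print(points)
print(f"r-squared after binning = {r_squared(points):.3f}")
```

With few points per fit, a high r-squared is easier to reach, so the binned figure is best read alongside the unbinned one.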


The original version of this list counted each entry twice. This has been corrected.

Files: 1 Found: 80843

Count Statistic:

The

Wiki

Of

And

Programming

Patterns

To

Extreme

Is

Xp

In

Pattern

Software

For

Java

Language

Design

Object

Test

Code

Page

On

Web

What

As

Category

With

John

Are

Not

Smalltalk

Unit

It

Discussion

You

Two

Refactoring

One

Use

Mc

Do

Group

Project

David

How

New

From

By

About

This

Component

Topic

Work

At

Testing

Vs

Be

Ejb

Objects

System

Systems

Development

Dont

Meeting

User

Visual

Name

Why

Your

Model

Tcpg

Management

Michael

Time

First

Good

Process

Class

People

Method

Refactor

All

Com

Data

Free

Just

More

Net

Server

That

Three

Architecture

Link

Mark

Problem

Value

Big

Book

Interface

Changes

No

Peter

An

Bill

Cpp

Jim

Meta

Source

Challenge

Programmer

Books

Case

Dot

Exceptions

List

Open

Pages

Change

Engineering

Mike

My

Robert

Computer

Dave

Plus

Principle

Game

Links

Microsoft

Oriented

Pair

Tom

De

Eric

Go

Methodology

Story

James

Knowledge

Mode

Richard

Steve

Thing

Way

Bob

Me

Mind

Space

Up

World

Art

Business

Chris

Example

Form

Function

Law

Real

Stories

Technology

Vb

Ytwok

Information

Martin

Nine

Or

Paul

Python

Things

Tim

Too

Alan

Anti

Ats

Community

Framework

History

Recent

State

Team

Tests

Text

Thomas

When

Write

Analysis

Bad

Delete

Great

Isa

Life

Make

Metaphor

Perl

Thread

Twenty

Words

Works

Basic

Beans

Box

Can

Music

Need

Public

Thousand

Uml

Based

Exception

Home

Idea

Languages

Quality

Science

Talk

Who

Word

Brian

Coding

Does

Four

Functional

Green

Jeff

Once

Review

Rule

Rules

Self

Should

Smith

Stone

Users

Abstract

Before

Common

Interfaces

Like

Non

Oo

Out

Scott

Script

Seven

Together

Tool

We

Writing

Browser

Classes

Document

Factory

Implementation

Little

Ninety

Plan

Programmers

Reuse

Right

Solution

View

Anonymous

Bug

Comments

Components

Considered

Dead

Distributed

Hard

Its

Joe

Know

Leadership

Mac

Machine

Multi

Order

Other

Post

Problems

Program

Question

Questions

Style

Types

Visitors

Andrew

Bean

Card

Content

Could

Dan

Database

Documentation

Edit

Enterprise

Faq

Fic

Frank

Games

Gof

Greg

Grok

Int

Love

Man

Message

Only

Over

Paper

Please

Point

Power

Reviews

Side

Simple

Six

Soft

Solutions

Stephen

Success

Think

Tools

Unix

Using

Ward

Will

Agent

Application

Bruce

Computing

Daniel

Definition

Effect

Entity

Flow

Immersion

Kent

Kevin

Line

Methods

Null

Person

Reading

Requirements

Roger

Ron

Search

Star

Thinking

Tips

Well

Workshop

Another

Cant

Cards

Cee

Clear

Culture

Developer

Domain

Don

Doug

Editing

End

Evil

Examples

Full

Future

Get

Harmful

Has

Have

Here

Junit

Lazy

Learning

Library

Map

Modeling

Old

Oopsla

Planning

Plop

Principles

Pro

Resource

Second

Simplest

So

Task

Type

Van

Wall

Win

Active

Analogy

Back

Bell

Best

Binary

But

Client

Control

Corporation

Cplus

Editor

Emacs

Five

Fix

George

God

Human

Ideal

Inheritance

Long

Most

News

Quote

Reference

Research

Sand

Session

Single

Society

Stuff

Theory

Tri

Variables

William

Writers

Age

Better

Between

Blue

Bugs

Builder

Charles

Command

Complex

Context

Continuous

Cool

Copy

Cost

Death

Driven

Ed

Edward

Factor

File

Frameworks

Guide

He

Hot

Hyper

Integration

Keith

Ken

Keyboard

Lisp

Memory

Multiple

Names

Nature

Org

Play

Plug

Processing

Small

Spaces

Standard

Structure

There

Trial

University

Values

Ware

Where

Zen

Applications

Architect

Architectural

Around

Author

Black

Blocks

Build

Call

Composite

Crc

Cultural

Douglas

Down

Environment

Evolutionary

Evolving

Experiment

External

Failure

Fast

Forth

Groups

Ian

Institute

Inter

Issues

Jean

Larry

Linux

Load

Never

Nick

Os

Own

Possibly

Practice

Product

Projects

Proof

Quotes

Ralph

Read

Really

Replace

Risk

Rob

Role

Room

Sam

Servlet

Short

Silicon

Study

Thirty

Threads

Tree

Very

Visitor

Without

Xml



See original on c2.com