I spent some time this evening improving Wikipedia by suggesting that some of its less well written articles be removed. The task was relatively simple: hit the "random page" button, read the article, decide if it is so horrible as to be unencyclopedic, and if it is, paste in the "please delete this" template and hope for the best.
Combing through a series of random pages on Wikipedia is more grim than I had expected. Many of the articles were very short, to the typical tune of "X is a village with 42 inhabitants" or "Y was an indifferent soccer player on a minor league team". A surprising number of articles were years old and read "Z will be a television series on a minor television network" with no one bothering to follow up to see whether the series was kept or cancelled.
I haven't always been a deletionist. There was a time when you'd find me in Wikipedia more often than not improving some minor but notable page to make it incrementally better in the hopes that it would not be deleted, or arguing to keep some minimally important article from being deleted. The experience of working through random pages was sufficiently enlightening about the general quality of the work there that it seemed more productive to argue instead to wipe out some of the less well considered contributions.
My weblog is similarly full of indifferent contributions, posts which seem to have been a good idea at the time but which don't have much to argue for themselves now. For a long time I had a habit of writing very short and poorly edited posts just in order to get something out quickly, and I'd love to revisit those and improve them by taking them offline.
It helps, when improving a large body of work, to have some way to pick pieces at random and work on whichever one comes up next. The standard Typepad system does not have any sort of "random post" feature, so I'll need to improvise. What I think I can do is export the entire contents of my blog, write a script to pull out just the titles and URLs of the posts, then generate something that pulls a random line from that list. If I can do that then I can go back randomly and be ruthless in pruning out things that don't belong any more.
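A minimal sketch of that script, assuming the export arrives as Movable Type style plain text, where each post carries a `TITLE:` line and a `UNIQUE URL:` line and ends with a line of eight hyphens. Those field names and the separator are assumptions about the export file, and `extract_posts`, `random_post`, and the sample data are hypothetical names made up for illustration.

```python
import random

# Assumption: each entry in the export ends with a line of eight hyphens.
ENTRY_SEPARATOR = "--------"

# Tiny made-up sample standing in for the real export file.
SAMPLE_EXPORT = """\
AUTHOR: Example Author
TITLE: First post
UNIQUE URL: http://example.com/2005/01/first-post.html
-----
BODY:
Some text.
-----
--------
AUTHOR: Example Author
TITLE: Second post
UNIQUE URL: http://example.com/2006/03/second-post.html
-----
BODY:
More text.
-----
--------
"""


def extract_posts(export_text):
    """Return a list of (title, url) pairs, one per entry in the export."""
    posts = []
    title, url = None, None
    for line in export_text.splitlines():
        if line.strip() == ENTRY_SEPARATOR:
            # End of an entry: keep it if a title was seen.
            if title is not None:
                posts.append((title, url))
            title, url = None, None
        elif line.startswith("TITLE:") and title is None:
            title = line[len("TITLE:"):].strip()
        elif line.startswith("UNIQUE URL:") and url is None:
            url = line[len("UNIQUE URL:"):].strip()
    # Handle a final entry that is not followed by a separator.
    if title is not None:
        posts.append((title, url))
    return posts


def random_post(posts):
    """Pick one (title, url) pair uniformly at random."""
    return random.choice(posts)


if __name__ == "__main__":
    title, url = random_post(extract_posts(SAMPLE_EXPORT))
    print(title, url)
```

For the real thing, the sample string would be replaced by the contents of the exported file, and each run would hand back one post to keep, improve, or prune.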
Ideally there's some basis for the notion that rather than writing new words all the time, I should be looking at ways of improving some of the old ones instead. That germ of a good idea, half explored in 150 words 5 years ago, might make a good 750-word essay now. That poorly executed quick post of 7 years ago might be deleted to make room for a better one now. No one will miss the old words, and I might be able to free up space in my memory for something more positive.
The deletionist point of view, when applied too aggressively, can also be pretty grim. The grim reaper decides that one person's hard work is unworthy of inclusion and attacks everything they do until that contributor goes away. If you get hit by a deletionist of that type, your best approach is to move your contributions to some other medium where you can contribute without attracting undue attention.
My randomized approach to editorial review is much more like the "chaos monkey" used by some cloud-based online services to ensure that their carefully constructed and complicated system is resilient in the case of failure. The chaos monkey takes down parts of a redundant system at random, forcing it to respond in the way that systems designers had planned for an outage. Because you are exercising the failures in a controlled way you can be prepared for uncontrolled failure at some future time.
I was able to download the entire blog corpus, and to write the little script that pulled out a random title from the full list. It's now also time to start doing other textual analysis on this word-hoard, looking for words that are used unusually frequently, or subjects that come up more often than I had imagined. With the right tool set there should be lots of good meta-information that I can reuse and turn into new ideas. After all, there's more than a decade's worth of stuff buried in almost 10 megabytes of text, and I'm sure that at least 10% of it is deletable without the slightest loss to anyone.
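A first pass at that word counting might look something like the sketch below. It only does plain frequency counts with a small hand-picked stop list, which is a crude stand-in for "unusually frequent" (a fuller version would compare against a baseline of ordinary English). The stop list and the names `word_counts` and `most_common_words` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "i"}


def word_counts(text):
    """Count lowercase words in text, skipping the stop words."""
    # Match runs of letters, allowing one internal apostrophe ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def most_common_words(text, n=10):
    """Return the n most frequent (word, count) pairs."""
    return word_counts(text).most_common(n)


if __name__ == "__main__":
    sample = "The chaos monkey takes down parts of a system; the monkey is random."
    print(most_common_words(sample, 3))
```

Run over the whole exported corpus instead of the sample sentence, the top of that list is a quick hint at which subjects keep coming back.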
