Do you think that artificial intelligence will ever reach a level where Wikipedia could write itself?

For that to be possible, all information that would ever be useful as source material for Wikipedia would need to be digitized and available on the Internet. That seems inevitable. I think we can agree that all of the source material will someday be on the Internet.

I can imagine a future law, at least in the U.S., that makes all published information available to Wikipedia's search engines for free, so long as only short bits are quoted and cited. So while you and I might have access only to public information and to books we own or borrow, Wikipedia's search engines would have full access to all works.

Wikipedia could partner with Google to search the Internet for new topics and new information on existing topics. That part is easy. But what sources would it trust? I can imagine a day when all sources of information on the Internet have some sort of reliability rating. For example, the Wall Street Journal would have a high reliability rating and this blog would have zero.

The hard part for artificial intelligence is editing and summarizing content in a form that humans can easily digest. No one has yet designed software that can write well. But I think that's coming. Writing is entirely rule-based. Teaching a computer to write might be ten times harder than teaching it to play chess, or maybe a thousand times harder, but it's only a matter of time. Learning to write is mostly pattern recognition.

Somehow Wikipedia's artificial intelligence would also need to judge what is important enough to include in its summary. Could software, for example, figure out how to describe the American Revolution on a page or two? I think it could, simply by comparing all of the source material on the topic and sorting it by the keywords that are mentioned most often.
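As a rough illustration of the keyword idea above, here is a minimal Python sketch. It is an assumption about how such sorting might work, not a description of any real Wikipedia software: the function names, the tiny stopword list, and the scoring rule (average keyword frequency per sentence) are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would need a far larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "is",
             "by", "that", "it", "on", "for", "with", "as", "were", "from"}


def tokenize(text):
    """Lowercase the text and keep only words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(sources, max_sentences=2):
    """Return the sentences whose keywords appear most often across all sources."""
    sentences = []
    for doc in sources:
        parts = re.split(r"(?<=[.!?])\s+", doc)
        sentences.extend(s.strip() for s in parts if s.strip())

    # Count how often each keyword is mentioned in the whole pile of sources.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    # Keep the chosen sentences in their original order.
    return [s for s in sentences if s in top]
```

Given two short sources that both mention the colonies declaring independence, plus stray sentences about the weather and the price of tea, this sketch keeps the two independence sentences and drops the rest. It says nothing about whether the surviving sentences are true, only that their keywords are popular.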

Once Wikipedia becomes untethered from its human editors it will grow at a much faster rate, and perhaps include knowledge on a deeper technical level, including patents, law, medicine, and so on.

I don't think Wikipedia will ever be self-aware, but there's no real limit on how awesome it can be.

Update:

Reader Dan sends me this relevant link.

See Wikipedia article on world’s most prolific author (via automatic data mining and autonomous computer authoring).

http://en.wikipedia.org/wiki/Philip_M._Parker

http://www.icongrouponline.com/

 


 

Comments

Sep 9, 2010
As I understand it, Google picked up pattern recognition which is the basis of Google Translate. Instead of using humans to create the translation databases, they use a Rosetta stone method in which they compare web documents of the same information in different languages, and use pattern recognition to identify not only word translations, but phrases as well. Not only does this work, but it deals with changing/growing languages and regional variations.
 
 
Sep 9, 2010
If this is going to happen, send me the software so it can do my homework.
 
 
Sep 9, 2010
AI is pretty pathetic right now. I went through the code for some Turing test "winners" (admittedly the most recent time was 5 years ago), and found that they were using the same pattern matching / vague response mechanisms that were implemented two decades earlier. The winning program didn't "understand" anything -- for instance "ball" might be in the dictionary, but the purpose, shape, and other qualities were not important to the program. The only thing meaningful about it was that it could retrieve a question like "Do you like sports?" in response to an input with the word "Ball" in it. It could track that Sports was the subject of the next question if it had pronouns. It could follow up with "what do you (or don't you) like about sports?" based on a positive or negative answer. The program also had a story it wanted to tell, if your inputs were more questioning. It was all smoke-and-mirrors.
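To make the smoke-and-mirrors concrete, here is a minimal Python sketch of that kind of keyword matching. It is my own reconstruction of the general mechanism, not the code of any actual contest entry; the class name, the keyword table, and the canned replies are all hypothetical.

```python
import re

# Hypothetical keyword -> topic table. The program knows nothing about
# what a ball is, only that the word maps to the label "sports".
TOPIC_KEYWORDS = {"ball": "sports", "game": "sports", "song": "music"}


class KeywordBot:
    def __init__(self):
        self.topic = None  # the last topic the bot asked about

    def reply(self, text):
        words = re.findall(r"[a-z]+", text.lower())

        # 1. A known keyword triggers a canned question about its topic.
        for word in words:
            if word in TOPIC_KEYWORDS:
                self.topic = TOPIC_KEYWORDS[word]
                return f"Do you like {self.topic}?"

        # 2. A yes/no answer triggers a follow-up on the remembered topic.
        if self.topic and words:
            if words[0] in ("yes", "yeah", "sure"):
                return f"What do you like about {self.topic}?"
            if words[0] in ("no", "nope", "nah"):
                return f"What don't you like about {self.topic}?"

        # 3. Otherwise fall back to a vague response.
        return "Tell me more."
```

Feeding it "I kicked the ball" and then "Yes, a lot" produces a plausible two-turn exchange about sports, even though nothing in the program represents what a ball or a sport is.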

I have tried to imagine what it would take to make a real "understanding" program, and the thought experiment devolved into a vast classification system applied to a vast dictionary, and you would have to convert every classification into mathematical values. It seemed difficult for relatively numerical things -- i.e. running represents a change in spatial coordinates with some approximation of timing. It seemed preposterous for more abstract words. Even if you could do that, I'm probably grossly oversimplifying the problem, because words (especially pronouns) have context, and that is sometimes hard for intelligent humans to de-reference. How would a machine discard possible meanings and identify the intended meaning?

I'm fascinated by the work in artificial intelligence, but I don't hear about people making inroads into what I consider a real solution. It's always a rehashing of old ways to cheat.
 
 
Sep 9, 2010
On Philip Parker's Wikipedia entry it says: "He plans to extend the programs to produce romance novels."

I've been telling my wife for years that she reads formulaic drivel, now someone is actually going to produce the formula. I'd gloat, but she'd just give me a blank stare.
 
 
Sep 9, 2010
You throw in bad logic like, 'Writing is entirely rule-based...' then draw the conclusion that it is therefore possible to create good writing via software since it is also rule-based.

But the artistic element will never be reproducible by computer, except as a copy of someone else's style. The computer could write something in the style of Hemingway, for example, but never create anything interesting or even sensible.

Could a computer create a Dilbert cartoon for you? After all, you've told us before that comics must adhere to certain rules to be funny to a large group of people. Somehow I can't imagine any software that could come up with the subtle 'Yep, if we had duct tape,' line in your last cartoon. Sounds like the 'thousand monkeys with typewriters coming up with Shakespeare' hypothesis.

[Wikipedia isn't art. I don't have a prediction of whether software will someday be able to write good humor or good poems. It might be possible though, because art is little more than concealing the rules of construction from the audience. -- Scott]
 
 
Sep 9, 2010
I'm not sure I agree with your focus on Wikipedia but I think you've probably got the gist of what will happen. Google also seems to be working on the art (skill? disease?) of writing with their new Scribe (http://scribe.googlelabs.com/), which I think is a great example of your "writing as pattern matching" idea.

As for whether Wikipedia could ever write itself, I think the tough part would be winnowing out the screeds of garbage. There are a lot more UFO videos on YouTube than there are skeptical rejoinders and I'm not sure I could take it if the world's foremost knowledge store believed Kennedy was killed by the Nazis using a laser fired from their base on the moon so that he wouldn't reveal that aliens are really humanoid time travellers from the future.

 
 
Sep 9, 2010
What you are talking about is called the "semantic web". In other words, annotating content so that it can be understood by both humans and computers. It has been a research topic for a while, and though progress is being made (along with standards etc), it still has a long way to go.

There's already a network of such databases. One of them is an effort to annotate Wikipedia in such a way.

One of the ways to do that is to store knowledge in subject-predicate-object triplets, e.g. [Texas][is part of][the USA], [Scott Adams][draws][Dilbert], [Dilbert][is a][cartoon character]... if you have a lot of data stored and organized in this way, there's tons of stuff you could do with it. However, don't expect companies to spend a lot of money exposing any data this way - there's not much of a financial incentive to do so. It will be a collaborative project like Wikipedia.
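Here is a minimal Python sketch of what storing and querying such triplets could look like. It uses only the three example facts above; the `Triple` type and the `query` and `what_does_person_draw` helpers are hypothetical names made up for this illustration, not part of any semantic web standard or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


TRIPLES = [
    Triple("Texas", "is part of", "the USA"),
    Triple("Scott Adams", "draws", "Dilbert"),
    Triple("Dilbert", "is a", "cartoon character"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return every triple matching the given fields; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def what_does_person_draw(triples, person):
    """Chain two lookups: what the person draws, and what kind of thing that is."""
    results = []
    for drawn in query(triples, subject=person, predicate="draws"):
        for kind in query(triples, subject=drawn.obj, predicate="is a"):
            results.append((drawn.obj, kind.obj))
    return results
```

Calling `what_does_person_draw(TRIPLES, "Scott Adams")` chains two facts to conclude that he draws Dilbert, which is a cartoon character. That kind of chaining is the "tons of stuff" part: no single stored triplet says it directly.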
 
 
Sep 9, 2010
Yeah, Wikipedia is the best thing on the planet...but you still can't cook on it!

 
 
Sep 9, 2010
An artificially intelligent Wikipedia that writes itself will not have any takers, as by that time humans will not have any need to obtain knowledge. Artificially intelligent machines would already have been invented to take care of everything. You might say that these machines could use the Wikipedia knowledge, but since memory will no longer be a constraint, all these machines will simply acquire all available knowledge and will not need any Wikipedia summaries...
 
 
Sep 8, 2010
As someone who has programmed artificial intelligences before, I can tell you we really haven't programmed one to play chess. Basically all computerized chess software starts off with a full simulation of every possible move from the current board, followed by every possible response, and so on. It is referred to as a brute force search. Based on the centuries of work and writings, that search can be optimized somewhat, as we have some general ideas of what constitutes a "better" move, but in the end basically every piece of software just tries every possible move. If you look at a lot of chess software, you'll notice you can set the skill of the computer opponent in minutes (assuming an untimed match). That sets a time limit for searching for the best move. Once that limit has expired, the computer picks the move that it currently thinks is the best.
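A minimal Python sketch of that search-until-the-clock-runs-out idea follows. Chess is far too big for a short example, so I have swapped in a toy game as a stand-in: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The function names and the game are assumptions for illustration only; real chess engines add an evaluation function and many optimizations this omits.

```python
import time


class OutOfTime(Exception):
    """Raised when the search passes its deadline."""


def moves(pile):
    """Legal moves: take 1, 2, or 3 stones, but no more than remain."""
    return [n for n in (1, 2, 3) if n <= pile]


def search(pile, depth, deadline):
    """Score a position for the player to move: +1 win, -1 loss, 0 unknown."""
    if time.monotonic() > deadline:
        raise OutOfTime
    if pile == 0:
        return -1  # the opponent just took the last stone
    if depth == 0:
        return 0   # search horizon reached, outcome not yet known
    # Try every move; the opponent's score is the negation of ours.
    return max(-search(pile - n, depth - 1, deadline) for n in moves(pile))


def best_move(pile, seconds):
    """Search deeper and deeper until time runs out; assumes pile >= 1."""
    deadline = time.monotonic() + seconds
    best = moves(pile)[0]  # fallback if no search finishes in time
    depth = 1
    while depth <= pile:
        try:
            scored = [(-search(pile - n, depth - 1, deadline), n) for n in moves(pile)]
        except OutOfTime:
            break  # keep the best move from the last completed depth
        best = max(scored, key=lambda pair: pair[0])[1]
        depth += 1
    return best
```

With a pile of 10 and a generous time limit, `best_move` takes 2 stones, leaving the opponent a multiple of 4. With no time at all it just returns the first legal move, which is roughly what setting a chess program's skill to "0 minutes" would amount to.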

While writing has basic grammatical rules, we have much less idea what constitutes "good" writing over "bad" writing. We can generally recognize it if we see it, but how do we take a "bad" sentence and make it good? Why pick one word over another? And how can we tell a computer it is "good"? As a writer, you have certainly noticed that rarely do two critics agree. One may love a book while another may hate it, for the exact same reasons. I think good writing is more a matter of perception by the reader than an inherent quality of the writing itself. It would be more useful for Wikipedia to just store as much information as possible, in whatever form is ideal for its storage ability, and have a component that learns how a specific user likes to see information presented, and dynamically present it to that user in his or her preferred form.
 
 
Sep 8, 2010
I think Wolfram is already working on something like that.
 
 
Sep 8, 2010
AFAIK, there are problems that cannot be algorithmized. Maybe "creative" writing could be one of them... As a translator, I'm confronted with the growing use of computer translations. Well, it is improving day by day, but still, half of the translated sentences are wrong and (worse) one in ten has the opposite meaning compared with the original. Overall, in theory, translation itself is an impossible task, and good translation is more art than labor; in my opinion, computers could write Wikipedia, but will have to learn how to create art first.
 
 
Sep 8, 2010
Of course it could be self-aware. Give it about 25 years and it will be conscious. The singularity is near.
 
 
Sep 8, 2010
Sure, one day the software will be sophisticated enough to write Wiki articles. Why not? You said it yourself. Writing original material is just primarily based on patterns. The perfect application for software.

People complain that Wikipedia articles are just opinions. They say there are no "authorities" providing oversight to guarantee the validity of the entries. But the beauty of Wikipedia is the references. Credible references make credible Wikis. It's up to the human reader to decide if the references are any good.

It's the same thing with say a book on anatomy or the Encyclopedia Britannica. The references are where the credibility is.

One day they'll write software that can read the references for you. It's possible since all human knowledge will eventually be available on the Web. Then we won't need Wikipedia anymore.
 
 
Sep 8, 2010
On a Mac, if you have a long document, you can use the Summarize service to get a summary of the contents:

http://www.apple.com/pro/tips/summarize.html
 
 
Sep 8, 2010
For me this raises the interesting debate regarding the propagation of incorrect ideas through repetition.

I am currently working to develop a multimedia teaching tool for finance with an elderly professor. He was teaching the subject before I was born. I learned it last year. However, because he could not explain the "Reinvestment Assumption" to me clearly and because it did not make sense to me, I did some research. It seems the confusion began with a chap called Solomon who wrote a text in 1956. His initial confused idea was corrupted and repeated through generations of finance professors. A 2001 paper by two New Zealand professors (Keef and Roush) clearly explains that the assumption is not valid and does not exist, however their literature review reveals that up to 71% of the teaching materials in finance support this false claim.

It is fascinating to me that this can happen. Auto-wikipedia would propagate this type of mis-information.

However, obviously we would be no worse off.
 
 
Sep 8, 2010
I can't wait until Wikipedia is able to summarize performing a triple bypass surgery to a single page with pictures. It will seem neat until some moron decides that it looks easy enough to try it on himself. With any luck, his family will also use Wikipedia to research how to sue a company and wind up owning Wikipedia millions....
 
 
Sep 8, 2010
Did you see this? Or is this just another coincidence proving we're just a game played by other dimensional beings?

http://gizmodo.com/5632729/this-article-was-fully-composed-on-googles-new-automatic-writing-system

Seriously though, it's a bit off saying it's only pattern recognition so it shouldn't be that hard. My feeling is that the essence of human intelligence and probably even self awareness is rooted in pattern recognition. I think the first true AI to pass the Turing Test will be built on a learning system with a heart of pattern recognition.

Dan
 
 
Sep 8, 2010
Actually, students at Northwestern are working on this sort of project. So far they've been able to make a program that writes baseball stories based on box scores.

http://www.npr.org/templates/story/story.php?storyId=122424166

http://infolab.northwestern.edu/projects/stats-monkey/
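I don't know how the Northwestern program actually works, but the general idea of turning a box score into a sentence can be sketched with simple templates. In this minimal Python example the box score layout (`home`, `away`, `top_hitter`), the verb thresholds, and the team and player names are all hypothetical, and it assumes the game did not end in a tie.

```python
def write_recap(box):
    """Turn a (hypothetical) box score dictionary into a one-sentence recap."""
    home, away = box["home"], box["away"]
    if home["runs"] > away["runs"]:
        winner, loser = home, away
    else:
        winner, loser = away, home

    # Pick a verb from the margin of victory, so the prose varies with the data.
    margin = winner["runs"] - loser["runs"]
    if margin == 1:
        verb = "edged"
    elif margin >= 5:
        verb = "routed"
    else:
        verb = "beat"

    sentence = (f"The {winner['team']} {verb} the {loser['team']} "
                f"{winner['runs']}-{loser['runs']}")

    # Mention a standout player only if the box score supplies one.
    star = box.get("top_hitter")
    if star:
        sentence += f", led by {star['name']} with {star['hits']} hits"
    return sentence + "."
```

For a made-up game where the Robots score 7 and the Engineers score 2, with A. Example collecting 3 hits, this returns "The Robots routed the Engineers 7-2, led by A. Example with 3 hits."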
 
 
Sep 8, 2010
I've thought about this before but from a different angle. My thing also relates to that Newser "editing is the future of the internet" post.

I was reading a first edition copy of a science book which is a bit outdated now. I wondered what if the book were an e-book that the author was allowed to update whenever he liked, so I'd always have the most recent opinion on the hypotheses in the book. Then I got thinking about how it could be automated. Imagine the author could tie in sections with the net. So for example if there was a CERN experiment running at the time of publishing, the author might say "... the results of which are pending." but which could later be automatically updated to "the results of which disproved the hypothesis.".

Of course, after this I just went off on a tangent of thought and so I shall here too. What if instead of writing the science book, the author simply linked it to your Wikipedia. For example, imagine he's writing a popular science book on the Higgs boson. He could simply become the editor who specifies the order and depth in which the information appears. For example, "in chapter 1, I want to briefly introduce the current standard model. After that, I want to introduce competing models" etc.

We were born in the wrong century :(
 
 
 