What If? Part 99: On the Infinity of Twitter (1)

TWITTER

Q. How many unique English tweets are possible? How long would it take for the population of the world to read them all out loud?

——Eric H, Hopatcong, NJ

High up in the North in the land called Svithjod, there stands a rock. It is a hundred miles high and a hundred miles wide. Once every thousand years a little bird comes to this rock to sharpen its beak. When the rock has thus been worn away, then a single day of eternity will have gone by.

——Hendrik Willem Van Loon

A. Tweets are 140 characters long. There are 26 letters in English, or 27 if you include spaces. Using that alphabet, there are 27¹⁴⁰ ≈ 10²⁰⁰ possible strings.

But Twitter doesn't limit you to those characters. You have all of Unicode to play with, which has room for over a million different characters. The way Twitter counts Unicode characters is complicated, but the number of possible strings could be as high as 10⁸⁰⁰.
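
As a rough check on these two figures, here is a minimal Python sketch. It assumes every tweet is exactly 140 characters (shorter ones padded with spaces) and, for the Unicode case, that each of the 1,114,112 code points in the Unicode code space counts as one character. That second assumption is a simplification rather than Twitter's actual counting rule, so the result is only an upper-bound estimate. The names `TWEET_LENGTH` and `log10_string_count` are made up for this illustration.

```python
import math

# Assumption: every tweet is treated as exactly 140 characters long.
TWEET_LENGTH = 140

# Size of the Unicode code space (U+0000 through U+10FFFF).
# Assumption: each code point counts as one character, which is
# simpler than how Twitter really counts.
UNICODE_CODE_POINTS = 0x110000


def log10_string_count(alphabet_size: int, length: int = TWEET_LENGTH) -> float:
    """Return log10 of the number of strings of the given length,
    i.e. log10(alphabet_size ** length)."""
    return length * math.log10(alphabet_size)


# 26 letters plus the space character.
english_exponent = log10_string_count(27)

# Naive upper bound using the whole Unicode code space.
unicode_exponent = log10_string_count(UNICODE_CODE_POINTS)

# Python integers are arbitrary precision, so the first count can
# also be checked exactly by counting its decimal digits.
english_digits = len(str(27 ** TWEET_LENGTH))

print(f"27^140 is about 10^{english_exponent:.1f}")
print(f"{UNICODE_CODE_POINTS}^140 is about 10^{unicode_exponent:.1f}")
print(f"27^140 has {english_digits} decimal digits")
```

Under these assumptions the 27-character alphabet gives an exponent a little above 200, and the full code space gives one in the mid-800s, the same order of magnitude as the figures quoted above.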

Of course, almost all of them would be meaningless jumbles of characters from a dozen different languages. Even if you're limited to the 26 English letters, the strings would be full of meaningless jumbles like “ptikobj.” Eric's question was about tweets that actually say something in English. How many of those are possible?

This is a tough question. Your first impulse might be to allow only English words. Then you could further restrict it to grammatically valid sentences.

But it gets tricky. For example, “Hi, I'm Mxyztplk” is a grammatically valid sentence if your name happens to be Mxyztplk. (Come to think of it, it's just as grammatically valid if you're lying.) Clearly, it doesn't make sense to count every string that starts with “Hi, I'm . . . ” as a separate sentence. To a normal English speaker, “Hi, I'm Mxyztplk” is basically indistinguishable from “Hi, I'm Mxzkqklt,” and shouldn't both count. But “Hi, I'm xPoKeFaNx” is definitely recognizably different from the first two, even though “xPoKeFaNx” isn't an English word by any stretch of the imagination.

Our way of measuring distinctiveness seems to be falling apart. Fortunately, there's a better approach.

Let's imagine a language that has only two valid sentences, and every tweet must be one of the two sentences. They are:

“There's a horse in aisle five.”

“My house is full of traps.”
