not a beautiful or unique snowflake (nothings) wrote,

I was reading a post in brad's journal about somebody trying to get people to invent new pangrams (sentences that contain every letter of the alphabet, as in "the quick brown fox jumps over the lazy dog"), and his thoughts on writing a program to attempt to generate "perfect" pangrams (26-letter ones, using each letter exactly once). I posted a suggestion for an alternate implementation, and in that way that things go, I ended up implementing it myself.


Crucial to a task like this is using a good dictionary. I knew from the get-go that the /usr/dict/words I had sitting around would be less than ideal, but for my first test I used it anyway. One crucial change: I removed all the one-letter words except 'a' and 'I' (the file is designed more for spell-checking, and it listed every isolated letter). I also threw away any word that had non-alphabetic characters, which is obviously necessary for "3rd" but isn't the right thing for "isn't". But oh well.
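For anyone who wants to reproduce the cleanup, here is a minimal sketch of that filtering step in Python. It is not the original code: the function name `filter_words` is made up for illustration, and lowercasing plus de-duplicating the survivors is an assumption on top of what is described above.

```python
def filter_words(words):
    """Keep purely alphabetic words; drop single letters other than 'a' and 'i'.

    Assumption (not from the original post): survivors are lowercased and
    de-duplicated, keeping first-seen order.
    """
    kept = []
    for word in words:
        word = word.strip()
        # Rejects "3rd" and "isn't" alike, since both contain a non-letter.
        if not word.isascii() or not word.isalpha():
            continue
        # The spell-check list has every isolated letter; keep only 'a' and 'I'.
        if len(word) == 1 and word.lower() not in ("a", "i"):
            continue
        kept.append(word.lower())
    return list(dict.fromkeys(kept))


if __name__ == "__main__":
    print(filter_words(["a", "b", "I", "3rd", "isn't", "quiz"]))
```

Reading the real file would be something like `filter_words(open("/usr/dict/words"))`, assuming the file has one word per line.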

I ran it trying to find sentences of at most 14 words ('a', 'i', and 12 more two-letter words would use up all 26 letters). It took 15 minutes and produced a 2 GB data file, which I am currently running wc on to find out how many pangrams it has. Here are some of them:
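The search itself can be sketched in a few lines. This is one common way to do it, not necessarily the way the program described here works: each word becomes a 26-bit mask, words with a repeated letter are thrown out (they can never appear in a perfect pangram), and the search always extends with a word containing the lowest letter still missing, so each set of words comes out once rather than in every order. The names `letter_mask` and `perfect_pangrams` are made up for this sketch, and it assumes the input is lowercase a-z only.

```python
from collections import defaultdict

ALL_LETTERS = (1 << 26) - 1  # one bit per letter, a..z


def letter_mask(word):
    """Return a 26-bit mask of the word's letters, or None if a letter repeats."""
    mask = 0
    for ch in word:
        bit = 1 << (ord(ch) - ord("a"))
        if mask & bit:
            return None
        mask |= bit
    return mask


def perfect_pangrams(words, max_words=14):
    """Yield each perfect pangram once, as a tuple of words."""
    # Index usable words by their lowest letter (0 = 'a', 25 = 'z').
    by_lowest = defaultdict(list)
    for word in words:
        mask = letter_mask(word)
        if mask:
            lowest = (mask & -mask).bit_length() - 1
            by_lowest[lowest].append((mask, word))

    def extend(used, chosen):
        if used == ALL_LETTERS:
            yield tuple(chosen)
            return
        if len(chosen) == max_words:
            return
        # Every letter below the lowest missing one is already used, so any
        # word that fits and covers it must have it as its own lowest letter.
        missing = ~used & ALL_LETTERS
        lowest = (missing & -missing).bit_length() - 1
        for mask, word in by_lowest[lowest]:
            if mask & used == 0:
                chosen.append(word)
                yield from extend(used | mask, chosen)
                chosen.pop()

    yield from extend(0, [])


if __name__ == "__main__":
    for pangram in perfect_pangrams("peck gmt quash blvd jinx frowzy".split()):
        print(" ".join(pangram))
```

The words come out in search order rather than in any readable order, and anagrams of each other still produce separate results, which is part of why the output gets so big.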

peck gmt quash blvd jinx frowzy (6 words)
jock freshman blvd twx gyp quiz
kim nbc fjord twx glyph vasquez
gm phd bstj quick nv alex frowzy
gm planck ford bstj vec why quiz (7 words)
fmc rang skopje blvd tx why quiz
jock phd nbs graft wv xylem quiz
bag fjord seq luck twx nymph viz
qed slack jug wv box nymph fritz -> qed slack jug vw box nymph fritz
cf gm jar blvd twx physik quezon
(I altered one a little to show how it might be tolerable.)

So, you get the idea. Next, I stripped out all words from the dictionary that had capitalized letters, since all of the abbreviations were capitalized. Then I added back in 1000 male and 2000 female first names from the last census, since the dictionary had now lost ALL proper names. Then I added 's' and 'ed', because the dictionary doesn't include most plurals and tenses, and they can often be manually added on.

So I ran it on this new, "improved" dictionary, and got absolutely 0 perfect pangrams. Nada, zero, zilch. To make sure that there wasn't a bug, I added some special all-difficult-consonant words to the dictionary, and then it merrily produced a 275M file with 5 million "pangrams" in it, like "jockstrap fumbly dinghy vwxzq" and

sped might buck frown xjqv lazy
speck might bud frown xjqv lazy
deck sight bump frown xjqv lazy
skimp debt chug frown xjqv lazy
peck sight dumb frown xjqv lazy
pig sketch dumb frown xjqv lazy
beck sight dump frown xjqv lazy
big sketch dump frown xjqv lazy

which gives you some sense of how repetitive these are and why these files are so huge.
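If you want to spot-check lines from a file like that, a perfect-pangram test is short. This is just an illustrative sketch (the name `is_perfect_pangram` is made up here): it counts a line as perfect when it has exactly 26 letters and together they cover a through z.

```python
import string


def is_perfect_pangram(sentence):
    """True if the sentence uses each of the 26 letters exactly once."""
    letters = [ch for ch in sentence.lower() if ch.isalpha()]
    # 26 letters in total that cover the whole alphabet means no repeats.
    return len(letters) == 26 and set(letters) == set(string.ascii_lowercase)


if __name__ == "__main__":
    print(is_perfect_pangram("sped might buck frown xjqv lazy"))
    print(is_perfect_pangram("the quick brown fox jumps over the lazy dog"))
```

The first call checks one of the lines above; the second is the classic fox sentence, which is a pangram but far longer than 26 letters.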

So, utterly useless, either too much or too little. Oh yes, 36 million "pangrams" in the original run.

Should you want the source code, feel free.