not a beautiful or unique snowflake (nothings) wrote,
not a beautiful or unique snowflake
nothings

pangrams part 2

So after attempting to find perfect pangrams, and finding none using a perfect dictionary, on discovering that the pangram contest got extended by a few days, I set out to write a program to find imperfect pangrams. This was much harder, and much slower.

Again the strategy is to use dynamic programming. In the original program, the dynamic programaming matrix consisted of rows of bits, one for every letter combination (2^26 bits), and one row per word in the entry. For this round, each row had the same contents, but there was one row for each letter in the pangram-so-far. So "big jerk" would have an entry for 'begijkr' in the 7th row. (Perhaps the word-based strategy would still have worked. I forget now why I switched. Maybe just to see how it would go.)

The program proceeds by traversing a given row, and adding all the words to what it finds, marking the appropriate spots higher up in the matrix. All word additions are legal--it's ok to add a word that contains a duplicated letter to what's already there.

Engineering trade-offs: only one word is allowed to contain a 'z'; this is used to reduce the number of table entries to 2^25, much like the previous program. Words are considered "anagrams" if they contain the same letters and are the same length, so 'reaper' and 'appear' are equivalent. Although all word additions are legal, I have a max pangram length (typically ~30), and if adding a given word leaves more letters unused then are remaining in the pangram, it's rejected. (This is actually checked at scan time for performance reasons, I think.)

In the original program, once you find a match, it's fairly easy to retraverse the table to find how to get there; you run through all the possible words, subtract out the letters from each word from the set of letters you're trying to make, and check if the appropriate entry in the matrix is marked as valid; if so, you go that way. In this program, you can't subtract out letters, because they might be duplicates. In other words, we know we're using 'abcdefghijklmnopqrstuvwxyz', and we think one of the words used was 'golf'. That doesn't mean that we added the word 'golf' to a letter set that didn't contain any of the letters 'fglo'; since repeated letters are allowed, the 'g' might have already been present, or the 'o', or the 'g' and the 'l', or etc. So for each possible word, I build a list of all the bitmasks for all the subsets of letters in that word, and I test for any of those being present.

This program did take several hours to run. Here's some raw sample output from an early version of the program:

fjord s quack vamp hex by waltzing
hemp fjord s quack vex by waltzing
hemp fjord s quick vex by waltzing
fjord s quick vamp hex by waltzing
jab quip thwack flax gym rendezvous
jab quip thwack flux gym rendezvous
jab qualm thwack fix gyp rendezvous
jab quick twig fix lymph rendezvous
jab quack twig fix lymph rendezvous
jab gift quack wax lymph rendezvous
jab gift quick wax lymph rendezvous
jab quip thwack flex gym rendezvous
tomb quick jaw fix glyph rendezvous
jack quit womb fix glyph rendezvous
tomb quack jaw fix glyph rendezvous
jot quack womb fix glyph rendezvous
tomb quick jaw fox glyph rendezvous
jack quit womb fox glyph rendezvous
job quip thwack flax gym rendezvous
job quip thwack flux gym rendezvous
job qualm thwack fix gyp rendezvous
gift quack jaw box lymph rendezvous
gift quick jaw box lymph rendezvous
job quack twig fix lymph rendezvous
job quack twig fox lymph rendezvous
jab quick twig fox lymph rendezvous
jab quack twig fox lymph rendezvous
job gift quack wax lymph rendezvous
job gift quick wax lymph rendezvous
jet quack womb fix glyph rendezvous
job quip thwack flex gym rendezvous
frog quack jaw vex nymph destabilize
frog quick vamp jinx why destabilize
frog quack vamp jinx why destabilize
frog quick jaw vex nymph destabilize
jab finch squawk vex gym trapezoidal
finch job squawk vex gym trapezoidal
jung verb squawk pax fly dichotomize
junk verb squaw flax gyp dichotomize
jab vern squawk flax gyp dichotomize


Here's one that prints out known anagrams in parentheses. Sadly, some words ended up in the dictionary more than once, because they were both words and first names (jack, buck, max). This is some random excerpts:

big fjord s wick vex nymph quetzal
jock limp s vend whig fox by quartz
jig clomp vend whisk fox by quartz
jim spend blight quack vex frowzy
jock bop s qualm dwight vex frenzy
jock plight dumb squaw vex frenzy
fleck jog s vend whop mix by quartz
mig jock s vend few phlox by quartz
jed comb sink (skin) vow fix glyph quartz
gam blink fjord quip vex cyst whiz
glom bank fjord quip vex cyst whiz
gam blank fjord quip vex cyst whiz
job smart quick vend flux gyp whiz
jab storm quick vend flux gyp whiz
jab chord punt squawk flex gym viz
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 0 comments