Thursday, May 28, 2009

Anagrams Part 1 - Brute Force

Beginner programmers are often surprised when a supposedly simple program takes a lot longer to execute than they expect. After all if a modern PC can simulate an entire battle consisting of hundreds of soldiers in real-time 3D, surely it can display all anagrams of a phrase without pausing to think for hours? Consider the problem of finding all anagrams of a word that are the same length as the word. For example, the word "integral" has 8 characters and has the following 8 character anagrams:

alerting
altering
integral
relating
triangle

For completeness we include the original string in the list of anagrams. I used a word list of around 38,000 common English words to get the anagrams. So how can we program a computer to find these anagrams? A first guess algorithm might be:
for each permutation p of characters in string s
...if p exists in the wordlist then print anagram
Luckily the C++ Standard Library contains a handy function called next_permutation(first, last) that takes a pair of iterators as arguments. The function changes the order of the elements in the range [first, last) to the next lexicographic permutation and returns true. If there is no next_permutation, it arranges the sequence to be the first permutation and returns false. For all permutations to be generated the starting sequence must be initially sorted in ascending order.

So, using this function we can easily generate all permutations of a word and check them against a word list.

Such a program might look like this:
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <algorithm>

using namespace std;

int main() 
{ 
    vector<string> dict;
    int word_count = 0;
    
    // Make sure words.txt is in your current directory.
    // Search the web for a suitable word list.
    ifstream in_file("words.txt");
    if (!in_file) {
        cerr << "word list not found" << endl;
        exit(1);
    }
    // read word list and store words in a vector
    string s;
    while (in_file >> s) {
        dict.push_back(s);
        word_count++;
    }
    
    cout << word_count << " words read" << endl;
    string str;
    while (true) {
        int count = 0;
        cout << "\nEnter letters (q to quit): ";
        cin >> str;
        if (str == "q") break;
        bool perms_remaining = true;
        sort(str.begin(), str.end()); // necessary so that next_permutation finds all permutations
        while (perms_remaining) {
            count++;
            vector<string>::const_iterator iter = find(dict.begin(), dict.end(), str);
            if (iter != dict.end())
                // we have found an anagram so print it
                cout << *iter << " ";
            perms_remaining = next_permutation(str.begin(), str.end());
        }
        cout << endl << count << " permutations" << endl;
    }
}


When I run this on my Core 2 Duo 2.1GHz PC it takes 21 seconds to find all 5 anagrams. There are noticeable pauses between each found word. Adding another letter to the input to make, say, integrals and the time rockets to 3 min 12 sec. So going from 8 characters to 9 characters results in a 9-fold increase in time. Yes folks this is an O(n!) algorithm, one of the worst performing possible.

There are n! permutations if the string has n unique characters, so for n = 8 there are 40320 permutations. For each of these the program scans the word list of 38000 words to find a match. Considering there are only a handful of anagrams for any given word, that means in effect the full list is being scanned every time. That makes 40320*38000 = 1,532,160,000 string comparisons. Now that's brute-force! That's where the 21 seconds goes. And that's for a relatively short word. Surely there must be a better way? Of course there is! See Part 2 to find out.

Friday, May 15, 2009

TTMath - a bignum library for C++

Came across this handy BigNum library for arbitraily large numbers in C++. All you need to do is include a single header and then declare a bignum type such as:
typedef ttmath::UInt<100> BigInt;
which creates a type that can hold unsigned integers between 0 and 2 ^ (32*100)-1. Which is absolutely huge. Actually you can go even bigger if the stack size on your OS permits. Yes, this library uses the stack for all calculations, there are no dynamic allocations. So now you can define a factorial function like so
BigInt factorial(BigInt n)
{
    if (n == 1 || n == 0)
        return 1;
    n *= factorial(n-1);
    return n;
}
and display all those fun factorials that we love so much. Did you know 35! = 10333147966386144929666651337523200000000 Fascinating eh!

Wednesday, April 29, 2009

Qt Part 3 - Configuration for Visual Studio 2008

Before the Nokia buyout Qt could only be integrated with Visual Studio if you purchased the Commercial License. Opensource users were limited to the MinGW toolset. That has now changed with the availability of the Visual Studio Add-In. This is a kind of cut-down version of the full integration that is still available in the Commercial edition, however from what I can see it does offer most of what you need, namely:
  • API Help integration with MSDN (so F1 context sensitive help works)
  • Wizards for Qt projects and skeleton apps
  • Qt Designer integration for forms and layout (although this runs as a separate app)
  • Automated build process (so it runs the Qt pre-processors then compiles and links the app)
Partner this with Whole Tomato Visual Assist X and you have a pretty nice Qt dev environment.
But before you can actually use Visual Studio you need to compile Qt from source so that you get the libs and DLLs that are compatible with VS. The best way to do this is to download the source Framework (not the prebuilt SDK). I think the only difference is that the framework does not contain Qt Creator. These steps assume you already have Visual Studio 2008 installed (any version except Express, which doesn't support add-ins), and that Qt is not already installed.
  1. Download and run qt-win-opensource-4.5.1-mingw.exe (the version number may have changed).
  2. Choose a destination folder, I chose D:\Qt\4.5.1
  3. Select the checkbox to download and install MinGW - it is needed for the build process.
  4. The actual archive unpacking can take quite a while, up to 20 mins or so.
  5. Add Qt to your system PATH - this means the bin folder, so D:\Qt\4.5.1\bin in my case.
  6. Add the system environment variable QTDIR=D:\Qt\4.5.1 (substitute your path)
  7. Reboot your computer for the changes to take effect.
  8. Run the Visual Studio 2008 Command Prompt (or open an ordinary cmd.exe and run vsvars32.bat from Microsoft Visual Studio 9.0\Common7\Tools folder). This will set the environment variables for the Microsoft compiler tools.
  9. cd to D:\Qt\4.5.1 (not the bin folder).
  10. Run the command: configure -platform win32-msvc2008
  11. Answer O for Opensource, and agree to the license.
  12. It will make a new qmake.exe, then build pro files for each library and app.
  13. Run nmake. This can take upwards of 5 hours on a P4 3GHz machine to give you some idea. Also, it will require around 10GB of temporary scratch space to build.
  14. Wait for hours for the compile to finish...it will build debug and release versions of the libraries and DLLs in the \lib folder, and also all the tools and examples.
  15. You can delete the *.obj and *.pdb files to claim back around 9GB of disk space. The easiest way to do this is to open a Command Prompt and cd to the Qt folder. Then enter del /s /q *.obj *.pdb
  16. That's it! If you're a completist you can now uninstall MinGW and also delete mingwm10.dll from the Qt \bin folder, as they are not needed. Confirm everything worked by running qtdemo.exe from \bin.
Now you need to download and install the Visual Studio Add-In. Just double-click the .exe and follow the wizard.
  1. Start Visual Studio 2008.
  2. Click Help -> Contents. This will start MSDN and perform an update of the help files to merge the Qt documentation. This can take quite a while. As in a big cup of coffee.
  3. In VS select Qt -> Qt Options. Click Add and browse to the installation folder (eg D:\Qt\4.5.1). Enter a version name, eg Qt-4.5.1. Click Ok.
All done! You now have an excellent VS 2008 dev environment for creating Qt apps. When you click File -> New -> Project you'll see a new entry for Qt4 Projects, along with some predefined templates. If you use Whole Tomato Visual Assist X (like I do, it's a necessity for any serious VS work) read this blog link on their site which describes how to configure Visual Assist to parse the Qt headers.

The screenshot above shows a Qt app being developed in VS. Notice the Qt menu and the test2.ui form file in Solution Explorer. Double-clicking the .ui file will open Qt Designer as an external app. Here you can create your GUI, then just save, head back to VS and compile. Job's a good 'un!

Monday, April 27, 2009

Qt Part 2 - Installation and MinGW

So I downloaded the Qt SDK 4.5.1 for Windows, a measly 167MB. It's packaged as an .exe which when run expands out to a folder of nearly 1GB. It set up some Start Menu items including the Qt Creator and the all-important Qt Demo which showcases the API features. This is actually pretty impressive, featuring a window which resizes the contents on the fly (including all text and images) and fancy transitions and animations.
I suspect it uses OpenGL to do this because when I ran QtDemo.exe through Dependency Walker it showed the following DLLs that were linked in:
As you can see QTOPENGL4.DLL is a linked DLL. Notice also the other dependencies - this is something to be aware of when deploying Qt apps. The above five QT*.DLLs total 15MB. There are many more that can end up being referenced depending on what your app does. You can try linking statically to reduce the DLL count but this has it's own issues (the main one being I couldn't get it to work!)
So...how do you actually create programs now that the SDK is installed? Well, by default you are expected to use the Qt Creator to create your programs and compile using the mingw C++ compiler, both of which are included in the SDK. Mingw is the Windows port of the famous Unix GCC compiler. Qt Creator looks good, I haven't had much chance to play with it yet, but nice to see it has built-in support for Perforce, Git and Subversion source control systems. It also has the integrated Qt Designer for visual forms design, code completion, syntax highlighting and lots of other goodies.
The other way to create apps is to use a standard text editor like Notepad++ to create the C++ code, and then compile manually using the command-line build tools. This is actually the best method when learning the basics of Qt because there is no hand-holding. It's pretty straightforward though, just cd to the folder containing the cpp files, and run
qmake -project
qmake
mingw32-make (default is debug build, for release add -f Makefile.Release)
This will create a debug (or release) folder containing your exe which you can run. Caveat - do all this using the Qt Command Prompt shortcut, which sets up the appropriate env variables. Of course you can add the vars to your system globally, they are referenced in qtenv.bat.
Running the exe you will notice the nice thing about C++ native code - almost instant startup compared to non-native managed languages. For small utility apps this is very important, less so for big apps where users don't mind a splash screen for a few seconds.
Now mingw is all well and good, but 99% of Windows developers use Visual Studio (yes I made that stat up but I'm sure it's not far off) and so in Part 3 I'll go over setting up Qt with VS2008 support.

Qt Part 1 - licenses and motivation

Qt is a cross-platform application and UI framework primarily aimed at C++ users (although plenty of other bindings exist). In plain English that means we can use it to create GUIs. It can do a ton of other stuff including networking, database access and OpenGL to name a few. It's been around since the early 90s and some of it's concepts pre-date the usage of "modern" C++ (unsurprising since the C++ standard was only ratified in 1998). The history of Qt is quite a read - it took a while for the product to gain momentum, but two events helped it along. First it was chosen by the European Space Agency in 1996 and then it was chosen as the toolkit upon which the fledgling KDE Linux desktop environment was built. From then it was upwards all the way to the present day where we are at v4.5.
In 2008 Trolltech (the original owner and creater of Qt) was bought by Nokia, the telecomms giant. Nokia changed the business plan for Qt. Originally the product was only available under two licenses - commercial (big bucks) or GPL (open source). GPL means any product created with Qt must also be open source. So the only way to create and sell a commercial product was to pay hundreds of dollars. This obviously limited adoption of Qt. Nokia decided that in order to spread the usage of Qt it was necessary to allow people to create commercial software either for free or at least very cheaply. In the end they chose the free option (phew!) and used the LGPL (Lesser GPL ) license. This means it now doesn't cost a penny to create commercial apps with Qt. Of course, you might want to stump up for Visual Studio Professional but that's a different story.
Anyway, next I'll see how easy or difficult it is to get Qt up and running...

Sunday, April 26, 2009

Using libcurl for simple http

In "Accelerated C++" the authors develop a function (find_urls) to extract URLs from a string. I tested it by downloading the home page of news.bbc.co.uk, saving it to disk and then using it as input to find_urls. The output was a list of all the URLs in the page. That's good, I thought but wouldn't it be nice if we could query the page live off the website without having to download it first?
So I went in search of a C/C++ library that would act as an http client. I came across libcurl - apparently a popular C library that does a lot more that http. A quick perusal of the docs reminded me how complex C APIs can be. Although there are examples on the libcurl site I decided to do what any programmer in 2009 now does - search StackOverflow to see if someone else has done it! And indeed they had. One of the answers contained a link to a blog site called Luckyspin.org, where the blogger had kindly put up a working example of how to download a webpage into a string using libcurl.
So, after a quick download of the libcurl development package and the Luckyspin sample I was off and running. Initially I tried to get libcurl to compile statically but couldn't get that to work, so I used the dll version. Unfortunately this adds about a meg of dlls that you need to lug around with you. But anyway, now I can do
libcurl_test http://new.bbc.co.uk | find_urls
Notice that the Luckyspin example did exactly what I wanted - it stored the web page in a string and then output it to stdout, meaning I could just pipe it into my find_urls program. I love it when I find stuff that works with no modification!

Sunday, April 19, 2009

Cheating at Bookworm Adventures

Bookworm Adventures is a word game where you have to make words out of a grid containing 16 random  letters. The longer the word, the bigger the damage you do to the current baddie. In essence, it's an anagram game - you need to jumble the letters up to make a legal word. I thought it would be fun as a first C++ project to create a program that outputs the longest legal word using the grid letters as input. Of course, me and my 7-year old son would only use this as a last resort (maybe).
This sort of thing can be done in 20 lines of Ruby. And probably any other scripting language. I guarantee my C++ version will be much longer! Still, it's a useful exercise because it will involve I/O streams and STL containers and algorithms. It's also small enough to be done in an evening (hopefully)!