Sunday, April 26, 2009

Using libcurl for simple http

In "Accelerated C++" the authors develop a function (find_urls) to extract URLs from a string. I tested it by downloading the home page of news.bbc.co.uk, saving it to disk and then using it as input to find_urls. The output was a list of all the URLs in the page. That's good, I thought but wouldn't it be nice if we could query the page live off the website without having to download it first?
So I went in search of a C/C++ library that would act as an HTTP client. I came across libcurl - apparently a popular C library that does a lot more than HTTP. A quick perusal of the docs reminded me how complex C APIs can be. Although there are examples on the libcurl site I decided to do what any programmer in 2009 does - search Stack Overflow to see if someone else has done it! And indeed they had. One of the answers contained a link to a blog site called Luckyspin.org, where the blogger had kindly put up a working example of how to download a web page into a string using libcurl.
So, after a quick download of the libcurl development package and the Luckyspin sample I was off and running. Initially I tried to link libcurl statically but couldn't get that to work, so I used the DLL version. Unfortunately this means about a meg of DLLs that you need to lug around with you. But anyway, now I can do
libcurl_test http://news.bbc.co.uk | find_urls
Notice that the Luckyspin example did exactly what I wanted - it stored the web page in a string and then output it to stdout, meaning I could just pipe it into my find_urls program. I love it when I find stuff that works with no modification!
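For anyone curious about the general shape of such a program, here is a minimal sketch of the "page into a string" approach. This is my own illustration, not the actual Luckyspin code - the program name libcurl_test and the callback name write_to_string are just placeholders.

// Minimal sketch: fetch a URL into a std::string with libcurl and write it to stdout.
// (Illustrative only - names like write_to_string are mine, not from the Luckyspin post.)
#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl calls this for each chunk of the response body; we append the chunk
// to the std::string passed in via CURLOPT_WRITEDATA.
static size_t write_to_string(char* data, size_t size, size_t nmemb, void* userdata)
{
    std::string* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;              // tell libcurl we consumed everything
}

int main(int argc, char* argv[])
{
    if (argc != 2) {
        std::cerr << "usage: libcurl_test <url>" << std::endl;
        return 1;
    }

    curl_global_init(CURL_GLOBAL_ALL);
    CURL* curl = curl_easy_init();

    std::string page;
    curl_easy_setopt(curl, CURLOPT_URL, argv[1]);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_string);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &page);

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        std::cerr << "curl error: " << curl_easy_strerror(res) << std::endl;
    else
        std::cout << page;            // dump the page so it can be piped into find_urls

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}

The whole trick is the write callback: libcurl hands you the body in chunks, you accumulate them in a string, and then you can do whatever you like with it - in this case, just print it so another program can pick it up from a pipe.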
