Pulling Data From Internet URLs in C#
By Rick Leinecker
I can't count the number of times that I've needed to retrieve data from an Internet site, and for several reasons. For starters, I developed an Internet filtering technology that analyzes images, and I needed thousands of images for testing and training. Pulling them from the Internet in an automated way was the only practical way of obtaining that many. I've also written crawlers (or spiders) that go from site to site. Sometimes I need an automated way of pulling web-based information, such as stock prices. Suffice it to say that I've pulled data from Internet sites many times for many different reasons.
The .NET Framework makes the task of requesting data from a URL simple. In days of old, I used raw sockets. The code had to connect to a remote server, create a well-formed HTTP request, send the request, retrieve the data while monitoring for end-of-file conditions, and save the result to memory or a disk file. That meant lots of code and lots of room for errors. This article, though, covers the WebClient class, the simplest class the .NET Framework offers for downloading from Internet URLs. In later articles I'll talk about the WebRequest and WebResponse objects, and down the road we'll tackle Sockets.
Pulling Data With The WebClient Class
Pulling data from a URL with the WebClient class couldn't be easier. There are two different ways I use it: to save the data to a disk file, and to put the data into an in-memory buffer or string. Before you begin, though, you'll need to add a using statement for System.Net as follows.
using System.Net;
The next thing to note is that URLs must begin with "http://" if they are to be retrieved via the HTTP Internet protocol. I created a helper method, shown below, that takes care of this detail.
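// This helper method prepends "http://" to a URL if it doesn't
// already start with "http://" or "https://". The comparison
// ignores case. (StringComparison requires a using statement
// for System.)
void PrependHTTP(ref string strURL)
{
    if (!strURL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
        !strURL.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
        strURL = "http://" + strURL;
    }
}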
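To see how a helper like this behaves on a few inputs, here is a minimal console sketch. The class name PrependHttpDemo and the sample strings are made up for illustration, and the method is a static variant that leaves "https://" URLs alone, which is an assumption beyond what the original helper describes.

```csharp
using System;

class PrependHttpDemo
{
    // Static variant of the helper (illustrative). Treats both "http://"
    // and "https://" as already prefixed, ignoring case.
    public static void PrependHTTP(ref string strURL)
    {
        if (!strURL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !strURL.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            strURL = "http://" + strURL;
        }
    }

    static void Main()
    {
        string[] samples =
        {
            "www.rickleinecker.com",
            "HTTP://www.rickleinecker.com",
            "https://www.rickleinecker.com"
        };

        foreach (string sample in samples)
        {
            // Copy first: a foreach variable can't be passed by ref.
            string url = sample;
            PrependHTTP(ref url);
            Console.WriteLine(sample + " -> " + url);
        }
    }
}
```

Only the first sample should come back changed; the other two already carry a scheme and are passed through untouched.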
There are two methods that can be used to retrieve data. The first is the DownloadFile method, which saves the retrieved data to a disk file. The second is the DownloadData method, which places the retrieved data into a byte array. The following two examples show how to create a WebClient object and retrieve data.
Using the DownloadFile method:
WebClient wc = new WebClient();
wc.DownloadFile("http://www.rickleinecker.com/Default.htm",
    "DiskFile.htm");
Using the DownloadData method:
WebClient wc = new WebClient();
byte[] data =
    wc.DownloadData("http://www.rickleinecker.com/Default.htm");
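The snippets above assume the download succeeds. In practice the server may be unreachable or the page may not exist, in which case WebClient throws a WebException. The following is only a sketch of one way to guard against that. The class name SafeDownloadDemo and the helper TryDownloadData are hypothetical names, not part of the .NET API. Note also that newer .NET versions mark WebClient as obsolete in favor of HttpClient, so expect a compiler warning there.

```csharp
using System;
using System.Net;

class SafeDownloadDemo
{
    // Hypothetical helper: returns the downloaded bytes,
    // or null if the request failed.
    public static byte[] TryDownloadData(string url)
    {
        // WebClient implements IDisposable, so a using block
        // releases it as soon as the download is finished.
        using (WebClient wc = new WebClient())
        {
            try
            {
                return wc.DownloadData(url);
            }
            catch (WebException ex)
            {
                // Thrown for connection failures, HTTP error
                // statuses, and similar download problems.
                Console.WriteLine("Download failed: " + ex.Message);
                return null;
            }
        }
    }

    static void Main()
    {
        byte[] data =
            TryDownloadData("http://www.rickleinecker.com/Default.htm");
        Console.WriteLine(data == null
            ? "No data retrieved."
            : "Retrieved " + data.Length + " bytes.");
    }
}
```

The same try/catch pattern should carry over to DownloadFile. Returning null is just one design choice; rethrowing or returning a status value would work equally well.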
If you want a string instead of a byte array, you can use the Encoding.ASCII.GetString method as follows. (Remember that you need a using statement for System.Text in order to use the Encoding.ASCII.GetString method.)
WebClient wc = new WebClient();
byte[] data =
    wc.DownloadData("http://www.rickleinecker.com/Default.htm");
string strData = Encoding.ASCII.GetString(data);
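One caveat with Encoding.ASCII is that it only understands 7-bit characters, so any byte outside that range is replaced during decoding. The sketch below illustrates the difference using a hard-coded byte array as a stand-in for downloaded data. The class name DecodeDemo, the two helper methods, and the sample word are invented for illustration, and it assumes the page was served as UTF-8, which will not be true of every site.

```csharp
using System;
using System.Text;

class DecodeDemo
{
    // Decodes the bytes as 7-bit ASCII; bytes above 0x7F can't be represented.
    public static string AsAscii(byte[] data)
    {
        return Encoding.ASCII.GetString(data);
    }

    // Decodes the bytes as UTF-8, which preserves accented characters.
    public static string AsUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    static void Main()
    {
        // Stand-in for the result of DownloadData: the word "café"
        // encoded as UTF-8 (the accented letter takes two bytes).
        byte[] data = Encoding.UTF8.GetBytes("caf\u00e9");

        Console.WriteLine("ASCII: " + AsAscii(data));
        Console.WriteLine("UTF-8: " + AsUtf8(data));
    }
}
```

For plain English pages the two decodings should give the same string. WebClient also has a DownloadString method that returns a string directly, decoded according to the client's Encoding property.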
There's a demonstration program that lets you specify a URL to download. You can choose to save it as a disk file, show it as a string, or display it as an image. You can see the application in action in the figure below.