Using cURL to Read the Contents of a Web Page

Recently I wrote about how to use the Yahoo! weather API with WordPress, and in the comments I was asked how to use it without relying on WordPress. The answer is cURL.

According to Wikipedia, the name cURL comes from "Client for URLs", and it is essentially a command line web client. PHP's cURL functions give you the same abilities from code, which means you can access web content through a script on your site. Websites most often use this to access web APIs such as Twitter, Flickr or, as in this case, the Yahoo! weather API.

Note: There are actually loads of different commands and settings for cURL, but we are only interested in a few. If you want to check them all out, you can view the docs on php.net.

Below is the code we will be using:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']);
$result = curl_exec($ch);
curl_close($ch);

Broken down we have:

  • $ch = curl_init(); initiate the cURL session
  • curl_setopt($ch, CURLOPT_URL, $file); specify the file or URL to load
  • curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); return the response from curl_exec() as a string instead of printing it
  • curl_setopt($ch, CURLOPT_REFERER, $_SERVER['REQUEST_URI']); set the HTTP referer header sent with the request
  • $result = curl_exec($ch); perform the cURL request
  • curl_close($ch); close the session
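
The six calls above assume the request always succeeds. As a rough sketch, the same steps can be wrapped with a timeout and some basic error checking. The function name bm_fetchUrl and the 10 second default are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL with cURL
// and return the body as a string, or null if the request fails.
function bm_fetchUrl($url, $timeout = 10) {
	$ch = curl_init();
	curl_setopt_array($ch, array(
		CURLOPT_URL            => $url,
		CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
		CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
		CURLOPT_CONNECTTIMEOUT => $timeout, // seconds to wait for the connection
		CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole request
	));

	$result = curl_exec($ch);
	// curl_errno() is 0 when the transfer finished without a transport error.
	$failed = ($result === false || curl_errno($ch) !== 0);
	// HTTP status of the last response (0 if no HTTP response was received).
	$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	curl_close($ch);

	if ($failed || $status >= 400) {
		return null;
	}
	return $result;
}
```

Returning null on failure lets the calling code decide whether to show cached data, an error message, or nothing at all.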

A bit of rejigging from the original WordPress Yahoo! post and you end up with:

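<?php
// Fetch the current conditions for a Yahoo! location code.
// $temp is the unit: 'c' for Celsius or 'f' for Fahrenheit.
function bm_getWeather ($code = '', $temp = 'c') {
	$file = 'http://weather.yahooapis.com/forecastrss?p=' . urlencode($code) . '&u=' . urlencode($temp);

	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $file);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	// REQUEST_URI is not set when the script runs from the command line
	curl_setopt($ch, CURLOPT_REFERER, isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '');
	$result = curl_exec($ch);
	curl_close($ch);

	// curl_exec() returns false on failure; treat that as an empty feed
	if ($result === false) {
		$result = '';
	}

	$output = array (
		'temperature' => bm_getWeatherProperties('temp', $result),
		'weather' => bm_getWeatherProperties('text', $result),
		'weather_code' => bm_getWeatherProperties('code', $result),
		'class' => 'weatherIcon-' . bm_getWeatherProperties('code', $result),
	);

	return $output;
}

// Pull one attribute (temp, text or code) out of the yweather:condition tag.
// Returns an empty string when the attribute is not found.
function bm_getWeatherProperties ($needle, $data) {
	$regex = '/<yweather:condition[^>]*\b' . preg_quote($needle, '/') . '="(.*?)"/';
	preg_match($regex, $data, $matches);
	return isset($matches[1]) ? $matches[1] : '';
}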
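
The regular expression in bm_getWeatherProperties() could also be swapped for an XML parser. Below is a minimal sketch using SimpleXML, assuming the feed is RSS with a yweather:condition element inside the first item. The function name bm_parseCondition is hypothetical, and the sample feed is hand-written for illustration rather than real API output.

```php
<?php
// Hypothetical alternative to bm_getWeatherProperties(): parse the feed with
// SimpleXML and read the attributes of the yweather:condition element.
function bm_parseCondition($xml) {
	// Collect libxml errors internally so a malformed feed does not print warnings.
	$previous = libxml_use_internal_errors(true);
	$feed = simplexml_load_string($xml);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	if ($feed === false || !isset($feed->channel->item)) {
		return null;
	}

	// Passing true tells children() that 'yweather' is a prefix, not a namespace URI.
	$weather = $feed->channel->item->children('yweather', true);
	if (!isset($weather->condition)) {
		return null;
	}

	// The attributes themselves are not namespaced, so attributes() needs no arguments.
	$attrs = $weather->condition->attributes();
	return array(
		'temperature'  => (string) $attrs->temp,
		'weather'      => (string) $attrs->text,
		'weather_code' => (string) $attrs->code,
		'class'        => 'weatherIcon-' . (string) $attrs->code,
	);
}

// Hand-written sample feed (an assumption about the feed's shape, not real output).
$sample = '<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">'
	. '<channel><item>'
	. '<yweather:condition text="Fair" code="34" temp="21" />'
	. '</item></channel></rss>';

$conditions = bm_parseCondition($sample);
```

The returned array uses the same keys as bm_getWeather(), so it could be dropped in where the regular expression results are used now.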

8 Responses to “Using cURL to Read the Contents of a Web Page”

  • Or you can use file_get_contents() for basic GET requests. http://php.net/manual/en/function.file-get-contents.php

    It's easier, and often faster than cURL, though some hosts disable it.

    • I use file_get_contents too, but the fact that some hosts disable it is what makes cURL so much more flexible. Also, cURL has a whole stack of different options that allow you to grab just the parts you need, or fake different browsers etc.

  • Read a bit on cURL in Smashing Magazine. This is cool as well.
    NB: Love the website :)

  • Another great tool is the PHP Simple HTML DOM Parser. It lets you select portions of the page you're scraping using jQuery-style selectors.

    This library has helped me countless numbers of times.

    • I've heard about this and had a play around with their demos but never used it for anything complex. It's quite a cool idea, and definitely helpful for those sites that don't have an official API. What sort of situations have you needed to use this in? It'd be interesting to see some real world examples.

    • I generally use SimpleXML. It's included with PHP5, so it's fairly standard across servers.

      It turns an XML document into an object/array data structure. If you need to parse an HTML page, you can pass it through Tidy first.

  • I scrape stock market data.

    I have used curl, but curl is a bit complicated. Also, it doesn't work that well when a web site requires an extended dialog with the server, such as login and password and cookies, especially ASP.NET sites.

    I have also used biterscripting IS (Internet Sessions). It also provides a command line interface for a web client (http://www.biterscripting.com/..._automatedinternet.html). There are only a few commands to learn and they work really, really well when it comes to conducting an extended dialog with a web server, including logging in, form filling, exchanging cookies and setting the standard ASP.NET variables _VIEWSTATE, etc, ...

    For web servers that don't require a login, simple commands work.

    var string page
    cat "http://www.something.com/path/to/some/page.ext" > $page

    That gets the source for the page into a string variable without doing anything special.

    script SS_WebPageToCSV.txt page("http://www.something.com/path/to/some/page.ext")

    That extracts a table from a web page and puts it in CSV format.

    You need the IS (internet session) only when the web server requires that client establish explicit sessions. Other web pages are available with the simple cat/repro command.
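
The file_get_contents() suggestion earlier in the comments can be combined with cURL as a fallback for hosts that disable URL access. This is only a sketch under stated assumptions: the helper name bm_readUrl is hypothetical, and it checks the allow_url_fopen setting to decide which method to use.

```php
<?php
// Hypothetical helper (an assumption, not from the post or its comments):
// use file_get_contents() when the host allows URL wrappers, otherwise cURL.
function bm_readUrl($url) {
	// ini_get() returns "1" when allow_url_fopen is on, "" or "0" when it is off.
	$fopenAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

	if ($fopenAllowed) {
		// @ hides the warning file_get_contents() raises when the request fails.
		$result = @file_get_contents($url);
	} elseif (function_exists('curl_init')) {
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $url);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		$result = curl_exec($ch);
		curl_close($ch);
	} else {
		// Neither method is available on this host.
		$result = false;
	}

	return ($result === false) ? null : $result;
}
```

Both branches return the page body as a string, so the rest of a script does not need to know which method the host allowed.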


About me

My name is Ben Gillbanks. I'm a lover of Video Games, WordPress, Web Development and everything in between.

I have been working on the internet since 1998, and working with computers even longer. I am a hardcore Nintendo fanboy and have owned most of their consoles at one stage or another.
