14 thoughts on “Using cURL to Read the Contents of a Web Page”

    1. I use file_get_contents too, but it’s the fact that some hosts disable it that makes cURL so much more flexible. cURL also has a whole stack of options that let you grab just the parts you need, fake different browsers, and so on.
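
      A rough PHP sketch of what that looks like in practice (the user agent string and URL here are only placeholders, and CURLOPT_RANGE only helps if the server honours range requests):

      <?php
      // Fetch a page while pretending to be a regular browser, grabbing only the first kilobyte.
      $ch = curl_init('http://www.example.com/');
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);                        // return the body as a string
      curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0)');  // fake a browser user agent
      curl_setopt($ch, CURLOPT_RANGE, '0-1023');                             // ask for just the first 1 KB
      $partial = curl_exec($ch);
      curl_close($ch);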

    1. I’ve heard about this and had a play around with their demos but never used it for anything complex. It’s quite a cool idea, and definitely helpful for those sites that don’t have an official API. What sort of situations have you needed to use this in? It’d be interesting to see some real-world examples.

    2. I generally use SimpleXML. It’s included with PHP5, so it’s fairly standard across servers.

      It turns an XML document into an object/array data structure. If you need to parse an HTML page, you can pass it through Tidy first.
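
      A minimal sketch of that approach, assuming the Tidy extension is installed and $html already holds the raw page source (fetched with cURL, for example):

      <?php
      // Repair the messy HTML into well-formed XHTML so SimpleXML can parse it.
      $xhtml = tidy_repair_string($html, array('output-xhtml' => true, 'numeric-entities' => true), 'utf8');

      // SimpleXML turns the document into an object tree; XHTML lives in a namespace,
      // so register it before running an XPath query.
      $xml = simplexml_load_string($xhtml);
      if ($xml !== false) {
          $xml->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');
          $titles = $xml->xpath('//x:title');
          if (!empty($titles)) {
              echo (string) $titles[0];
          }
      }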

  1. I scrape stock market data.

    I have used cURL, but it is a bit complicated. It also doesn’t work that well when a website requires an extended dialog with the server, such as logging in with a username and password and exchanging cookies, especially on ASP.NET sites.

    I have also used biterscripting IS (Internet Sessions). It also provides a command line interface for a web client ( http://www.biterscripting.com/helppages_automatedinternet.html ). There are only a few commands to learn, and they work really, really well when it comes to conducting an extended dialog with a web server, including logging in, filling forms, exchanging cookies and setting the standard ASP.NET variables such as __VIEWSTATE.

    For web servers that don’t require a login, simple commands work.

    var string page
    cat "http://www.something.com/path/to/some/page.ext" > $page

    That gets the source for the page into a string variable without doing anything special.

    script SS_WebPageToCSV.txt page("http://www.something.com/path/to/some/page.ext")

    That extracts a table from a web page and puts it in CSV format.

    You need IS (Internet Sessions) only when the web server requires that the client establish an explicit session. Other web pages are available with the simple cat/repro commands.
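
    For reference, a rough sketch of the cookie-jar approach to that kind of extended dialog in PHP cURL (the login URL and form fields are placeholders; an ASP.NET site would additionally need the __VIEWSTATE fields scraped out of the login form and posted back):

    <?php
    // Keep a session alive across requests by storing cookies in a jar file.
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => 'http://www.something.com/login',
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(array('user' => 'me', 'pass' => 'secret')),
        CURLOPT_COOKIEJAR      => $cookieFile,   // write cookies the server sends us
        CURLOPT_COOKIEFILE     => $cookieFile,   // send them back on later requests
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ));
    curl_exec($ch);

    // Reuse the same handle (and cookie jar) to fetch a page behind the login.
    curl_setopt($ch, CURLOPT_URL, 'http://www.something.com/path/to/some/page.ext');
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    $page = curl_exec($ch);
    curl_close($ch);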

  2. $document = new DOMDocument();
    @$document->loadHTML($html);
    $title = $document->getElementsByTagName('title')->item(0)->nodeValue;

    Set $html, and you will have the title 🙂
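
    For example, $html could be filled with a simple cURL fetch first (example.com is just a placeholder):

    <?php
    // Grab the raw page source into $html, then run the DOM snippet above.
    $ch = curl_init('http://www.example.com/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of echoing it
    $html = curl_exec($ch);
    curl_close($ch);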

  3. Hi,
    I also use an HTML DOM parser library, but it doesn’t work for grabbing content from every website; cURL is best.
    I parsed walmart.com for my demo application. First I wrote HTML DOM code to grab the info, but I couldn’t get the shipping details and some other data.

    How can I develop a secure web API that grabs data quickly?

  4. Just to note, your regex needs updating (I copied and pasted yours and didn’t get any valid RSS). After checking it over, I realised your regex had a space in it after ‘yweather’; removing it made the example work for me:

    $regex = '';

  5. I am searching for a solution for my WordPress website, which comes with the Visual Composer plugin. Unfortunately this plugin uses file_get_contents(), which my web hosting company has disabled for security reasons; however, they do support cURL.

    The support for this plugin is very poor and I have been waiting for their answer for a week now. Without this plugin my WP theme can go in the bin, as it is responsible for creating the layout of my website.

    Is there anyone who could help me or at least point me towards what should be done?

    Thanks
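
    (The usual workaround, assuming the plugin only uses file_get_contents() for plain HTTP GET requests, is a small cURL wrapper; curl_get_contents() below is a hypothetical helper name, and the plugin would still have to be patched to call it:)

    <?php
    // A cURL-based substitute for file_get_contents() on hosts that disable URL fetching.
    function curl_get_contents($url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects like file_get_contents would
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // don't hang forever
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }

    $layout = curl_get_contents('http://www.example.com/template.html');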
