How to Scrape Bing Search Results with PHP


Scraping programmatically involves three things:

  1. Where to request (you must know the URL or URLs of the target website)
  2. How to handle the response
  3. Where to save the result (in a database, or by posting it to another website)

Simple scrape request

Let’s start with a request

I am going to use the file_get_contents function to make a request. In this example, we will scrape the text of the h1 heading tag on the example.com website, i.e. “Example Domain”.

<?php

$response= file_get_contents('http://www.example.com/');
echo $response;

?>

The same can be achieved with cURL.

Below is an example cURL request:

<?php

        // create curl resource
        $ch = curl_init();

        // set url
        curl_setopt($ch, CURLOPT_URL, "example.com");

        //return the transfer as a string
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

        // $response contains the response body as a string
        $response = curl_exec($ch);

        // close curl resource to free up system resources
        curl_close($ch);     

        echo $response;

?>

Handling Scrape Response

Next comes handling the response. Whichever method you use to make the request, the response looks like this:

$response = '<!doctype html> <html> <head> <title>Example Domain</title> <meta charset="utf-8" /> <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <style type="text/css"> body { background-color: #f0f0f2; margin: 0; padding: 0; font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; } div { width: 600px; margin: 5em auto; padding: 2em; background-color: #fdfdff; border-radius: 0.5em; box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02); } a:link, a:visited { color: #38488f; text-decoration: none; } @media (max-width: 700px) { div { margin: 0 auto; width: auto; } } </style> </head> <body> <div> <h1>Example Domain</h1> <p>This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.</p> <p><a href="https://www.iana.org/domains/example">More information...</a></p> </div> </body> </html>';

Now, to fetch the h1 heading from the string, we need some string-handling tricks: find the position of the <h1> tag and the position of the </h1> tag, and use them to work out the length of the heading. Then the substr function can extract the heading text, “Example Domain”.

<?php

$html= file_get_contents('http://www.example.com/');
//echo htmlspecialchars($html);

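$tag = '<h1>';
$start = stripos($html, $tag);
$end = ($start === false) ? false : stripos($html, '</h1>', $start);

if ($start !== false && $end !== false) {
    // skip past the opening tag so only the heading text is returned
    $start += strlen($tag);
    echo substr($html, $start, $end - $start);
}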

?>

Oh! 🙁 This is a very tedious method. An HTML document is a tree structure in which each node is an object representing a part of the document. The DOM represents the document as a logical tree.

For example:

html > body > div > h1

We need something that understands this path, or XPath, to get the result quickly. Let's start with a DOM parser. One of the simplest is PHP Simple HTML DOM Parser.

<?php

include('./simplehtmldom/simple_html_dom.php');

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

$element = $html->find('h1');
echo $element[0]->innertext . "<br>";


?>

You can download it from simplehtmldom.sourceforge.io.
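The XPath idea mentioned above can also be tried without a third-party download. The following is a minimal sketch, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath) is enabled. The $html string is a trimmed, hand-written stand-in for the fetched page, not the real response.

```php
<?php

// Sample markup standing in for a fetched page (an assumption for illustration).
$html = '<!doctype html><html><head><title>Example Domain</title></head>'
      . '<body><div><h1>Example Domain</h1><p>Illustrative text.</p></div></body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The path html > body > div > h1 written as an XPath expression.
$nodes = $xpath->query('/html/body/div/h1');

// Take the text of the first match, or null when nothing matched.
$heading = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $heading, PHP_EOL;

?>
```

A looser expression such as //h1 matches the heading wherever it sits in the tree, which tends to survive layout changes better than a full path.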

Now I am going to use the simplehtmldom library to scrape Bing results:

  • Include the library path
  • Set the URL to fetch (e.g. https://www.bing.com/search?q=hello)
  • Fetch every li with the class b_algo, one by one, using a loop
  • Each li has an h2 tag
    • Extract the plain text of the h2 tag for the title
    • Find the a tag and extract its href attribute for the link
  • Each li also has a div tag with the class b_caption
    • Find the paragraph p tag in the div
    • Extract the plain text of the paragraph too
  • Display the result

<?php

include('./simplehtmldom/simple_html_dom.php');
$url = "https://www.bing.com/search?q=".urlencode("hello");
$html = file_get_html($url);
// Find all list items
foreach($html->find('li.b_algo') as $element)
{
    $heading = $element->find('h2');
    if (empty($heading)) {
        continue; // skip items that have no h2 title
    }
    // plaintext strips the markup; innertext would keep the inner HTML
    $title = $heading[0]->plaintext;

    // find the a tag inside the h2 tag
    $e = $heading[0]->find('a');
    $link = $e ? $e[0]->href : '';

    // find the description block
    $desc = $element->find('div.b_caption');
    // find p in div, guarding against a missing caption
    $d = $desc ? $desc[0]->find('p') : [];
    $dd = $d ? $d[0]->plaintext : '';

    echo "<b>Title: </b>".$title ."<br/><b>Link: </b>". $link."<br/><b>Description: <br/></b>$dd<hr/> ";
}

?>

This is how to get Bing search results using PHP.
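The loop above only displays the results, while the third item in the list at the top of this article is about saving them. Here is a minimal sketch of one way to do that, writing the rows to a JSON file. The $results rows, the save_results helper, and the output path are all hypothetical and only stand in for values you would collect inside the foreach loop.

```php
<?php

// Hypothetical rows standing in for values collected inside the foreach loop.
$results = [
    ['title' => 'Sample title one', 'link' => 'https://example.com/one', 'description' => 'Sample description one.'],
    ['title' => 'Sample title two', 'link' => 'https://example.com/two', 'description' => 'Sample description two.'],
];

/**
 * Write the result rows to a JSON file and return true on success.
 */
function save_results(array $results, string $path): bool
{
    $json = json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
    if ($json === false) {
        return false; // encoding failed, e.g. invalid UTF-8 in a snippet
    }
    return file_put_contents($path, $json) !== false;
}

// Assumed output location; change it to suit your project.
$path = sys_get_temp_dir() . '/bing_results.json';

if (save_results($results, $path)) {
    echo "Saved ", count($results), " results to $path", PHP_EOL;
}

?>
```

The same array could just as well be inserted into a database table; JSON is used here only because it needs no extra setup.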

Wait…

Sometimes there is no output with this code! Why?

Remember that search engines do not allow automated search queries. They readily identify automated queries, such as those coming from a headless browser, and ban the offending IP addresses for a short time, a long time, or permanently.

If you want to scrape something, first try the official Bing Search API to get the data.

Cheat sheet

  • Try setting the request headers
  • Leave a time interval between requests when scraping multiple pages
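
The two tips above can be sketched roughly as follows, assuming the cURL extension is available. The function names (build_curl_options, fetch_page, fetch_all), the User-Agent string, the Accept-Language header, and the 5-second default delay are illustrative assumptions, not values that any search engine requires or guarantees to accept.

```php
<?php

/**
 * Build cURL options that send explicit request headers.
 * The header values are illustrative assumptions.
 */
function build_curl_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; MyScraper/1.0)', // hypothetical name
        CURLOPT_HTTPHEADER     => ['Accept-Language: en-US,en;q=0.9'],
    ];
}

/**
 * Fetch one page; returns null when the request fails.
 */
function fetch_page(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    $body = curl_exec($ch);
    // The handle is freed when $ch goes out of scope.
    return $body === false ? null : $body;
}

/**
 * Fetch several URLs in turn, pausing between consecutive requests.
 * $fetch is any callable that takes a URL and returns the page body.
 */
function fetch_all(array $urls, callable $fetch, int $delaySeconds = 5): array
{
    $pages = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0 && $delaySeconds > 0) {
            sleep($delaySeconds); // wait before every request except the first
        }
        $pages[$url] = $fetch($url);
    }
    return $pages;
}

?>
```

Calling fetch_all($urls, 'fetch_page', 5) would request each URL with the headers set and a five-second pause in between. Passing the fetcher as a callable also makes it easy to swap in a stub while testing.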