May 26th, 2020
Lead Generation Through Web Scraping

Web scraping is one of the most effective ways to generate leads for your business.

Today we will look at a practical example of how to get leads from online business directories.

Here are 57 of them that you can scrape leads from:

List of Online Directories

Yelp - http://www.yelp.com/
Thumbtack - https://www.thumbtack.com/
Yellow Pages - http://www.yellowpages.com/
Kudzu - http://www.kudzu.com/
Better Business Bureau - http://www.bbb.org/
Manta - http://www.manta.com/
Judy's Book - http://www.judysbook.com/
CitySquares - http://citysquares.com/
Yellow Pages Directory - https://www.yellowpagesgoesgreen.org
USdirectory.com - http://www.usdirectory.com/
Here - https://mapcreator.here.com
Angie's List - https://www.angieslist.com/
Local.com - http://www.local.com/
MagicYellow - http://www.magicyellow.com/
USCity - https://uscity.net/
Yext - http://www.yext.com/
The Business Journals - http://businessdirectory.bizjournals.com/
Citysearch - http://www.citysearch.com/
Fave - https://www.getfave.com/
BrownBook - http://www.brownbook.net/
Merchant Circle - https://www.merchantcircle.com
Kompass - http://us.kompass.com/
Superpages - http://www.superpages.com/
EZlocal.com - http://ezlocal.com/
MyHuckleberry - http://www.myhuckleberry.com/
CityVoter - http://cityvoter.com/
Craigslist - http://www.craigslist.org/about/sites#US
US Combo Directory Network - http://combodirectoryusa.info/
Insider Pages - http://www.insiderpages.com/
Dexknows - http://www.dexknows.com/
City-Data - http://www.city-data.com/
Switchboard - http://switchboard.com/
Bizwiki - http://www.bizwiki.com/
USA Citylink - http://www.usacitylink.com/
Best of the Web - https://local.botw.org/
RadarFrog - http://radarfrog.gatehousemedia.com/
ALLPages - http://www.allpages.com/
The Real Yellow Pages - http://www.realpageslive.com/
Yellow.com - http://www.yellow.com/
Volta - http://www.volta.net/
Akama - http://www.akama.com/
Nexport - http://www.nexport.com/
JustDial - http://us.justdial.com/
Valley Yellow Pages - http://www.myyp.com/
OfficialUSA.com - http://www.officialusa.com/
2FL - http://www.2findlocal.com/
Seekitlocal - http://www.seekitlocal.com/default.aspx
AmericanTowns - http://www.americantowns.com/
eLocal - http://www.elocal.com/
CityGrid - http://www.citygrid.com/
Ziplocal - http://ziplocal.com/
LocalPages - http://www.localpages.com/
BizVotes - http://www.bizvotes.com/
LocalFolder.com - http://www.localfolder.com/
Zidster - http://www.zidster.com/
netHulk - http://www.nethulk.com/
My Local Services - http://www.mylocalservices.com/

That's a lot. In this article, let's learn how to scrape one of them, Yellow Pages, so that you can apply the same techniques to the others. Imagine a scenario where we want to generate a list of dentists to target.

BeautifulSoup will help us extract the crucial pieces of information from Yellow Pages.

To start, here is the boilerplate code that fetches the Yellowpages.com search results page and sets up BeautifulSoup so we can use CSS selectors to query the page for meaningful data.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.yellowpages.com/los-angeles-ca/dentists'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

We also pass a User-Agent header to simulate a browser call, so we don't get blocked.
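If you request more than a handful of pages, rotating through a few common user-agent strings reduces the chance of a block. Here is a minimal sketch; the strings below are just examples of current desktop browser user agents, and any similar set works:

```python
import random

# A small pool of desktop browser user-agent strings to rotate through.
user_agents = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0',
]

# Pick one at random per request instead of sending a fixed header.
headers = {'User-Agent': random.choice(user_agents)}
```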

Now let's analyse the Yellow Pages search results. This is how it looks.

And when we inspect the page, we find that each item's HTML is encapsulated in a tag with the class v-card.

We can use this to break the HTML document into these cards, each of which contains an individual listing's information, like this.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.yellowpages.com/los-angeles-ca/dentists'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.v-card'):
    print('----------------------------------------')
    print(item)

And when you run it.

python3 scrapeYellow.py

You can tell that the code is isolating each v-card's HTML.

On further inspection, you can see that the name of the place always has the class business-name. So let's try and retrieve that.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.yellowpages.com/los-angeles-ca/dentists'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.v-card'):
    try:
        print('----------------------------------------')
        print(item.select('.business-name')[0].get_text())
    except Exception:
        # Some cards are ads or lack a business name; skip them
        pass

That will get us the names.

Bingo!

Now let's get the other data pieces.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.yellowpages.com/los-angeles-ca/dentists'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.v-card'):
    try:
        print('----------------------------------------')
        print(item.select('.business-name')[0].get_text())
        # The star rating is encoded in the class attribute of this div
        print(item.select('.rating div')[0]['class'])
        # The review count
        print(item.select('.rating div span')[0].get_text())
        print(item.select('.phone')[0].get_text())
        print(item.select('.adr')[0].get_text())
        print('----------------------------------------')
    except Exception:
        # Skip cards that are missing one of these fields
        pass

And when run, it produces all the info we need, including ratings, review counts, phone numbers, and addresses.
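For lead generation you will usually want this data in a spreadsheet rather than printed to the console. Here is a minimal sketch that collects each card into a dict and writes a CSV with Python's csv module; the field names and the sample row are our own illustration, not values from Yellow Pages:

```python
import csv

# In the real script, each dict would be built from the item.select(...)
# calls shown above, one per v-card.
leads = [
    {'name': 'Example Dental Group',           # hypothetical sample row
     'phone': '(555) 010-0000',
     'address': '1 Example St, Los Angeles, CA'},
]

# Write the collected leads to a CSV file, one row per business.
with open('leads.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'phone', 'address'])
    writer.writeheader()
    writer.writerows(leads)
```

DictWriter keeps the columns in a fixed order even if the scraped dicts are built field by field, so the output opens cleanly in any spreadsheet tool.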

If you want to use this in production and scale to thousands of links, you will find that Yellow Pages blocks your IP quickly. In that scenario, a rotating proxy service that rotates IPs is almost a must. You can use a service like Proxies API to route your calls through a pool of millions of residential proxies.
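With the requests library, routing a call through a proxy is a one-line change: pass a proxies mapping to requests.get. The endpoint below is a placeholder; substitute the host, port, and credentials your proxy provider gives you:

```python
# Placeholder proxy endpoint -- replace with your provider's
# host, port, and credentials.
proxy = 'http://user:password@proxy.example.com:8000'

# requests uses this mapping to route both plain and TLS traffic
# through the proxy.
proxies = {'http': proxy, 'https': proxy}

# The actual call would then be:
# response = requests.get(url, headers=headers, proxies=proxies)
```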

If you want to scale crawling speed and don't want to set up your own infrastructure, you can use our cloud-based crawler, crawltohell.com, to easily crawl thousands of URLs at high speed from our network of crawlers.
