Get Out of IP Blocks Without Changing Your Code

May 27th, 2020

IP blocks have been the bane of web crawling projects for a while now. There are many approaches to preventing and overcoming them. They sort of work, and they take a lot of effort.

If you are interested in doing it the hard way, here is a courtesy list.

Here is what you can do to prevent them.

Change User-Agent strings to something that web servers recognize.

Rotate them frequently.

Rate limit your requests and make them irregular.

Pass cookies back to the server on later requests.

Pass other headers back as a browser does. Use the Google Chrome inspect tool to find what your browser sends and imitate that in your code.
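The prevention steps above can be combined in a few lines. Here is a minimal Python sketch using only the standard library. The User-Agent strings, header values, and delay numbers are illustrative assumptions, and `build_headers`, `irregular_delay`, and `polite_get` are hypothetical helper names.

```python
import random
import time
import urllib.request
from http.cookiejar import CookieJar

# Illustrative User-Agent strings (assumptions); copy current ones from your own browser.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0",
]

# The opener keeps cookies in memory and sends them back on later requests.
cookie_jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))


def build_headers():
    """Pick a random User-Agent and add headers a browser typically sends."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
    }


def irregular_delay(base=2.0, jitter=3.0):
    """Return a delay in seconds between base and base + jitter."""
    return base + random.uniform(0, jitter)


def polite_get(url):
    """Wait an irregular interval, then fetch url with rotated headers and stored cookies."""
    time.sleep(irregular_delay())
    request = urllib.request.Request(url, headers=build_headers())
    with opener.open(request, timeout=30) as response:
        return response.read()
```

Calling `polite_get("https://example.com")` in a loop spaces requests out unevenly and picks a fresh User-Agent each time. The header set here is deliberately short; compare it with what the Chrome inspect tool shows for your own browser.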

Here is what you can do to overcome them once you are blocked, the hard way.

Restart your router when you get blocked on your local network. With a dynamic IP, this often gets you a new address from your ISP.

Use Amazon AWS AMIs and rotate them.

Use an AWS IP rotator library like this one: https://github.com/maxharlow/aws-ip-rotator

Use Tor proxies.
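Whichever source of fresh IPs you use, the scraper has to notice a block and switch. A rough Python sketch of that loop follows. It assumes you already have a list of HTTP proxy addresses (the ones below are placeholders) and treats HTTP 403 and 429 as signs of a block, which is a common convention but not something every site follows. The function names are hypothetical.

```python
import itertools
import urllib.error
import urllib.request

# Placeholder proxy addresses (assumptions); substitute proxies you control.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
BLOCK_STATUSES = {403, 429}  # assumed block signals; sites vary


def is_blocked(status):
    """Return True when the HTTP status looks like a block."""
    return status in BLOCK_STATUSES


def fetch_with_rotation(url, proxies=PROXIES, max_attempts=4):
    """Try url through each proxy in turn, moving on when a block is detected."""
    pool = itertools.cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(pool)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        opener = urllib.request.build_opener(handler)
        try:
            with opener.open(url, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            if not is_blocked(error.code):
                raise  # a different HTTP error; rotating will not help
        except urllib.error.URLError:
            pass  # proxy unreachable; try the next one
    raise RuntimeError("all attempts were blocked or failed")
```

Tor exposes a SOCKS proxy, which `urllib` does not handle on its own, so this sketch covers plain HTTP proxies only.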

You can either go down the route prescribed above,

or do it with a single line of code.

For free.

Add this code to your scraper.

curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://example.com"

Or a version of it, anyway. It's free forever for up to 1000 requests per month, which is enough for small projects and lets you get them working reliably fairly quickly.

You will need an AuthKey (the API_KEY value in the request above), which you can get at Proxiesapi.com by registering for free. Proxies API is a rotating proxy API.
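In a Python scraper, the same request might look like the sketch below. The endpoint and the `key`, `render`, and `url` parameters come from the curl command above. The function names are hypothetical, and the sketch assumes the response body is the fetched page itself. Encoding the query string matters when the target URL has its own `?` or `&` characters.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in this article.
API_ENDPOINT = "http://api.proxiesapi.com/"


def build_api_url(api_key, target_url):
    """Build the request URL, percent-encoding the target URL and key."""
    params = {"key": api_key, "render": "true", "url": target_url}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_via_api(api_key, target_url):
    """Fetch target_url through the API and return the body as text.

    Assumption: the body is the page content, decoded here as UTF-8.
    """
    api_url = build_api_url(api_key, target_url)
    with urllib.request.urlopen(api_url, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

Replace `"API_KEY"` with your own key when calling `fetch_via_api("API_KEY", "https://example.com")`.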

Our rotating proxy server, Proxies API, provides a simple API that can solve IP blocking problems instantly:

With millions of high speed rotating proxies located all over the world
With automatic IP rotation
With automatic User-Agent string rotation (which simulates requests from different, valid web browsers and web browser versions)
With automatic CAPTCHA solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole thing can be accessed through a simple API call, like the one below, from any programming language.

You don't even have to take the pain of loading Puppeteer, as we render JavaScript behind the scenes. You can just get the data and parse it in any language or runtime like Node or PHP, or using any framework like Scrapy or Nutch. In all these cases, you can just call the URL with render support like so.

curl "http://api.proxiesapi.com/?key=API_KEY&render=true&url=https://example.com"
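Once the rendered HTML comes back, parsing it is ordinary work. As one illustration, this Python sketch pulls the page title and link targets out of an HTML string using the standard library's `html.parser`. The `PageSummary` class and `summarize` function are hypothetical helpers, and the sample HTML is made up for the example.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and every <a href> value from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return (title, links) for an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Made-up sample standing in for a page returned by the API.
sample = "<html><head><title>Example Domain</title></head><body><a href='/more'>More</a></body></html>"
title, links = summarize(sample)
```

For real projects a dedicated parsing library or a framework like Scrapy will be more forgiving of messy markup; the point here is only that the response can be handled like any other HTML.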

If you want to save even more time and not bother creating your own web crawling setup, you can check out our cloud-based web crawler, teracrawler.io.

TeraCrawler handles all the things above and uses distributed servers with a large IP range, along with millions of residential proxy IPs, to make certain that you get your data.
