Scrapy has a third-party middleware, scrapy-rotating-proxies, that makes rotating proxies a breeze once you have a list of working proxies.
So here is how you set it up.
First, install the middleware.
pip install scrapy-rotating-proxies
Then in your settings.py, add the list of proxies like this.
ROTATING_PROXY_LIST = [
    'Proxy_IP:port',
    'Proxy_IP:port',
    # ...
]
If you want to manage the IPs outside your code, you can load the list from a file instead like this.
ROTATING_PROXY_LIST_PATH = 'listofproxies.txt'
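The file is expected to be plain text with one proxy per line. For example (the addresses below are made-up placeholders):

203.0.113.10:8000
203.0.113.11:8000
198.51.100.7:3128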
Enable the two middlewares like this.
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}

Note that the two middlewares need different priority values; Scrapy raises an error if two entries share the same order, and 610/620 are the values the library's own README uses.
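To confirm everything is wired up, here is a minimal spider you could run against the settings above. It is only a sketch: the spider name and start URL are placeholders, and it assumes the two settings blocks above live in your project's settings.py.

import scrapy

class ProxiedSpider(scrapy.Spider):
    # Placeholder name and URL for illustration only.
    name = 'proxied'
    start_urls = ['https://example.com/']

    def parse(self, response):
        # By the time this runs, the middleware has already routed the
        # request through one of the configured proxies.
        yield {'url': response.url, 'status': response.status}

Run it with scrapy crawl proxied and watch the log; the middleware periodically prints proxy statistics (good, dead, and unchecked counts) so you can see the rotation working.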
That's it!
Now all your requests will automatically be routed through random proxies from the list.
The middleware detects dead or banned proxies and stops using them, so you will have to take care of replenishing the list with fresh proxies as old ones stop working.
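One simple way to keep the list fresh is a small standalone script that re-tests each proxy and rewrites the file. This is only a rough sketch using the requests library; the test URL, timeout, and file name are assumptions, not part of the middleware.

import requests

PROXIES_FILE = 'listofproxies.txt'   # the same file ROTATING_PROXY_LIST_PATH points to
TEST_URL = 'https://httpbin.org/ip'  # any reliable endpoint will do

def is_alive(proxy, timeout=10):
    # A proxy counts as alive if it can fetch the test URL within the timeout.
    try:
        resp = requests.get(
            TEST_URL,
            proxies={'http': f'http://{proxy}', 'https': f'http://{proxy}'},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        return False

with open(PROXIES_FILE) as f:
    proxies = [line.strip() for line in f if line.strip()]

# Keep only the proxies that still respond, then rewrite the file.
alive = [p for p in proxies if is_alive(p)]
with open(PROXIES_FILE, 'w') as f:
    f.write('\n'.join(alive) + '\n')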
If you want a commercial solution that uses a pool of over 2 million rotating proxies, you can consider Proxies API. Or opt for a fully cloud-based crawling solution made by our team at Teracrawler.io, which can do the crawling for you on high-speed distributed clusters with a built-in rotating proxy infrastructure.