If you are looking for cloud-based web scrapers or crawlers, you might have heard of both Scrapinghub and TeraCrawler.io.
While many products in this domain look more or less the same, their ideal use cases can vary tremendously.
Here are some of the differences between the two.
The main use case
TeraCrawler is ideal when you need to regularly download large amounts of data and want to set rules such as crawl depth, URL patterns to download or block, and the types of files or documents to fetch.
Generally, what you get as output is one big file that you can download and process locally at your convenience.
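To make that concrete, a crawl job of this kind is typically described by a small rule file. The sketch below is hypothetical — the field names are illustrative, not TeraCrawler's actual job format:

```json
{
  "seed_urls": ["https://example.com/"],
  "max_depth": 3,
  "allow_patterns": ["/blog/.*", "/docs/.*"],
  "block_patterns": [".*\\?sessionid=.*"],
  "file_types": ["html", "pdf"],
  "schedule": "weekly"
}
```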
Scrapinghub is ideal when you want to scrape data: you know exactly what you want and prefer the crawling and scraping to be done in the cloud, ready for final consumption. As its name indicates, Scrapinghub is focused on the challenges of scraping data well.
Who is it for?
TeraCrawler is ideal for developers who want control over how they scrape the data but don't want to deal with setting up and monitoring resources, or with the pitfalls of crawling the data.
Scrapinghub is ideal for non-developer end customers who just want the data already extracted and exportable in various formats like CSV and JSON, or for developers who don't want to deal with both crawling and scraping and don't mind giving up that control.
TeraCrawler is in the business of crawling data efficiently, quickly, predictably, and at scale for developers who deal with the problem of large-scale web crawling. It's a fully automated SaaS offering for mid-market and enterprise customers.
Scrapinghub's focus is on extracting data, and it has produced a number of tools, including the open-source framework Scrapy, to automate the scraping of hard-to-get data.