How to Scrape Amazon Product Data

Learn how to scrape Amazon product data with Python, BeautifulSoup, and rotating residential proxies without getting blocked.

Zenezen

Amazon is one of the most valuable sources for product data. Prices, reviews, rankings, and availability change constantly, and tracking that manually is not an option at scale. The catch is that Amazon has aggressive bot detection, so a basic scraper will get blocked fast.

In this article, we will explore how to scrape Amazon product data without getting blocked.


Setting Up Your Scraper with Python and BeautifulSoup

You need two libraries: Requests to send HTTP requests and BeautifulSoup to parse the HTML. Install them with:

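Bash
pip install requests beautifulsoup4

Then fetch a product page and parse the response:

Python
import requests
from bs4 import BeautifulSoup

# Replace PRODUCT_ID with the ID of the product you want to scrape.
url = "https://www.amazon.com/dp/PRODUCT_ID"

# Browser-like headers; the default Requests User-Agent is easy to flag.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

# A timeout keeps the request from hanging indefinitely.
response = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")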

The User-Agent header is critical. Without it, Requests identifies itself with its default python-requests User-Agent, which Amazon readily flags as automated, typically returning a CAPTCHA instead of the product page.
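
Because a blocked request can still come back as ordinary HTML, it can help to check the response before parsing it. The sketch below is a heuristic only: the `looks_blocked` helper is hypothetical, and the assumption that a block shows up as a non-200 status code or as the word "captcha" in the page body is ours, not documented Amazon behavior.

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristically guess whether a response is a block page.

    Assumption: blocks surface as a non-200 status code or as a page
    whose HTML mentions "captcha". Adjust the markers to whatever you
    actually observe.
    """
    if status_code != 200:
        return True
    return "captcha" in html.lower()


# Offline demonstration with made-up responses.
print(looks_blocked(200, "<span id='productTitle'>Example</span>"))   # False
print(looks_blocked(200, "<form action='/errors/validateCaptcha'>"))  # True
print(looks_blocked(503, ""))                                         # True
```

With the scraper above, this would be called as `looks_blocked(response.status_code, response.text)` before building `soup`.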


What Data to Target and How to Extract It

The most useful data points are the title, price, rating, and review count.

Python
# Each lookup assumes the element exists; a missing one raises AttributeError.
title = soup.find("span", {"id": "productTitle"}).get_text(strip=True)
# Whole-number part of the price only.
price = soup.find("span", {"class": "a-price-whole"}).get_text(strip=True)
rating = soup.find("span", {"class": "a-icon-alt"}).get_text(strip=True)
reviews = soup.find("span", {"id": "acrCustomerReviewText"}).get_text(strip=True)
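
The values extracted above are display strings, so they usually need converting before they can be compared or aggregated. Here is a minimal sketch, assuming the rating text looks like "4.5 out of 5 stars" and the review count like "1,234 ratings". Both formats and both helper names are assumptions for illustration and may differ by marketplace or locale.

```python
import re
from typing import Optional


def parse_rating(text: str) -> Optional[float]:
    """Return the first number in text such as '4.5 out of 5 stars'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_count(text: str) -> Optional[int]:
    """Return the integer in text such as '1,234 ratings'."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


print(parse_rating("4.5 out of 5 stars"))  # 4.5
print(parse_count("1,234 ratings"))        # 1234
```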

Amazon's HTML structure changes periodically. When an element is missing, soup.find returns None, and calling get_text on it raises AttributeError. Wrap each field in a try/except block so the scraper does not crash:

Python
try:
    title = soup.find("span", {"id": "productTitle"}).get_text(strip=True)
except AttributeError:
    # Element not found on this page; record the field as missing.
    title = None
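
Repeating the try/except for every field gets verbose. One option is to move the check into a small helper. In the sketch below, `safe_text` is a hypothetical name and the HTML fragment is made up so the example runs offline. It tests for `None` directly instead of catching `AttributeError`, which has the same effect for a missing element.

```python
from bs4 import BeautifulSoup


def safe_text(soup, tag, attrs):
    """Return the stripped text of the first matching element, or None."""
    element = soup.find(tag, attrs)
    return element.get_text(strip=True) if element is not None else None


# Offline demonstration on a tiny, made-up HTML fragment.
html = '<span id="productTitle">  Example Product  </span>'
soup = BeautifulSoup(html, "html.parser")

fields = {
    "title": safe_text(soup, "span", {"id": "productTitle"}),
    "price": safe_text(soup, "span", {"class": "a-price-whole"}),
}
print(fields)  # {'title': 'Example Product', 'price': None}
```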

Use Python's built-in csv module to write the output to a file you can open in any spreadsheet tool or feed into a database.
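
A minimal sketch of that step, using `csv.DictWriter`. The `write_products` helper, the column names, the sample row, and the `products.csv` filename are all placeholders chosen for illustration.

```python
import csv

FIELDNAMES = ["title", "price", "rating", "reviews"]


def write_products(rows, path):
    """Write a list of product dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder row; a None value is written as an empty cell.
rows = [
    {
        "title": "Example Product",
        "price": "19",
        "rating": "4.5 out of 5 stars",
        "reviews": None,
    }
]
write_products(rows, "products.csv")
```

Fields that could not be extracted and were set to None end up as empty cells, so one missing element does not break the file.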


Bypassing Amazon's Bot Detection with Rotating Proxies

Amazon tracks requests by IP address. Too many requests from the same address can get it blocked or served fake pages. Rotating residential proxies assign a different IP address to each request, so Amazon is unlikely to see enough traffic from a single source to trigger a block.

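Python
# Placeholders: substitute your provider's credentials, host, and port.
proxies = {
    "http": "http://username:password@proxy_host:port",
    "https": "http://username:password@proxy_host:port",
}

# url and headers are the ones defined in the setup section.
response = requests.get(url, headers=headers, proxies=proxies, timeout=10)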
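
Many rotating proxy services change the exit IP behind a single gateway address, in which case the snippet above is all you need. If you instead hold a list of separate proxy endpoints, you can rotate through them yourself. The following is a sketch under that assumption: the hostnames, port, and the `next_proxies` helper are hypothetical.

```python
import itertools

# Hypothetical endpoints; replace with the ones your provider gives you.
PROXY_URLS = [
    "http://username:password@proxy1.example.com:8000",
    "http://username:password@proxy2.example.com:8000",
    "http://username:password@proxy3.example.com:8000",
]

# cycle() repeats the list endlessly, one endpoint per call to next().
_proxy_cycle = itertools.cycle(PROXY_URLS)


def next_proxies():
    """Return a Requests-style proxies dict using the next endpoint in turn."""
    proxy_url = next(_proxy_cycle)
    return {"http": proxy_url, "https": proxy_url}


first = next_proxies()
second = next_proxies()
print(first["https"] != second["https"])  # True
```

Each request would then pass `proxies=next_proxies()` to `requests.get` instead of a fixed dictionary.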

Residential proxies are the right choice for Amazon. Datacenter IPs are easy to flag, whereas residential IPs come from real ISPs and are much harder to distinguish from normal traffic. Add a short random delay between requests to further reduce the chance of detection:

Python
import time
import random

# Pause for a random interval between 1 and 3 seconds.
time.sleep(random.uniform(1, 3))
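
In a real run the delay sits inside the loop that walks through your product URLs. Here is a minimal sketch of that loop. The `fetch_all` helper and its parameters are hypothetical, and the demonstration swaps in a stand-in fetch function and a recording sleep so it runs instantly and offline.

```python
import random
import time


def fetch_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval in between."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a random one before the rest.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Offline demonstration: a stand-in fetch and a sleep that only records delays.
delays = []
pages = fetch_all(
    [
        "https://www.amazon.com/dp/PRODUCT_ID_1",
        "https://www.amazon.com/dp/PRODUCT_ID_2",
    ],
    fetch=lambda url: f"fetched {url}",
    sleep=delays.append,
)
print(len(pages), len(delays))  # 2 1
```

In the actual scraper, `fetch` would be a function that wraps the `requests.get` call shown earlier, and the default `sleep` would be left in place.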


Final Thoughts

Python and BeautifulSoup handle the parsing side cleanly. Where most scrapers fail is detection, and residential proxies are what close that gap. Proxyon's residential proxies start at $1.75/GB with no subscription required. Deposit $5 and start scraping Amazon in minutes at Proxyon.
