How to Scrape Amazon Product Data

Amazon is one of the most valuable sources for product data. Prices, reviews, rankings, and availability change constantly, and tracking that manually is not an option at scale. The catch is that Amazon has aggressive bot detection, so a basic scraper will get blocked fast.

This article shows how to scrape Amazon product data with Python without getting blocked.

Setting Up Your Scraper with Python and BeautifulSoup

You need two libraries: Requests to send HTTP requests and BeautifulSoup to parse the HTML. Install them with:

pip install requests beautifulsoup4

PYTHON

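import requests
from bs4 import BeautifulSoup

# Replace PRODUCT_ID with the ID of the product you want to scrape.
url = "https://www.amazon.com/dp/PRODUCT_ID"

# Browser-like headers; the default Requests User-Agent identifies the client as a script.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9"
}

# A timeout keeps the script from hanging on a stalled connection.
response = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")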

The User-Agent header is critical. Without a browser-like value, Requests announces itself as python-requests, and Amazon is likely to flag the request as automated and return a CAPTCHA instead of the product page.
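Because a blocked request can still return HTML, it helps to check what came back before parsing it. The sketch below is one possible check, not an official one: the function name looks_blocked, the status codes, and the marker strings are all assumptions based on how block pages commonly look, and Amazon can change them at any time.

```python
# Assumed markers of a CAPTCHA or block page; adjust them to what you actually observe.
BLOCK_MARKERS = (
    "/errors/validateCaptcha",
    "Enter the characters you see below",
)

# Assumed status codes for throttled or refused requests.
BLOCK_STATUS_CODES = (429, 503)


def looks_blocked(status_code: int, html: str) -> bool:
    """Return True when a response looks like a block page rather than a product page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    return any(marker in html for marker in BLOCK_MARKERS)


# Usage with the response from the request above:
#     if looks_blocked(response.status_code, response.text):
#         print("Blocked: slow down or switch IP before retrying")
```

Checking before parsing keeps a CAPTCHA page from being mistaken for a product page with missing fields.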

What Data to Target and How to Extract It

The most useful data points are the title, price, rating, and review count.

PYTHON

title = soup.find("span", {"id": "productTitle"}).get_text(strip=True)
# a-price-whole holds only the whole-number part of the price.
price = soup.find("span", {"class": "a-price-whole"}).get_text(strip=True)
# Rating text looks like "4.5 out of 5 stars".
rating = soup.find("span", {"class": "a-icon-alt"}).get_text(strip=True)
reviews = soup.find("span", {"id": "acrCustomerReviewText"}).get_text(strip=True)

Amazon's HTML structure changes periodically, and soup.find returns None when an element is missing, so calling .get_text on the result raises AttributeError. Wrap each field in a try/except block to keep the scraper from crashing:

PYTHON

try:
    title = soup.find("span", {"id": "productTitle"}).get_text(strip=True)
except AttributeError:
    title = None
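Repeating that try/except for every field gets verbose. One way to tidy it up is a small helper that returns None when the element is missing. The helper name extract_text and the sample markup below are illustrative assumptions, not part of BeautifulSoup or of a real Amazon page.

```python
from bs4 import BeautifulSoup


def extract_text(soup, tag, attrs):
    """Return the stripped text of the first matching element, or None if absent."""
    element = soup.find(tag, attrs)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical sample markup standing in for a downloaded product page.
sample_html = """
<span id="productTitle">  Example Kettle  </span>
<span class="a-icon-alt">4.5 out of 5 stars</span>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Fields missing from the page come back as None instead of raising.
product = {
    "title": extract_text(sample_soup, "span", {"id": "productTitle"}),
    "price": extract_text(sample_soup, "span", {"class": "a-price-whole"}),
    "rating": extract_text(sample_soup, "span", {"class": "a-icon-alt"}),
    "reviews": extract_text(sample_soup, "span", {"id": "acrCustomerReviewText"}),
}
print(product)
```

The explicit None check does the same job as catching AttributeError, but keeps each field on one line.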

Use Python's built-in csv module to write the output to a file you can open in any spreadsheet tool or feed into a database.
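A minimal sketch of that step, assuming each product is stored as a dictionary with the four fields above. The function name save_products, the file name products.csv, and the sample rows are placeholders.

```python
import csv

FIELDS = ["title", "price", "rating", "reviews"]


def save_products(rows, path):
    """Write a list of product dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical rows; a None value is written as an empty cell.
rows = [
    {"title": "Example Kettle", "price": "39", "rating": "4.5 out of 5 stars", "reviews": "1,204 ratings"},
    {"title": "Example Toaster", "price": None, "rating": None, "reviews": None},
]
save_products(rows, "products.csv")
```

DictWriter quotes values that contain commas, so a review count like "1,204 ratings" stays in a single column.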

Bypassing Amazon's Bot Detection with Rotating Proxies

Amazon tracks requests by IP. Too many requests from the same address can get that address blocked or served CAPTCHA pages instead of real product pages. Rotating residential proxies assign a different IP address to each request, so Amazon is far less likely to see enough traffic from a single source to trigger a block.

PYTHON

proxies = {
    "http": "http://username:password@proxy_host:port",
    "https": "http://username:password@proxy_host:port"
}

response = requests.get(url, headers=headers, proxies=proxies)
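Many providers rotate IPs for you behind a single gateway address, in which case the snippet above is all you need. If your provider gives you a list of endpoints instead, you can rotate through them yourself. This is a rough sketch under that assumption: the hosts, ports, and credentials are placeholders, and next_proxies is a hypothetical helper name.

```python
import itertools

# Hypothetical endpoints; replace with the hosts and credentials from your provider.
PROXY_URLS = [
    "http://username:password@proxy1.example.com:8000",
    "http://username:password@proxy2.example.com:8000",
    "http://username:password@proxy3.example.com:8000",
]

# cycle() yields the endpoints in order and starts over after the last one.
_proxy_cycle = itertools.cycle(PROXY_URLS)


def next_proxies():
    """Return a Requests-style proxies dict for the next endpoint in the rotation."""
    proxy_url = next(_proxy_cycle)
    return {"http": proxy_url, "https": proxy_url}


# Each call moves to the next endpoint:
#     response = requests.get(url, headers=headers, proxies=next_proxies(), timeout=10)
```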

Residential proxies are the right choice for Amazon. Datacenter IPs are easy to flag, but residential IPs come from real ISPs and are much harder to distinguish from normal traffic. Add a short random delay between requests to further reduce detection:

PYTHON

import time
import random

time.sleep(random.uniform(1, 3))
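When you scrape a list of product pages, the delay belongs between consecutive requests. A minimal sketch of such a loop follows. The function name fetch_all and its parameters are illustrative, and the fetch argument stands in for whatever function performs the request.

```python
import random
import time


def fetch_all(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Usage with Requests (urls is your own list of product URLs):
#     pages = fetch_all(urls, lambda u: requests.get(u, headers=headers, timeout=10))
```

Passing sleep as a parameter makes the loop easy to test without actually waiting.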

Final Thoughts

Python and BeautifulSoup handle the parsing side cleanly. Where most scrapers fail is detection, and residential proxies are what close that gap. Proxyon's residential proxies start at $1.75/GB with no subscription required. Deposit $5 and start scraping Amazon in minutes at Proxyon.
