Apify Web Scraper MCP Server: Step-by-Step Tutorial

Search APIs give you snippets. Official APIs give you structured data, but only from services that offer them. A huge amount of useful data — competitor pricing, product catalogs, job listings, review aggregations — lives on websites with no API at all.

That's the gap web scraping fills, and the Apify MCP server brings professional-grade scraping infrastructure into any AI agent workflow. Instead of writing custom scrapers, your agent picks from 2,000+ pre-built extractors, runs them on Apify's cloud, and gets structured results back.

This Apify web scraper MCP server tutorial walks through the complete setup — from account creation to your first scraping run to a multi-tool pipeline.

Prerequisites and Apify Account Setup

Create an Apify account

Sign up at console.apify.com. The free tier includes $5/month in platform credits, which covers hundreds of simple scraping runs during development.

Get your API token

Navigate to Settings → Integrations in the Apify console and copy your API token. It looks like: apify_api_xxxxxxxxxxxxxxxxxxxxxxxx

Verify Node.js

The MCP server runs via npx:

node --version   # v18 or higher
npx --version    # Should return a version number

If npx isn't available, install Node.js from nodejs.org.

Installing the Apify MCP Server

Claude Code configuration

Add the Apify server to your Claude Code settings file. For user-level access (~/.claude/settings.json):

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server"],
      "env": {
        "APIFY_TOKEN": "apify_api_your-token-here"
      }
    }
  }
}

For environment variable reference (recommended for shared configs):

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server"],
      "env": {
        "APIFY_TOKEN": "${APIFY_TOKEN}"
      }
    }
  }
}

Then in your shell profile:

export APIFY_TOKEN="apify_api_your-token-here"

OpenClaw configuration

mcp_servers:
  apify:
    command: npx
    args: ["-y", "@apify/mcp-server"]
    env:
      APIFY_TOKEN: "${APIFY_TOKEN}"

Adding to an existing config

If you already have other MCP servers configured (like Tavily), add Apify alongside them:

{
  "mcpServers": {
    "tavily": {
      "command": "npx",
      "args": ["-y", "tavily-mcp@latest"],
      "env": { "TAVILY_API_KEY": "${TAVILY_API_KEY}" }
    },
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server"],
      "env": { "APIFY_TOKEN": "${APIFY_TOKEN}" }
    }
  }
}

Restart Claude Code after saving. MCP servers connect at session start.

For full details on managing multiple servers, see our Claude Code MCP setup guide.

Your First Scraping Run with an AI Agent

Start a new Claude Code or OpenClaw session and test with a simple extraction:

Scrape the homepage of https://example.com and return the page title 
and all heading text.

You should see a tool call to the Apify MCP server. The agent selects the generic web scraper actor, runs it against the URL, and returns structured content.

A more practical first run

Let's try something useful — extracting product data:

Scrape the top 10 products from https://books.toscrape.com/ 
and return the title, price, and rating for each.

The agent will:

  1. Select an appropriate Apify actor (likely the generic Web Scraper or a Cheerio-based actor)
  2. Configure it with the target URL and extraction instructions
  3. Execute the run on Apify's cloud infrastructure
  4. Return structured JSON with the requested data

The key thing to notice: your agent didn't write a scraper. It leveraged Apify's existing infrastructure — JavaScript rendering, proxy rotation, anti-bot handling — through a single MCP tool call. The scraping runs in Apify's cloud, not on your local machine.
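Under the hood, that single tool call maps onto Apify's REST API. A minimal sketch of the equivalent direct call, assuming the synchronous run-sync-get-dataset-items endpoint and a token in APIFY_TOKEN (the actor ID apify~web-scraper and the input fields are illustrative):

```python
import json
import os
import urllib.request

API_BASE = "https://api.apify.com/v2"

def build_run_request(actor_id: str, run_input: dict, token: str):
    """Build the URL and JSON body for a synchronous actor run that
    returns dataset items directly (run-sync-get-dataset-items)."""
    # Actor IDs use ~ instead of / in URL paths, e.g. apify~web-scraper
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?token={token}"
    body = json.dumps(run_input).encode("utf-8")
    return url, body

# Illustrative input: scrape the demo bookstore from the example above
run_input = {"startUrls": [{"url": "https://books.toscrape.com/"}]}
token = os.environ.get("APIFY_TOKEN", "")
url, body = build_run_request("apify~web-scraper", run_input, token)

# Only send the request when a real token is configured
if token:
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        items = json.loads(resp.read())  # structured results, one dict per item
```

The MCP server does the same dance for your agent: pick an actor, post the input, wait for the run, hand back the dataset.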

If the test fails

Check in order:

  1. API token — Verify at console.apify.com that your token is valid
  2. Credits — Confirm you have available platform credits ($5/month free)
  3. npx — Run npx @apify/mcp-server --help to verify the package works
  4. Session restart — MCP servers connect at startup; restart after config changes
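Steps 1 and 2 can be checked programmatically. A hedged sketch: the format check is a heuristic based on the token prefix shown earlier, and the validity check hits Apify's /v2/users/me endpoint, which returns your account details for a valid token:

```python
import os
import urllib.error
import urllib.request

def looks_like_apify_token(token: str) -> bool:
    """Heuristic format check: tokens shown in this tutorial start with apify_api_."""
    return token.startswith("apify_api_") and len(token) > len("apify_api_")

def verify_token(token: str) -> bool:
    """Return True if Apify accepts the token (GET /v2/users/me)."""
    url = f"https://api.apify.com/v2/users/me?token={token}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 401 for invalid or revoked tokens

token = os.environ.get("APIFY_TOKEN", "")
if looks_like_apify_token(token):
    print("token format OK; accepted by Apify:", verify_token(token))
else:
    print("APIFY_TOKEN missing or malformed")
```

Credits, however, you still check by eye in the console's usage dashboard.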

Choosing the Right Apify Actor for Your Task

Apify's actor library has 2,000+ pre-built scrapers. Your agent can browse and select actors, but knowing the key ones helps you guide it:

Generic Web Scraper

Handles most websites. Renders JavaScript, follows pagination, extracts data based on CSS selectors or natural language descriptions. Use this as your default when no site-specific actor exists.

Best for: Unknown sites, one-off extraction, pages with standard HTML structure.
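When you want to hand the generic Web Scraper an explicit input rather than a natural language description, it takes start URLs plus a JavaScript pageFunction that runs on each page. A minimal sketch — the field names follow the apify/web-scraper actor's input schema, and the selector logic is illustrative:

```json
{
  "startUrls": [{ "url": "https://books.toscrape.com/" }],
  "pageFunction": "async function pageFunction(context) { const $ = context.jQuery; return { title: $('title').text() }; }"
}
```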

Google Search Results Scraper

Extracts full SERP data — titles, URLs, snippets, People Also Ask, related searches. Returns raw Google results rather than pre-processed content.

Best for: SEO analysis, SERP monitoring, keyword research.

Amazon Product Scraper

Specialized extractor for Amazon product pages — prices, ratings, reviews, seller info, availability, product details.

Best for: Competitive pricing analysis, market research, product catalog monitoring.

LinkedIn Profile/Company Scraper

Extracts public profile and company data. Job titles, company history, skills, employee counts.

Best for: Sales prospecting enrichment, recruiting pipelines, competitive org analysis.

Website Content Crawler

Deep-crawls an entire site, following internal links and extracting all text content. Builds a comprehensive content map.

Best for: Competitive content audits, knowledge base extraction, SEO content analysis.

Telling your agent which actor to use

You can be explicit:

Use the Amazon Product Scraper actor to extract the top 20 results 
for "wireless earbuds" including prices, ratings, and review counts.

Or let the agent decide based on the task:

I need pricing data from Amazon for wireless earbuds. Get the top 20 results.

The agent typically selects the right actor for the job. If it picks a generic scraper when a specialized one exists, steer it explicitly.

Full details on available actors and capabilities are on the Apify listing on ClawsMarket.

Building a Search-to-Scrape Pipeline

The most powerful Apify AI agent scraping pattern combines search (finding the right pages) with scraping (extracting the data from them). This is where Apify pairs with Tavily.

The pattern

Research the pricing pages of the top 5 project management tools. 
Search for them first, then scrape each pricing page for plan names, 
prices, and included features. Output as a comparison table.

The agent chains two MCP servers and adds its own synthesis step:

  1. Tavily search — Finds "project management tool pricing" pages, returns URLs
  2. Apify scraping — Extracts structured pricing data from each URL
  3. Agent synthesis — Compiles the scraped data into a comparison table
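The chaining the agent performs can be sketched as ordinary code. Here the two MCP calls are replaced by stub functions — tavily_search and apify_scrape are illustrative names with canned data, not real client APIs — so the shape of the pipeline is visible:

```python
def tavily_search(query: str) -> list[str]:
    """Stub for the discovery step: the agent calls the Tavily MCP server
    and gets candidate pricing-page URLs back."""
    return ["https://toolA.example/pricing", "https://toolB.example/pricing"]

def apify_scrape(url: str) -> dict:
    """Stub for the extraction step: the agent calls the Apify MCP server,
    which runs an actor and returns structured JSON."""
    return {"url": url, "plans": [{"name": "Pro", "price": "$10/user/mo"}]}

def pricing_comparison(query: str) -> list[dict]:
    """Search for pricing pages, scrape each one, flatten into table rows."""
    rows = []
    for url in tavily_search(query):
        data = apify_scrape(url)
        for plan in data["plans"]:
            rows.append({"source": data["url"],
                         "plan": plan["name"],
                         "price": plan["price"]})
    return rows

for row in pricing_comparison("project management tool pricing"):
    print(row)
```

In practice you never write this loop — the agent improvises it from the prompt — but it is useful to know that this is the control flow you are asking for.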

Adding trend context

For market research workflows, layer in Google Trends data:

Research pricing for the top 5 project management tools (search + scrape),
then cross-reference with Google Trends interest data for each tool 
over the past 12 months. Which ones are gaining market interest?

Three MCP servers, one prompt, and the agent handles the coordination. Tavily discovers, Apify extracts, Google Trends adds market context.

Cost awareness

Each Apify actor run consumes platform credits. Simple page extractions cost fractions of a cent. Complex crawls across hundreds of pages cost more. Monitor your usage in the Apify console, especially during development when you're iterating on extraction parameters.

A practical guardrail: tell your agent to limit scope when testing:

Scrape only the first 5 results (not all pages) while we test this workflow.
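If you are configuring the run input yourself rather than prompting, the equivalent cap is a field on the actor input — a sketch assuming the generic Web Scraper's maxPagesPerCrawl option:

```json
{
  "startUrls": [{ "url": "https://books.toscrape.com/" }],
  "maxPagesPerCrawl": 5
}
```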

Scale up once the extraction pattern is confirmed. For more on building complete agent pipelines with multiple tools, see our AI agent automation guide.

Frequently Asked Questions

How do I set up the Apify MCP server for AI agents?

Create an Apify account at console.apify.com and copy your API token from Settings → Integrations. Add a JSON config entry to your Claude Code settings file (~/.claude/settings.json) with the command npx @apify/mcp-server and your token as an environment variable. Restart Claude Code. The agent automatically discovers Apify's scraping tools and can use them for any web extraction task. The free tier includes $5/month in platform credits.

What can the Apify MCP server scrape?

Virtually any public website. Apify has 2,000+ pre-built actors for common sites (Amazon, Google, LinkedIn, social media, job boards, real estate sites) plus a generic Web Scraper that handles arbitrary pages. The platform manages JavaScript rendering, proxy rotation, anti-bot measures, and rate limiting. Your agent specifies what data to extract — the infrastructure handles the how. Results return as structured JSON that agents can process directly.

How much does Apify cost for AI agent scraping?

The free tier includes $5/month in platform credits. Simple page extractions cost fractions of a cent per page. Complex crawls across hundreds of pages with JavaScript rendering cost more. Paid plans start at $49/month for higher compute and storage. Most development and testing workflows fit within the free tier. Monitor usage in the Apify console, and set scope limits in your agent prompts during development to avoid accidentally crawling large sites.

Can I use Apify and Tavily together?

Yes, and this is the recommended pattern for research workflows. Tavily handles discovery (finding the right pages via search), Apify handles extraction (pulling structured data from those pages). Add both servers to your Claude Code or OpenClaw config and the agent chains them automatically. Search finds competitor pricing pages, Apify extracts the detailed pricing tables — a pattern that covers the full research-to-data pipeline without writing custom scrapers.

Which Apify actor should I use for web scraping?

Start with the generic Web Scraper for unknown sites — it handles JavaScript rendering, pagination, and standard HTML extraction. For common platforms, use specialized actors: Amazon Product Scraper for product data, Google Search Results Scraper for SERP analysis, LinkedIn scrapers for professional data, and Website Content Crawler for full-site extraction. Your agent can select actors automatically based on the task, but specifying the actor explicitly in your prompt gives more reliable results for site-specific scraping.