Scrapingbee
Overview
ScrapingBee is a web scraping API that manages headless browsers and proxy rotation, so developers can retrieve a page's HTML in a single API call. With the ScrapingBee integration in SquadOS, your agents can scrape pages, extract structured data, and get past anti-bot protections.
- Official website: https://www.scrapingbee.com/
- Composio documentation: docs.composio.dev/toolkits/scrapingbee
Authentication
This tool uses an API key (API_KEY) to connect.
You will need the following fields:
| Field | Required | Description |
|---|---|---|
| api_key | Yes | Your private ScrapingBee API key, used to authenticate all requests. |
How to get credentials
- Go to dashboard.scrapingbee.com/account/register and create an account.
- Confirm your email to activate the account.
- Log in at dashboard.scrapingbee.com/account/login.
- Navigate to dashboard.scrapingbee.com/account/manage/api_key.
- Copy the API key displayed there. This is the value to enter in the api_key field when connecting in SquadOS.
How to connect in SquadOS
- Go to Tools in the side menu (/admin/tools).
- Open the Available tab and search for Scrapingbee.
- Click the card to open the details and hit Connect.
- You’re taken to the secure connection page hosted by Composio, where you enter the API key obtained above.
- Once done, you’re sent back to SquadOS with the account connected and the tool available to agents. See Organization Tools for details on the connection flow.
Available actions
Data Extraction
SCRAPINGBEE_DATA_EXTRACTION
Tool to extract structured data from a webpage using CSS or XPath selectors. It relies on ScrapingBee’s extract_rules feature.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The webpage URL to extract data from. |
| wait | integer | No | Seconds to wait before extraction (for dynamic content). |
| device | string | No | Emulate device type (desktop or mobile). |
| api_key | string | Yes | Your ScrapingBee API key. |
| extractor | object | Yes | JSON object defining fields to extract and their CSS/XPath selectors. For nested selectors, use an object with selector and optional type keys. Misaligned or invalid selectors silently drop fields with no error, so verify that each selector matches the target DOM before large-scale use. |
| javascript | boolean | No | Whether to render JavaScript before extraction. |
| country_code | string | No | Two-letter country code for proxy geolocation (e.g., us, de). |
| premium_proxy | boolean | No | Use premium proxy for higher reliability. |
| block_resources | boolean | No | Block images, CSS, and resources to speed up extraction. |
| forward_headers | object | No | Custom HTTP headers to forward to the target website. Provide as a dict, e.g., {'Accept-Language': 'en-US'}. Headers will be prefixed with Spb- and forwarded to the target. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error message if execution failed. |
| successful | boolean | Yes | Whether the action executed successfully. |
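Because invalid selectors in extractor drop fields without raising an error, it can help to compare the returned fields against the ones you asked for. The following is a minimal Python sketch, not part of the integration itself. It assumes the action's data output is a JSON string keyed by the extractor's field names; the field names, selectors, and the missing_fields helper are hypothetical.

```python
import json

# Hypothetical extractor: field names and selectors are illustrative only.
extractor = {
    "title": "h1",
    "price": {"selector": "span.price", "type": "item"},
}


def missing_fields(extractor: dict, data: str) -> list[str]:
    """Return extractor fields that are absent or empty in the result.

    Assumes `data` is a JSON object keyed by the extractor's field names.
    """
    result = json.loads(data)
    return [name for name in extractor if result.get(name) in (None, "", [], {})]


# Simulated action output in which the `price` selector matched nothing.
sample_data = json.dumps({"title": "Example product"})
print(missing_fields(extractor, sample_data))
```

Running a check like this on a handful of pages before a large job makes silently dropped fields visible early.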
HTML Fetch
SCRAPINGBEE_HTML_FETCH
Tool to fetch HTML or a screenshot via the ScrapingBee HTML API. Use when you need page markup or an image after optional JS rendering and resource controls. For anti-bot or CAPTCHA-protected sites (e.g., Cloudflare), combine render_js=true with premium_proxy=true or stealth_proxy=true to avoid blocks.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape. |
| wait | integer | No | Milliseconds to wait before returning content. |
| retry | integer | No | Number of retries on request failure. |
| device | string | No | Device type to emulate (desktop or mobile). |
| cookies | string | No | Cookies to send in requests (HTTP header string). |
| wait_for | string | No | CSS selector to wait for before returning content. |
| block_ads | boolean | No | Block ads and tracking scripts. |
| render_js | boolean | No | Render JavaScript before returning HTML. Required for client-side rendered pages where dynamic data is absent in raw HTML. |
| js_snippet | string | No | JavaScript snippet to execute before returning content. |
| screenshot | boolean | No | Return screenshot as base64-encoded PNG. |
| js_scenario | string | No | JSON scenario for custom headless browser actions. |
| country_code | string | No | Two-letter country code for geolocation (e.g., us). |
| extract_rules | string | No | Extraction rules (CSS selector or JSONPath). |
| premium_proxy | boolean | No | Use premium proxy for scraping. |
| stealth_proxy | boolean | No | Use stealth (undetectable) proxy mode. |
| block_resources | boolean | No | Block images and CSS resources on the page to speed up scraping. |
| screenshot_selector | string | No | CSS selector of element to screenshot. |
| screenshot_full_page | boolean | No | Capture full-page screenshot instead of only viewport. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error message if execution failed. |
| successful | boolean | Yes | Whether the action executed successfully. |
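The anti-bot recommendation for HTML Fetch (render_js combined with premium_proxy or stealth_proxy) and the base64-encoded screenshot output can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the helper names are hypothetical, and it assumes that with screenshot enabled the data output holds only the base64 string.

```python
import base64

# First eight bytes of every PNG file, used to sanity-check a decoded screenshot.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def anti_bot_args(url: str, stealth: bool = False) -> dict:
    """Build HTML Fetch arguments for a protected site.

    Always enables render_js, then adds either stealth_proxy or premium_proxy.
    """
    args = {"url": url, "render_js": True}
    args["stealth_proxy" if stealth else "premium_proxy"] = True
    return args


def decode_screenshot(data: str) -> bytes:
    """Decode a base64-encoded PNG screenshot into raw bytes."""
    return base64.b64decode(data)


print(anti_bot_args("https://example.com"))
```

The decoded bytes can be written to a .png file; checking that they start with PNG_SIGNATURE is a cheap way to detect an error message returned in place of an image.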
Proxy Mode
SCRAPINGBEE_SCRAPING_BEE_PROXY_MODE
Tool to fetch web content via ScrapingBee’s Proxy Mode. Use when you need to route requests through ScrapingBee proxies with optional JS rendering and resource blocking.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The target URL to scrape through ScrapingBee Proxy Mode. |
| cookies | object | No | Cookies to send with the request as a key-value mapping. |
| headers | object | No | Additional HTTP headers to forward to the target site. Each header will be prefixed with Spb- and forwarded when forward_headers is enabled. |
| timeout | integer | No | Request timeout in milliseconds. |
| block_ads | boolean | No | Block ads and tracking scripts to speed up scraping. |
| render_js | boolean | No | Enable JavaScript rendering before returning content. |
| session_id | integer | No | Session identifier (integer) to keep the same IP for multiple requests. Use the same number to maintain consistent IP across requests. |
| js_scenario | string | No | Custom JavaScript scenario name for advanced interactions. |
| country_code | string | No | Two-letter country code for geolocated proxy (e.g., us, fr). |
| premium_proxy | boolean | No | Use premium proxies for higher reliability. |
| stealth_proxy | boolean | No | Use stealth proxy mode for extra undetectability. |
| block_resources | boolean | No | Block images and CSS resources to speed up scraping. Only relevant when render_js is enabled. |
| forward_headers | boolean | No | Forward original request headers to the target site. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error message if execution failed. |
| successful | boolean | Yes | Whether the action executed successfully. |
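To make the Spb- prefixing and the session_id behavior described in the Proxy Mode parameters concrete, here is a small Python sketch. The action applies the prefix itself according to the table above; the prefix_forwarded_headers helper is hypothetical and only shows what the forwarded header names would look like. The URLs and the session number are placeholders.

```python
def prefix_forwarded_headers(headers: dict[str, str], prefix: str = "Spb-") -> dict[str, str]:
    """Show the header names as they would be sent, each with the prefix added."""
    return {f"{prefix}{name}": value for name, value in headers.items()}


# Hypothetical arguments for two requests that should share one IP:
# reusing the same integer session_id keeps the IP consistent.
SESSION_ID = 42
first_call = {"url": "https://example.com/login", "session_id": SESSION_ID}
second_call = {"url": "https://example.com/account", "session_id": SESSION_ID}

print(prefix_forwarded_headers({"Accept-Language": "en-US"}))
```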
Stealth Proxy
SCRAPINGBEE_STEALTH_PROXY
Tool to perform stealth scraping via ScrapingBee’s Stealth Proxy mode. Use when you encounter anti-bot measures requiring undetectable requests.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the webpage to retrieve using stealth proxy. |
| wait | integer | No | Wait time in milliseconds before returning the response. |
| device | string | No | Device type to emulate during rendering. Options: desktop or mobile. |
| cookies | string | No | Custom cookies in semicolon-separated format: name1=value1;name2=value2. |
| js_render | boolean | No | Render JavaScript on the page before returning the response. |
| country_code | string | No | Two-letter country code for proxy geolocation (e.g., us, de). |
| extract_rules | string | No | Extraction rules in JSON string for structured data. |
| premium_proxy | boolean | No | Use premium proxies for higher reliability. |
| stealth_proxy | boolean | No | Enable stealth proxy mode. Use when the target site blocks bots. |
| block_resources | boolean | No | Block images, styles, and fonts for faster loads. |
| forward_headers | boolean | No | Forward original request headers from the browser. |
| return_page_source | boolean | No | Return the raw page source instead of text. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error message if execution failed. |
| successful | boolean | Yes | Whether the action executed successfully. |
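The Stealth Proxy cookies parameter expects a single semicolon-separated string rather than a key-value object. A minimal Python sketch of producing that format from a dict follows; the format_cookies helper and the cookie names are hypothetical, and the sketch assumes names and values contain no semicolons or equals signs.

```python
def format_cookies(cookies: dict[str, str]) -> str:
    """Join cookies into the name1=value1;name2=value2 format."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())


print(format_cookies({"name1": "value1", "name2": "value2"}))
```

Note that Proxy Mode takes cookies as an object instead, so this conversion applies only to actions whose cookies parameter is a string.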
Usage Stats
SCRAPINGBEE_USAGE_STATS
Tool to retrieve usage statistics for your ScrapingBee account. Use when you need to monitor remaining credits and request count.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error message if execution failed. |
| successful | boolean | Yes | Whether the action executed successfully. |
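Monitoring remaining credits from the Usage Stats output might look like the Python sketch below. It assumes the data string is a JSON object containing max_api_credit and used_api_credit counters; those field names and the remaining_credits helper are assumptions, so check the actual output of the action before relying on them.

```python
import json


def remaining_credits(data: str) -> int:
    """Compute remaining credits from a usage-stats JSON string.

    Assumes `data` contains `max_api_credit` and `used_api_credit` (hypothetical keys).
    """
    stats = json.loads(data)
    return stats["max_api_credit"] - stats["used_api_credit"]


# Simulated output; the numbers are made up for illustration.
sample = json.dumps({"max_api_credit": 1000, "used_api_credit": 250})
print(remaining_credits(sample))  # 750
```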