Search1API's Deepcrawl API enables asynchronous website crawling with flexible link discovery options, allowing developers to process entire websites systematically through a simple task-based interface at just 20 credits per request.
Deepcrawl changes how developers collect web content for large language models. This asynchronous crawling solution processes entire websites systematically with minimal configuration, making it well suited to building comprehensive knowledge bases for Retrieval-Augmented Generation (RAG) systems, all at just 20 credits per request.
Like all Search1API endpoints, you'll need to authenticate using your Bearer token:
Authorization: Bearer your_api_key_here
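In Python, for example, you can define this header once and pass it to every request, as the examples later in this post do (replace the placeholder with your real key):

# Shared headers for every Deepcrawl call
headers = {
    'Authorization': 'Bearer your_api_key_here',
    'Content-Type': 'application/json'
}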
Deepcrawl operates on a simple yet powerful asynchronous model:
Initiate a crawl task with your target URL
Receive a task ID for tracking
Check status until completion (typically within 1 minute)
Access the comprehensive crawl results
The API offers two flexible crawling modes to match your specific needs:
Sitemap mode (default): Processes only links defined in the website's sitemap.xml
All mode: Discovers and crawls all findable links throughout the website
Here's how to start a crawl:
POST https://api.search1api.com/deepcrawl
{
  "url": "https://search1api.com",
  "type": "sitemap"
}
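To crawl every discoverable link instead of only those listed in the sitemap, set type to "all":

{
  "url": "https://search1api.com",
  "type": "all"
}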
The API responds with a task ID for tracking:
{
  "taskId": "abc123xyz",
  "status": "queued"
}
You can then check the status using:
GET https://api.search1api.com/deepcrawl/status/{taskId}
This returns the current status of the task:
{
  "taskId": "abc123xyz",
  "status": "processing",
  "message": "Crawling in progress"
}
Once complete (typically within a minute), you'll receive the full results of your crawl task.
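The exact schema of the completed payload can vary, but based on the fields used in the integration example below, a finished task looks roughly like this (the page-level fields shown here are illustrative assumptions, not a documented contract):

{
  "taskId": "abc123xyz",
  "status": "completed",
  "results": [
    {
      "url": "https://search1api.com",
      "content": "Page text extracted by the crawler..."
    }
  ]
}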
Deepcrawl handles resource-intensive crawling tasks in the background, letting your application continue working while the API does the heavy lifting. Results are typically available within just 1 minute.
Choose between targeted sitemap crawling for efficiency or comprehensive link discovery for completeness. This flexibility lets you balance between speed and thoroughness based on your specific needs.
Rather than handling individual pages, Deepcrawl systematically processes entire websites, ensuring your knowledge base is comprehensive and up-to-date.
The intuitive task-based interface makes it easy to initiate, monitor, and manage multiple crawl operations with minimal code and configuration.
Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. Instead of relying solely on the model's training data, RAG systems retrieve relevant information from a knowledge base before generating responses, dramatically improving accuracy and relevance.
Deepcrawl is an ideal foundation for creating comprehensive, up-to-date knowledge bases for RAG systems (see the retrieval sketch after this list):
Comprehensive Content Collection: Capture complete website content in a single operation
Structured Data Organization: Content is organized logically based on site structure
Efficient Processing: Asynchronous design handles large websites without overwhelming your application
Rapid Implementation: From API call to usable knowledge base in minutes, not hours or days
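To make the RAG connection concrete, here is a minimal retrieval sketch in Python. It assumes each crawl result is a dict with url and content fields (illustrative names; check the actual response schema) and uses naive keyword overlap where a production system would use embeddings:

def build_chunks(results, size=800, overlap=200):
    """Split each crawled page into overlapping text chunks."""
    chunks = []
    step = size - overlap
    for page in results:
        text = page['content']
        for start in range(0, len(text), step):
            chunks.append({'url': page['url'], 'text': text[start:start + size]})
    return chunks


def retrieve(chunks, question, k=3):
    """Rank chunks by keyword overlap with the question (naive scoring)."""
    terms = set(question.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(terms & set(c['text'].lower().split())),
        reverse=True
    )[:k]


def build_prompt(question, top_chunks):
    """Assemble a grounded prompt for the language model."""
    context = '\n\n'.join(f"Source: {c['url']}\n{c['text']}" for c in top_chunks)
    return f'Answer using only the context below.\n\n{context}\n\nQuestion: {question}'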
When to use Sitemap Mode:
For well-organized websites with comprehensive sitemaps
When you need only the most important site content
For faster, more efficient crawls
When targeting specific sections defined in the sitemap
When to use All Mode:
For websites without sitemaps or with incomplete sitemaps
When you need absolutely all available content
For creating exhaustive knowledge bases
When discovering hidden or unlisted content is important
import time

import requests

API_BASE = 'https://api.search1api.com'


def create_knowledge_base(website_url, crawl_type='sitemap'):
    """Crawl a website with Deepcrawl and return the collected pages."""
    headers = {
        'Authorization': 'Bearer your_api_key_here',
        'Content-Type': 'application/json'
    }
    data = {
        'url': website_url,
        'type': crawl_type
    }

    # Step 1: Start the crawl task
    response = requests.post(
        f'{API_BASE}/deepcrawl',
        headers=headers,
        json=data,
        timeout=30
    )
    response.raise_for_status()
    task_id = response.json()['taskId']

    # Step 2: Poll the status until the task completes. Results are
    # typically ready within 1 minute; the attempt cap keeps the loop
    # from spinning forever if a crawl stalls.
    for _ in range(30):
        status_response = requests.get(
            f'{API_BASE}/deepcrawl/status/{task_id}',
            headers=headers,
            timeout=30
        )
        status_response.raise_for_status()
        status_data = status_response.json()

        if status_data['status'] == 'completed':
            # Process the results for your RAG system
            return status_data['results']
        if status_data['status'] == 'failed':
            raise RuntimeError(f"Crawl failed: {status_data['message']}")

        # Wait before checking again
        time.sleep(10)

    raise TimeoutError(f'Crawl task {task_id} did not complete in time')
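Calling the helper is then a single line (this assumes results comes back as a list of pages, as the example response above suggests):

# Crawl the sitemap and collect pages for indexing
pages = create_knowledge_base('https://search1api.com', crawl_type='sitemap')
print(f'Collected {len(pages)} pages')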
Implement appropriate retry logic and error handling for rare cases when a crawl might take longer than expected or encounter temporary issues with the target website.
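One way to do this is to wrap the status check with bounded retries and exponential backoff, as in this sketch (the attempt count and delays are arbitrary starting points, not recommended values):

import time

import requests


def get_status_with_retries(task_id, headers, max_attempts=5):
    """Fetch task status, retrying transient network errors with backoff."""
    delay = 5
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(
                f'https://api.search1api.com/deepcrawl/status/{task_id}',
                headers=headers,
                timeout=30
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay = min(delay * 2, 60)  # exponential backoff, capped at 60s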
Deepcrawl stands out because it:
Works asynchronously, freeing your application from waiting for results
Processes complete websites in a single operation
Delivers results quickly, typically within a minute
Offers flexible crawling strategies to match your specific needs
Integrates seamlessly with RAG implementations
Visit our API documentation to start building powerful knowledge bases for your AI applications today. Transform how your large language models access and utilize web content!
Whether you're enhancing a chatbot, building an AI research assistant, or creating a domain-specific knowledge system, Deepcrawl API provides the foundation for more accurate, relevant, and valuable AI-generated content.