Website
Train your agent by crawling your website
Point your agent at any website and it automatically crawls and learns from the content.
Setting up the crawler
Go to Integrations > Website (or use Quick Start when creating an agent)
Enter your website URL (e.g., https://www.yourcompany.com)
Optionally configure path filters
Click Connect to start crawling
Prerequisites
A publicly accessible website
The crawler respects robots.txt
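Respecting robots.txt means the crawler checks your site's rules before fetching each page. A minimal sketch of that check using Python's standard-library robotparser (the rules and URLs below are hypothetical examples, not your site's actual configuration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; in practice the crawler fetches
# https://www.yourcompany.com/robots.txt before crawling.
robots_txt = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler skips any URL the rules disallow.
print(parser.can_fetch("*", "https://www.yourcompany.com/help/getting-started"))  # True
print(parser.can_fetch("*", "https://www.yourcompany.com/admin/users"))           # False
```

If a page you expect to be crawled is missing, checking it against your own robots.txt this way is a quick first diagnostic.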
How it works
The crawler:
Starts at the URL you provide
Follows links to discover pages
Extracts text content from each page
Indexes the content for your agent to search
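The discovery step above is essentially a breadth-first traversal of your site's link graph, capped at your plan's page limit. A simplified sketch (the in-memory `links` dict stands in for fetching and parsing real HTML; the paths are hypothetical):

```python
from collections import deque

def discover_pages(start, links, max_pages):
    """Breadth-first crawl sketch: follow links outward from the
    starting URL, stopping once the page limit is reached."""
    seen = {start}
    queue = deque([start])
    crawled = []
    while queue and len(crawled) < max_pages:
        page = queue.popleft()
        crawled.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled

# Toy site graph (hypothetical paths):
site = {
    "/": ["/help", "/docs", "/blog"],
    "/help": ["/help/faq"],
    "/docs": ["/docs/api"],
}
print(discover_pages("/", site, max_pages=4))  # ['/', '/help', '/docs', '/blog']
```

This is also why orphan pages are never found: a page with no inbound link from the starting URL never enters the queue.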
Crawl limits
Trial: 100 pages
Paid plans: 500 pages
Configuring path filters
Use include and exclude paths to control what gets crawled:
Include paths — Only crawl pages matching these paths
Example: /help,/docs,/support
Exclude paths — Skip pages matching these paths
Example: /blog,/careers,/pricing
This is useful for large websites where you only want your agent to learn from specific sections.
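One plausible way to think about how the two filters combine (this is an illustrative sketch of prefix matching, not the product's exact matching rules): an exclude match always wins, and when any include paths are set, a page must match one of them.

```python
def allowed(path, include=(), exclude=()):
    """Hypothetical filter logic: a page is skipped if it matches an
    exclude prefix; if include prefixes are set, it must match one."""
    if any(path.startswith(prefix) for prefix in exclude):
        return False
    if include:
        return any(path.startswith(prefix) for prefix in include)
    return True

include = ["/help", "/docs", "/support"]
exclude = ["/blog", "/careers", "/pricing"]
print(allowed("/help/getting-started", include, exclude))  # True
print(allowed("/blog/launch-post", include, exclude))      # False
print(allowed("/about", include, exclude))                 # False (no include match)
```

Under rules like these, an empty include list means "crawl everything not excluded", which is a sensible default for smaller sites.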
Sync frequency and updates
The crawler periodically re-crawls your site to pick up changes
You can manually trigger a re-crawl from the integration settings
New pages linked from existing pages are discovered automatically
Tips
Use path filters on large sites. If your site has thousands of pages, focus the crawler on help and documentation sections.
Crawl your help center. If your help center is on a subdomain (like help.yourcompany.com), enter that URL directly.
Combine with other sources. The website crawler is great for getting started quickly, but add help center articles, documents, and past tickets for comprehensive coverage.
Check what got crawled. After the crawl completes, browse the indexed pages in your Files tab to verify the right content was picked up.
Troubleshooting
Crawler not finding pages?
Make sure pages are publicly accessible (not behind a login)
Check that robots.txt isn't blocking the crawler
Verify that pages are linked from the starting URL (orphan pages won't be found)
Too many irrelevant pages crawled?
Add exclude paths for sections you don't want (blog, careers, etc.)
Use include paths to restrict crawling to specific sections