Website
Connect your agent with your website so that it can crawl your site pages and use them as knowledge.
Your agent can then link to specific URLs, walk users through troubleshooting, or recommend product pages.
Note: automatic syncing of your website pages is only available on a paid plan, upon request.
Setting up the crawler
1. Go to Integrations > Website (or use Quick Start when creating an agent)
2. Enter your website URL (e.g., https://www.yourcompany.com)
3. Optionally configure path filters
4. Click Connect to start crawling
Prerequisites
A publicly accessible website
The crawler respects robots.txt
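For reference, a robots.txt on your site might look like the following. The disallowed paths shown are hypothetical examples; any section you disallow for all user agents will be skipped by the crawler.

```
# Allow crawling everywhere except private sections (example paths)
User-agent: *
Disallow: /admin/
Disallow: /account/
```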
How it works
The crawler:
Starts at the URL you provide
Follows links to discover pages
Extracts text content from each page
Indexes the content for your agent to search
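The steps above amount to a breadth-first traversal of your site's link graph. This is a minimal sketch of that idea, not the product's actual implementation; the `fetch` callable is a stand-in for the real HTTP layer so the traversal logic is self-contained.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url, staying on the same host.

    `fetch` is a callable url -> HTML string (or None on failure),
    so the traversal can be exercised without network access.
    Returns the list of URLs visited, in crawl order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Only follow links on the same host, and never revisit a page
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Note how `max_pages` caps the crawl, which is why pages not reachable by links from the starting URL are never discovered.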
Crawl limits
Trial: 100 pages
Paid plans: 500 pages
Configuring path filters
Use include and exclude paths to control what gets crawled:
Include paths — Only crawl pages matching these paths
Example: /help, /docs, /support
Exclude paths — Skip pages matching these paths
Example: /blog, /careers, /pricing
This is useful for large websites where you only want your agent to learn from specific sections.
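The exact matching rules are product-specific, but a plausible interpretation is prefix matching on the URL path, with excludes taking priority. A sketch under that assumption:

```python
def allowed(path, include=None, exclude=None):
    """Decide whether a URL path should be crawled.

    Assumed semantics (prefix matching): an exclude match always
    skips the page; if include paths are set, the path must start
    with one of them; with no filters, everything is crawled.
    """
    if exclude and any(path.startswith(p) for p in exclude):
        return False
    if include:
        return any(path.startswith(p) for p in include)
    return True
```

For example, with include paths /help and /docs, a page at /docs/setup is crawled while /pricing is skipped.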
Sync frequency and updates
The crawler periodically re-crawls your site to pick up changes
You can manually trigger a re-crawl from the integration settings
New pages linked from existing pages are discovered automatically
Tips
Use path filters on large sites. If your site has thousands of pages, focus the crawler on help and documentation sections.
Crawl your help center. If your help center is on a subdomain (like help.yourcompany.com), enter that URL directly.
Combine with other sources. The website crawler is great for getting started quickly, but add help center articles, documents, and past tickets for comprehensive coverage.
Check what got crawled. After the crawl completes, browse the indexed pages in your Files tab to verify the right content was picked up.
Troubleshooting
Crawler not finding pages?
Make sure pages are publicly accessible (not behind a login)
Check that robots.txt isn't blocking the crawler
Verify that pages are linked from the starting URL (orphan pages won't be found)
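You can check your robots.txt rules yourself with Python's standard-library urllib.robotparser. This sketch parses the rules from a string rather than fetching them, and the domain and disallowed path are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; replace with your site's actual rules
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A page outside the disallowed section is crawlable; one inside is not
print(parser.can_fetch("*", "https://www.yourcompany.com/help"))        # True
print(parser.can_fetch("*", "https://www.yourcompany.com/private/x"))   # False
```

In practice you would point `RobotFileParser.set_url` at your live robots.txt and call `read()` instead of `parse()`.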
Too many irrelevant pages crawled?
Add exclude paths for sections you don't want (blog, careers, etc.)
Use include paths to restrict crawling to specific sections