Sources

Overview

Sources determine what material your assistant uses when constructing its responses. They let you define and produce the underlying documents that the assistant can cite. High-quality, relevant sources can substantially improve an assistant's responses.

Source types

Sources come in two types: crawl sources and upload sources.

Crawl sources can be pointed at one or more URLs and configured with settings such as the maximum crawl depth and URL patterns to include or exclude.
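
As a rough illustration of how a maximum depth and include/exclude patterns might combine to decide which pages are in scope, here is a minimal Python sketch. The CrawlSettings class, its field names, the glob-style pattern syntax, and the rule that exclusions take precedence are all assumptions made for this example; they are not the product's actual configuration schema or matching rules.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class CrawlSettings:
    """Hypothetical settings object; field names are illustrative only."""
    start_urls: list[str]
    max_depth: int = 2
    include_patterns: list[str] = field(default_factory=list)
    exclude_patterns: list[str] = field(default_factory=list)


def should_crawl(url: str, depth: int, settings: CrawlSettings) -> bool:
    """Return True if a URL found at the given link depth is in scope."""
    # Pages deeper than the configured limit are never crawled.
    if depth > settings.max_depth:
        return False
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(url, p) for p in settings.exclude_patterns):
        return False
    # Assumption: an empty include list means "everything not excluded".
    if not settings.include_patterns:
        return True
    return any(fnmatchcase(url, p) for p in settings.include_patterns)


settings = CrawlSettings(
    start_urls=["https://help.example.com/"],
    max_depth=2,
    include_patterns=["https://help.example.com/*"],
    exclude_patterns=["*/drafts/*"],
)
```

With these example settings, a help article two links from the start URL is in scope, while anything under a /drafts/ path, on another host, or more than two links deep is skipped.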

Upload sources let you upload HTML directly, which is useful if you don't have publicly available webpages to point to.

info

Upload sources accept documents only in HTML format, and documents must be uploaded via the API.
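
Because upload sources take HTML, content that exists only as plain text needs to be wrapped in HTML before it is uploaded. The sketch below shows one way to do that in Python using only the standard library. The payload shape in build_upload_payload (the url, title, and html keys) is purely hypothetical; consult the API reference for the real request format.

```python
import html
import json


def text_to_html(title: str, body_text: str) -> str:
    """Wrap plain text in a minimal HTML document, escaping special characters."""
    # Treat blank-line-separated chunks as paragraphs.
    paragraphs = [p.strip() for p in body_text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )


def build_upload_payload(url: str, title: str, body_text: str) -> str:
    """Serialize a document for upload. Key names are hypothetical placeholders."""
    document = {
        "url": url,
        "title": title,
        "html": text_to_html(title, body_text),
    }
    return json.dumps(document)
```

Escaping matters here: a literal "<" or "&" in the source text would otherwise be interpreted as markup once the document is processed.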

Status

Sources have a status associated with them to show the state of a crawl and document processing.

  • Finished - The source has finished crawling/processing successfully
  • In progress - The source has started crawling/processing
  • Queued - The source crawling/processing has been queued up and will move to in progress soon
  • Failed - The source crawling/processing has failed

info

Even if a job fails, some documents may still have been added or updated.
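
The four statuses can be modelled as a small state set in which Finished and Failed are the end states. The Python sketch below polls until one of them is reached. The fetch_status callable stands in for however you retrieve a source's status and is an assumption of this example, as are the polling interval and limit.

```python
import time
from enum import Enum
from typing import Callable


class SourceStatus(Enum):
    QUEUED = "Queued"
    IN_PROGRESS = "In progress"
    FINISHED = "Finished"
    FAILED = "Failed"


# A source in one of these states will not change again without a new crawl.
TERMINAL_STATUSES = {SourceStatus.FINISHED, SourceStatus.FAILED}


def wait_until_done(
    fetch_status: Callable[[], SourceStatus],  # hypothetical status lookup
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> SourceStatus:
    """Poll until the source is Finished or Failed, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("source did not reach a terminal status in time")
```

A Failed result should not be read as "nothing changed": as noted above, some documents may already have been added or updated, so it is worth checking the document list before re-crawling.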

Re-crawling

You can initiate a new crawl when the source material has changed (e.g., you updated content on your help center), the source settings have changed, or a job has failed.

Any documents that were previously crawled will be updated, and any new pages detected by the crawler will be added as new documents.
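
The update-or-add behavior described above resembles an upsert keyed on page URL. The Python sketch below illustrates that idea under stated assumptions: documents are represented as a simple URL-to-content mapping, and pages that no longer appear in the crawl are left in place. This page does not say what happens to such pages, so that part is a guess made for the example.

```python
def apply_recrawl(
    existing: dict[str, str],
    crawled: dict[str, str],
) -> tuple[dict[str, str], list[str], list[str]]:
    """Merge a re-crawl into the existing documents.

    Returns the merged mapping plus the URLs that were updated and added.
    """
    merged = dict(existing)  # leave the caller's mapping untouched
    updated: list[str] = []
    added: list[str] = []
    for url, content in crawled.items():
        if url in merged:
            updated.append(url)  # previously crawled: refresh its content
        else:
            added.append(url)    # newly detected page: becomes a new document
        merged[url] = content
    # Assumption: URLs missing from this crawl are kept, not deleted.
    return merged, updated, added
```

For example, re-crawling a help center where one article was edited and one was newly published would report the first URL as updated and the second as added.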