The Extract Integration automatically pulls contextually relevant information from conversations based on a JSON schema you define. This integration enriches conversation metadata and facilitates more efficient data use in customer support, transcription, and data analytics scenarios.
The integration empowers your AI bots to not only interact autonomously with users but also to extract key pieces of information from conversations. After a conversation ends or goes idle, the bot uses your provided JSON schema to extract data, consequently enriching the conversation metadata with structured information.
Creating an Extract Integration
Creating an extract integration establishes the foundation for automated data extraction from your conversations. The integration requires a custom JSON schema that defines what information to extract and how to structure it.
To create an extract integration, you need to provide basic information such as the integration name, description, and most importantly, the extraction schema that defines the data structure you want to capture.
```http
POST /api/v1/integration/extract/create
Content-Type: application/json

{
  "name": "Customer Information Extractor",
  "description": "Extracts customer details from support conversations",
  "schema": {
    "customerName": {
      "type": "string",
      "description": "The customer's full name",
      "required": true
    },
    "email": {
      "type": "string",
      "description": "The customer's email address",
      "required": true
    },
    "issueType": {
      "type": "string",
      "description": "The type of issue reported"
    }
  }
}
```
Schema Design Considerations
When designing your extraction schema, consider the following:
- Field Types: Use appropriate types (string, number, boolean) for each field
- Required Fields: Mark essential fields as required to ensure data completeness
- Descriptions: Provide clear, detailed descriptions to guide the extraction process
- Conversation Flow: Design your bot's backstory and conversation flow to naturally collect the information specified in your schema
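For illustration, here is a sketch of a schema shaped by these guidelines; the field names are examples, not a required set:

```typescript
// Illustrative extraction schema following the considerations above.
// Each field pairs an explicit type with a description that guides the AI,
// `required` marks must-have fields, and `collect: true` (covered below)
// opts numeric fields into automatic metrics logging.
const extractionSchema = {
  customerName: {
    type: "string",
    description: "The customer's full name",
    required: true,
  },
  satisfactionScore: {
    type: "number",
    description: "Customer satisfaction from 1 (poor) to 5 (excellent)",
    collect: true,
  },
  issueResolved: {
    type: "boolean",
    description: "Whether the issue was resolved during the conversation",
  },
};
```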
Optional Webhook Configuration
You can optionally configure a webhook URL in the `request` field to receive the extracted data automatically. When specified, the integration will POST the extracted data to your webhook endpoint after processing each conversation.
The webhook request will include:
- The extracted data according to your schema
- The conversation messages that were used for extraction
- An HMAC signature for request verification
Bot Filtering
When you specify a botId, the integration will only process conversations from
that specific bot. This allows you to have different extraction schemas for different
bots or use cases within your application.
Warning: The extraction schema structure should be carefully designed and tested before deployment. Inaccurate or inappropriate schemas could lead to incomplete or incorrect data extraction. Test your schema with various conversation scenarios to ensure it extracts the intended data accurately.
Listing Extracted Items
After your extract integration has processed conversations, you can retrieve the structured data items that were extracted using the item list endpoint. Each item represents the data extracted from a single conversation, organized according to the JSON schema you defined when creating the integration.
Extracted items are the primary output of the extract integration system. They contain the structured information your AI bot pulled from conversations, such as customer details, issue classifications, satisfaction scores, or any other fields you defined in your schema. Accessing these items programmatically enables you to build data pipelines, populate CRM systems, generate reports, and drive downstream business processes.
Each item in the response includes the extracted data object containing your
schema fields, the conversationId linking back to the source conversation,
and standard timestamps for auditing and synchronization purposes. The data
field structure mirrors your integration schema, making it straightforward to
map into your target data systems.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/list
```
To paginate through large result sets, use cursor-based pagination:
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/list?take=50&cursor=<cursor>
```
The response includes an items array with the extracted records and a
cursor value that can be used to fetch the next page of results. Continue
requesting pages until no cursor is returned, indicating you have reached
the end of the result set.
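As a sketch, a TypeScript client could follow the cursor like this; the base URL and bearer-token authorization are assumptions for illustration, not documented values:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch every extracted item, following the cursor until none is returned.
async function listAllItems(extractIntegrationId: string): Promise<unknown[]> {
  const items: unknown[] = [];
  let cursor: string | undefined;
  do {
    const params = new URLSearchParams({ take: "50" });
    if (cursor) params.set("cursor", cursor);
    const res = await fetch(
      `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/item/list?${params}`,
      { headers: { Authorization: `Bearer ${API_KEY}` } },
    );
    const page = await res.json();
    items.push(...page.items);
    cursor = page.cursor; // absent on the last page
  } while (cursor);
  return items;
}
```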
Use Cases: Common workflows include polling this endpoint periodically
to sync extracted data into a database, building dashboards that display
extraction results in real time, and auditing what data was captured from
specific conversations. You can cross-reference the conversationId with
the conversation API to retrieve the full conversation context alongside
the extracted data.
Authorization: Only the account that owns the extract integration can list its items. Requests from other accounts will be rejected with a not authorized error.
Exporting Extracted Items
The export endpoint retrieves extracted integration items with their data formatted for export workflows. This endpoint is similar to the list endpoint but returns data in a YAML-serializable format, making it particularly useful for integrations with tools that consume YAML, for bulk data exports, and for building data pipelines that process extracted information in structured text formats.
The export endpoint is designed for scenarios where you need to move extracted
data out of the platform into external systems. Each item's data field
supports YAML serialization via a toString() method, enabling seamless
integration with YAML-based configuration management tools, data warehouses,
and export pipelines that process text-based formats.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/export
```
The response follows the same structure as the list endpoint, with items
containing the extracted data alongside conversation references and timestamps.
The key difference is that the data field on each item can be serialized to
YAML format when converted to a string, enabling workflows that process
extracted data as human-readable structured text.
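As one possible workflow, assuming the js-yaml package and the same hypothetical base URL and auth as the earlier pagination sketch, an export job might write each item's data out as YAML:

```typescript
import { writeFileSync } from "node:fs";
import { dump } from "js-yaml"; // assumed YAML library choice

const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch one page from the export endpoint and write each item's
// schema-shaped data object to a human-readable YAML file.
async function exportPageToYaml(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/item/export?take=100`,
    { headers: { Authorization: `Bearer ${API_KEY}` } },
  );
  const page = await res.json();
  for (const item of page.items) {
    writeFileSync(`item-${item.conversationId}.yaml`, dump(item.data));
  }
}
```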
Use pagination parameters to batch through large result sets:
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/export?take=100&cursor=<cursor>
```
Use Cases: This endpoint is ideal for scheduled data export jobs that periodically transfer extracted data to external storage, for generating YAML-formatted reports from extraction results, and for feeding extracted data into configuration management systems that use YAML as their primary data format. It is also useful for debugging extraction schemas by reviewing the human-readable YAML representation of your extracted data.
Authorization: Only the account that owns the extract integration can export its items. Unauthorized access attempts are rejected with a not authorized error.
Deleting an Extract Integration
Permanently delete an extract integration when it's no longer needed. This operation removes the integration configuration but does not affect data that has already been extracted and stored in conversation metadata.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/delete
Content-Type: application/json

{}
```
What Gets Deleted
When you delete an extract integration:
- The integration configuration is permanently removed
- The extraction schema definition is deleted
- Webhook configuration is removed
- Associated metrics tracking is stopped
What Remains
Deleting the integration does not affect:
- Previously extracted data in conversation metadata
- Historical metrics that were collected
- Conversations that were processed by the integration
The extracted data remains accessible through the conversation metadata and can still be queried and used by your applications even after the integration is deleted.
Warning: This operation is irreversible. Once deleted, you will need to recreate the integration with its schema configuration if you want to resume data extraction for new conversations.
Fetching an Extract Integration
Retrieving the details of a specific extract integration allows you to review its configuration, including the extraction schema, webhook settings, and associated bot information. This is essential for auditing your data extraction setup and troubleshooting any issues.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/fetch
```
The response includes the complete integration configuration:
- Schema Definition: The full JSON schema used for data extraction
- Webhook Configuration: The request URL and settings for receiving extracted data
- Bot Association: The bot ID if the integration is linked to a specific bot
- Blueprint Linking: The blueprint ID if the integration is part of a larger blueprint
- Metadata: Any custom metadata attached to the integration
- Timestamps: Creation and last update times
This information is valuable when you need to verify your extraction configuration before processing conversations or when debugging extraction results that don't match your expectations.
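A quick auditing sketch, using the field names described above (exact response shapes may differ) and the same hypothetical base URL and auth as earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch an integration's configuration and print the parts most useful
// when debugging unexpected extraction results.
async function auditIntegration(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/fetch`,
    { headers: { Authorization: `Bearer ${API_KEY}` } },
  );
  const integration = await res.json();
  console.log("schema:", JSON.stringify(integration.schema, null, 2));
  console.log("webhook:", integration.request ?? "none configured");
  console.log("bot filter:", integration.botId ?? "all bots");
}
```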
Extraction Processing and Webhook Delivery
The extract integration uses a sophisticated background processing system to handle data extraction from conversations. This ensures that extraction happens reliably without blocking conversation flow or impacting user experience.
Automatic Extraction Triggers
Extraction is automatically triggered when:
- Conversation Ends: When a conversation is marked as completed
- Conversation Goes Idle: After a period of inactivity (configurable via the `trigger` setting)
- Manual Trigger: When you explicitly trigger extraction via the API
The system queues each extraction request and processes them asynchronously, ensuring reliable data capture even during high-traffic periods.
Extraction Process
When a conversation is queued for extraction:
- Message Retrieval: All messages from the conversation are retrieved and sorted chronologically
- Schema-Based Extraction: The AI model analyzes the conversation and extracts data according to your JSON schema
- Metadata Update: The extracted data is stored in the conversation's metadata under `integrations.extract.data`
- Metrics Collection: Numeric values from fields marked with `collect: true` are logged as metrics
- Item Storage: The extracted data is stored in the `ExtractIntegrationItem` table for querying and reporting
- Webhook Delivery: If configured, the extracted data is POSTed to your webhook URL
Webhook Request Format
When your webhook is invoked, it receives a POST request with the following structure:
{ "data": { // Your extracted data according to the schema "customerName": "John Doe", "email": "john@example.com", "orderAmount": 299.99 }, "conversation": { "messages": [ { "type": "user", "text": "I'd like to place an order" }, { "type": "bot", "text": "I'd be happy to help! May I have your name?" } // ... more messages ] } }json
Webhook Security
All webhook requests include an HMAC signature for verification:
- Header: `x-hub-signature: sha256=<hmac_hex_digest>`
- Algorithm: SHA-256
- Secret: The extract integration ID
- Payload: The JSON request body
To verify the signature, compute the HMAC of the request body using your integration ID as the secret and compare it with the signature in the header.
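A minimal Node.js verification sketch; the header name, algorithm, and secret follow the list above, while the raw-body handling depends on your web framework:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify the x-hub-signature header on an incoming webhook request.
// rawBody must be the exact request body bytes, before any JSON parsing.
function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHeader: string, // e.g. "sha256=ab12..."
  extractIntegrationId: string, // the HMAC secret, per the docs above
): boolean {
  const expected =
    "sha256=" +
    createHmac("sha256", extractIntegrationId).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```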
Webhook Retry Logic
If webhook delivery fails, the system automatically retries:
- Retry Attempts: Up to 5 retries
- Backoff Strategy: Exponential backoff with the formula `min(86400, e^(2.5*n))` seconds, where n is the retry attempt
- Logging: All attempts are logged in the integration event logs
- Failure Notification: Final failures are recorded but do not block data extraction
Failed webhook deliveries do not prevent the data from being extracted and stored. The extracted data remains accessible in conversation metadata and through the integration items, even if webhook delivery fails.
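For a feel for the schedule, this quick sketch evaluates the documented formula (assuming n counts retry attempts starting at 1):

```typescript
// Delay before retry n, per the documented formula min(86400, e^(2.5 * n)).
for (let n = 1; n <= 5; n++) {
  const delaySeconds = Math.min(86400, Math.exp(2.5 * n));
  console.log(`retry ${n}: ~${Math.round(delaySeconds)}s`);
}
// Prints roughly: 12s, 148s, 1808s, 22026s, then capped at 86400s (one day).
```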
Numeric Metrics Collection
Fields marked with `collect: true` in your schema trigger automatic metrics logging:
- Metric Type: `integration.extract[{integrationId}].{fieldName}`
- Value: The numeric value extracted from the conversation
- Relations: Linked to the integration, conversation, bot, and blueprint
- Aggregation: Available for analytics, charts, and trend analysis
This enables powerful analytics capabilities without requiring additional configuration or custom code.
Token Usage and Billing
Each extraction operation consumes API tokens:
- Model Used: Determined by your account settings and conversation context
- Token Count: Based on the conversation length and schema complexity
- Usage Recording: Tracked under the reason `conversation/extract`
- References: Linked to the specific conversation for audit purposes
Longer conversations and more complex schemas will consume more tokens. Monitor your usage through the usage API to track extraction costs.
Triggering Extraction on Historic Conversations
The trigger endpoint provides the ability to retroactively apply data extraction to existing conversations. This powerful feature is essential when you've just created an integration and want to extract data from past conversations, or when you've updated your extraction schema and need to reprocess conversations with the new configuration.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "sample": 20
}
```
Use Cases
Initial Setup: After creating a new extract integration, trigger it on your most recent conversations to immediately populate your analytics and test that your extraction schema works as expected.
Schema Updates: When you've refined your extraction schema and want to apply the improvements to historical conversations. This ensures consistency across all your extracted data.
Data Recovery: If extraction failed for some conversations due to temporary issues, you can reprocess them to capture the missing data.
Analytics Refresh: Update your metrics and charts with newly extracted data after making schema changes that add or modify numeric fields marked for collection.
Processing Options
Sample Recent Conversations: Use the `sample` parameter (default: 20) to specify how many of your most recent conversations to process. This is ideal for quick testing or periodic data updates.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "sample": 50
}
```
Specific Conversations: Provide an array of `conversationIds` to extract data from specific conversations. This is useful when you need to reprocess particular conversations after troubleshooting or schema adjustments.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "conversationIds": [
    "conv_abc123",
    "conv_def456",
    "conv_ghi789"
  ]
}
```
How It Works
When you trigger extraction on historic conversations:
- Conversation Selection: The system identifies conversations based on your criteria (sample size or specific IDs)
- Bot Filtering: If your integration is linked to a specific bot, only conversations from that bot are processed
- Queue Processing: Each conversation is queued for extraction using the same processing pipeline as real-time extraction
- Metadata Update: The conversation metadata is updated with newly extracted data, and any previous extraction results are replaced
- Metrics Collection: If your schema includes numeric fields marked with `collect: true`, new metrics are logged
- Webhook Notification: If configured, your webhook receives the extracted data for each processed conversation
The response includes the number of conversations that were queued for processing:
{ "id": "ext_abc123", "triggered": 20 }json
Important Considerations
- Processing happens asynchronously in the background
- Large batches may take several minutes to complete
- The maximum sample size is 1000 conversations per request
- Each extraction consumes API tokens based on conversation length
- Webhook notifications are sent as conversations are processed
- Previous extraction results for these conversations will be overwritten
Note: If you're testing a new schema, start with a small sample (5-10 conversations) to verify the extraction works as expected before processing larger batches.
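A small sketch of that workflow, reusing the hypothetical base URL and auth from earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Trigger extraction on a small sample and log how many conversations were queued.
async function testSchemaOnSample(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/trigger`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${API_KEY}`,
      },
      body: JSON.stringify({ sample: 5 }), // small sample while validating a schema
    },
  );
  const { triggered } = await res.json();
  console.log(`queued ${triggered} conversations for extraction`);
}
```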
Updating an Extract Integration
Updating an extract integration allows you to refine your data extraction configuration as your needs evolve. You can modify the extraction schema, change webhook settings, adjust bot associations, or update trigger conditions.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/update
Content-Type: application/json

{
  "name": "Enhanced Customer Extractor",
  "schema": {
    "customerName": {
      "type": "string",
      "description": "The customer's full name",
      "required": true
    },
    "email": {
      "type": "string",
      "description": "The customer's email address",
      "required": true
    },
    "orderAmount": {
      "type": "number",
      "description": "The order amount in dollars",
      "collect": true
    }
  }
}
```
Common Update Scenarios
Refining Extraction Schema: As you analyze extracted data, you may discover additional fields to capture or existing fields that need clearer descriptions. Update your schema to improve extraction accuracy.
Adding Numeric Metrics: Add the `collect: true` property to numeric fields in your schema to enable automatic metric tracking. This allows you to monitor trends and analyze quantitative data from conversations.
Changing Webhook Configuration: Update the `request` field to change where extracted data is sent, or remove it entirely if you no longer need webhook notifications.
Adjusting Bot Filtering: Change the `botId` to apply the integration to a different bot, or remove it to process conversations from all bots.
Modifying Trigger Conditions: Update the `trigger` setting to control when extraction occurs (e.g., on conversation end or idle).
Testing Schema Changes
After updating your extraction schema, it's recommended to use the trigger endpoint to reprocess recent conversations and verify that the new schema extracts data as expected before it affects new conversations.
Note: Schema updates only affect future extractions. To apply the new schema to historical conversations, use the trigger endpoint to reprocess them.
Listing Extract Integrations
Retrieving a list of your extract integrations allows you to manage and monitor all your data extraction configurations in one place. This is particularly useful when you have multiple extraction schemas for different use cases or bots.
The list endpoint supports pagination and filtering capabilities, enabling you to efficiently navigate through large numbers of integrations. You can filter by blueprint or use metadata queries to find specific integrations.
```http
GET /api/v1/integration/extract/list
```
Pagination
Use the `cursor` parameter to paginate through results. The response includes cursor information that you can use to fetch the next page of results. The `take` parameter controls how many items to retrieve per page.
Filtering Options
- By Blueprint: Use the `blueprintId` query parameter to retrieve only integrations associated with a specific blueprint
- By Metadata: Use metadata query filters to find integrations with specific metadata properties
- Order: Control the sort order using the `order` parameter (`asc` or `desc`)
Each integration in the response includes the complete schema definition, webhook configuration, and associated bot information, allowing you to review and manage your extraction configurations effectively.
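Putting the pagination and filtering options together, here is a sketch that lists all integrations for one blueprint, newest first; the `{ items, cursor }` response shape and auth details are assumptions consistent with the earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Page through extract integrations filtered by blueprint, in descending order.
async function listIntegrationsByBlueprint(blueprintId: string): Promise<unknown[]> {
  const results: unknown[] = [];
  let cursor: string | undefined;
  do {
    const params = new URLSearchParams({ take: "25", order: "desc", blueprintId });
    if (cursor) params.set("cursor", cursor);
    const res = await fetch(`${BASE_URL}/api/v1/integration/extract/list?${params}`, {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });
    const page = await res.json();
    results.push(...page.items);
    cursor = page.cursor; // absent on the last page
  } while (cursor);
  return results;
}
```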