The Extract Integration automatically pulls contextually relevant information from conversations based on a JSON schema you define. This integration enriches conversation metadata and facilitates more efficient data use in customer support, transcription, and data analytics scenarios.
The integration empowers your AI bots to not only interact autonomously with users but also to extract key pieces of information from conversations. After a conversation ends or goes idle, the bot uses your provided JSON schema to extract data, consequently enriching the conversation metadata with structured information.
Creating an Extract Integration
Creating an extract integration establishes the foundation for automated data extraction from your conversations. The integration requires a custom JSON schema that defines what information to extract and how to structure it.
To create an extract integration, you need to provide basic information such as the integration name, description, and most importantly, the extraction schema that defines the data structure you want to capture.
```http
POST /api/v1/integration/extract/create
Content-Type: application/json

{
  "name": "Customer Information Extractor",
  "description": "Extracts customer details from support conversations",
  "schema": {
    "customerName": {
      "type": "string",
      "description": "The customer's full name",
      "required": true
    },
    "email": {
      "type": "string",
      "description": "The customer's email address",
      "required": true
    },
    "issueType": {
      "type": "string",
      "description": "The type of issue reported"
    }
  }
}
```
Schema Design Considerations
When designing your extraction schema, consider the following:
- Field Types: Use appropriate types (string, number, boolean) for each field
- Required Fields: Mark essential fields as required to ensure data completeness
- Descriptions: Provide clear, detailed descriptions to guide the extraction process
- Conversation Flow: Design your bot's backstory and conversation flow to naturally collect the information specified in your schema
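For illustration, here is a sketch of a schema shaped by these guidelines; the field names are examples, not a required set:

```typescript
// Illustrative extraction schema following the considerations above.
// Each field pairs an explicit type with a description that guides the AI,
// `required` marks must-have fields, and `collect: true` (covered below)
// opts numeric fields into automatic metrics logging.
const extractionSchema = {
  customerName: {
    type: "string",
    description: "The customer's full name",
    required: true,
  },
  satisfactionScore: {
    type: "number",
    description: "Customer satisfaction from 1 (poor) to 5 (excellent)",
    collect: true,
  },
  issueResolved: {
    type: "boolean",
    description: "Whether the issue was resolved during the conversation",
  },
};
```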
Optional Webhook Configuration
You can optionally configure a webhook URL in the `request` field to receive the extracted data automatically. When specified, the integration will POST the extracted data to your webhook endpoint after processing each conversation.
The webhook request will include:
- The extracted data according to your schema
- The conversation messages that were used for extraction
- An HMAC signature for request verification
Bot Filtering
When you specify a botId, the integration will only process conversations from
that specific bot. This allows you to have different extraction schemas for different
bots or use cases within your application.
Warning: The extraction schema structure should be carefully designed and tested before deployment. Inaccurate or inappropriate schemas could lead to incomplete or incorrect data extraction. Test your schema with various conversation scenarios to ensure it extracts the intended data accurately.
Listing Extracted Items
After your extract integration has processed conversations, you can retrieve the structured data items that were extracted using the item list endpoint. Each item represents the data extracted from a single conversation, organized according to the JSON schema you defined when creating the integration.
Extracted items are the primary output of the extract integration system. They contain the structured information your AI bot pulled from conversations, such as customer details, issue classifications, satisfaction scores, or any other fields you defined in your schema. Accessing these items programmatically enables you to build data pipelines, populate CRM systems, generate reports, and drive downstream business processes.
Each item in the response includes the extracted data object containing your
schema fields, the conversationId linking back to the source conversation,
and standard timestamps for auditing and synchronization purposes. The data
field structure mirrors your integration schema, making it straightforward to
map into your target data systems.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/list
```
To paginate through large result sets, use cursor-based pagination:
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/list?take=50&cursor=<cursor>
```
The response includes an items array with the extracted records and a
cursor value that can be used to fetch the next page of results. Continue
requesting pages until no cursor is returned, indicating you have reached
the end of the result set.
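As a sketch, a TypeScript client could follow the cursor like this; the base URL and bearer-token authorization are assumptions for illustration, not documented values:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch every extracted item, following the cursor until none is returned.
async function listAllItems(extractIntegrationId: string): Promise<unknown[]> {
  const items: unknown[] = [];
  let cursor: string | undefined;
  do {
    const params = new URLSearchParams({ take: "50" });
    if (cursor) params.set("cursor", cursor);
    const res = await fetch(
      `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/item/list?${params}`,
      { headers: { Authorization: `Bearer ${API_KEY}` } },
    );
    const page = await res.json();
    items.push(...page.items);
    cursor = page.cursor; // absent on the last page
  } while (cursor);
  return items;
}
```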
Use Cases: Common workflows include polling this endpoint periodically
to sync extracted data into a database, building dashboards that display
extraction results in real time, and auditing what data was captured from
specific conversations. You can cross-reference the conversationId with
the conversation API to retrieve the full conversation context alongside
the extracted data.
Authorization: Only the account that owns the extract integration can list its items. Requests from other accounts will be rejected with a not authorized error.
Exporting Extracted Items
The export endpoint retrieves extracted integration items with their data formatted for export workflows. This endpoint is similar to the list endpoint but returns data in a YAML-serializable format, making it particularly useful for integrations with tools that consume YAML, for bulk data exports, and for building data pipelines that process extracted information in structured text formats.
The export endpoint is designed for scenarios where you need to move extracted
data out of the platform into external systems. Each item's data field
supports YAML serialization via a toString() method, enabling seamless
integration with YAML-based configuration management tools, data warehouses,
and export pipelines that process text-based formats.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/export
```
The response follows the same structure as the list endpoint, with items
containing the extracted data alongside conversation references and timestamps.
The key difference is that the data field on each item can be serialized to
YAML format when converted to a string, enabling workflows that process
extracted data as human-readable structured text.
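As one possible workflow, assuming the js-yaml package and the same hypothetical base URL and auth as the earlier pagination sketch, an export job might write each item's data out as YAML:

```typescript
import { writeFileSync } from "node:fs";
import { dump } from "js-yaml"; // assumed YAML library choice

const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch one page from the export endpoint and write each item's
// schema-shaped data object to a human-readable YAML file.
async function exportPageToYaml(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/item/export?take=100`,
    { headers: { Authorization: `Bearer ${API_KEY}` } },
  );
  const page = await res.json();
  for (const item of page.items) {
    writeFileSync(`item-${item.conversationId}.yaml`, dump(item.data));
  }
}
```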
Use pagination parameters to batch through large result sets:
```http
GET /api/v1/integration/extract/{extractIntegrationId}/item/export?take=100&cursor=<cursor>
```
Use Cases: This endpoint is ideal for scheduled data export jobs that periodically transfer extracted data to external storage, for generating YAML-formatted reports from extraction results, and for feeding extracted data into configuration management systems that use YAML as their primary data format. It is also useful for debugging extraction schemas by reviewing the human-readable YAML representation of your extracted data.
Authorization: Only the account that owns the extract integration can export its items. Unauthorized access attempts are rejected with a not authorized error.
Deleting an Extract Integration
Permanently delete an extract integration when it's no longer needed. This operation removes the integration configuration but does not affect data that has already been extracted and stored in conversation metadata.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/delete
Content-Type: application/json

{}
```
What Gets Deleted
When you delete an extract integration:
- The integration configuration is permanently removed
- The extraction schema definition is deleted
- Webhook configuration is removed
- Associated metrics tracking is stopped
What Remains
Deleting the integration does not affect:
- Previously extracted data in conversation metadata
- Historical metrics that were collected
- Conversations that were processed by the integration
The extracted data remains accessible through the conversation metadata and can still be queried and used by your applications even after the integration is deleted.
Warning: This operation is irreversible. Once deleted, you will need to recreate the integration with its schema configuration if you want to resume data extraction for new conversations.
Fetching an Extract Integration
Retrieving the details of a specific extract integration allows you to review its configuration, including the extraction schema, webhook settings, and associated bot information. This is essential for auditing your data extraction setup and troubleshooting any issues.
```http
GET /api/v1/integration/extract/{extractIntegrationId}/fetch
```
The response includes the complete integration configuration:
- Schema Definition: The full JSON schema used for data extraction
- Webhook Configuration: The request URL and settings for receiving extracted data
- Bot Association: The bot ID if the integration is linked to a specific bot
- Blueprint Linking: The blueprint ID if the integration is part of a larger blueprint
- Metadata: Any custom metadata attached to the integration
- Timestamps: Creation and last update times
This information is valuable when you need to verify your extraction configuration before processing conversations or when debugging extraction results that don't match your expectations.
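A quick auditing sketch, using the field names described above (exact response shapes may differ) and the same hypothetical base URL and auth as earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Fetch an integration's configuration and print the parts most useful
// when debugging unexpected extraction results.
async function auditIntegration(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/fetch`,
    { headers: { Authorization: `Bearer ${API_KEY}` } },
  );
  const integration = await res.json();
  console.log("schema:", JSON.stringify(integration.schema, null, 2));
  console.log("webhook:", integration.request ?? "none configured");
  console.log("bot filter:", integration.botId ?? "all bots");
}
```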
Extraction Processing and Webhook Delivery
The extract integration uses a sophisticated background processing system to handle data extraction from conversations. This ensures that extraction happens reliably without blocking conversation flow or impacting user experience.
Automatic Extraction Triggers
Extraction is automatically triggered when:
- Conversation Ends: When a conversation is marked as completed
- Conversation Goes Idle: After a period of inactivity (configurable via the `trigger` setting)
- Manual Trigger: When you explicitly trigger extraction via the API
The system queues each extraction request and processes them asynchronously, ensuring reliable data capture even during high-traffic periods.
Extraction Process
When a conversation is queued for extraction:
- Message Retrieval: All messages from the conversation are retrieved and sorted chronologically
- Schema-Based Extraction: The AI model analyzes the conversation and extracts data according to your JSON schema
- Metadata Update: The extracted data is stored in the conversation's metadata under `integrations.extract.data`
- Metrics Collection: Numeric values from fields marked with `collect: true` are logged as metrics
- Item Storage: The extracted data is stored in the `ExtractIntegrationItem` table for querying and reporting
- Webhook Delivery: If configured, the extracted data is POSTed to your webhook URL
Webhook Request Format
When your webhook is invoked, it receives a POST request with the following structure:
{ "data": { // Your extracted data according to the schema "customerName": "John Doe", "email": "john@example.com", "orderAmount": 299.99 }, "conversation": { "messages": [ { "type": "user", "text": "I'd like to place an order" }, { "type": "bot", "text": "I'd be happy to help! May I have your name?" } // ... more messages ] } }json
Webhook Security
All webhook requests include an HMAC signature for verification:
- Header: `x-hub-signature: sha256=<hmac_hex_digest>`
- Algorithm: SHA-256
- Secret: The extract integration ID
- Payload: The JSON request body
To verify the signature, compute the HMAC of the request body using your integration ID as the secret and compare it with the signature in the header.
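A minimal Node.js verification sketch; the header name, algorithm, and secret follow the list above, while the raw-body handling depends on your web framework:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify the x-hub-signature header on an incoming webhook request.
// rawBody must be the exact request body bytes, before any JSON parsing.
function verifyWebhookSignature(
  rawBody: Buffer,
  signatureHeader: string, // e.g. "sha256=ab12..."
  extractIntegrationId: string, // the HMAC secret, per the docs above
): boolean {
  const expected =
    "sha256=" +
    createHmac("sha256", extractIntegrationId).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```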
Webhook Retry Logic
If webhook delivery fails, the system automatically retries:
- Retry Attempts: Up to 5 retries
- Backoff Strategy: Exponential backoff with the formula `min(86400, e^(2.5*n))` seconds, where n is the retry attempt
- Logging: All attempts are logged in the integration event logs
- Failure Notification: Final failures are recorded but do not block data extraction
Failed webhook deliveries do not prevent the data from being extracted and stored. The extracted data remains accessible in conversation metadata and through the integration items, even if webhook delivery fails.
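For a feel for the schedule, this quick sketch evaluates the documented formula (assuming n counts retry attempts starting at 1):

```typescript
// Delay before retry n, per the documented formula min(86400, e^(2.5 * n)).
for (let n = 1; n <= 5; n++) {
  const delaySeconds = Math.min(86400, Math.exp(2.5 * n));
  console.log(`retry ${n}: ~${Math.round(delaySeconds)}s`);
}
// Prints roughly: 12s, 148s, 1808s, 22026s, then capped at 86400s (one day).
```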
Numeric Metrics Collection
Fields marked with `collect: true` in your schema trigger automatic metrics logging:
- Metric Type: `integration.extract[{integrationId}].{fieldName}`
- Value: The numeric value extracted from the conversation
- Relations: Linked to the integration, conversation, bot, and blueprint
- Aggregation: Available for analytics, charts, and trend analysis
This enables powerful analytics capabilities without requiring additional configuration or custom code.
Token Usage and Billing
Each extraction operation consumes API tokens:
- Model Used: Determined by your account settings and conversation context
- Token Count: Based on the conversation length and schema complexity
- Usage Recording: Tracked under the reason `conversation/extract`
- References: Linked to the specific conversation for audit purposes
Longer conversations and more complex schemas will consume more tokens. Monitor your usage through the usage API to track extraction costs.
Triggering Extraction on Historic Conversations
The trigger endpoint provides the ability to retroactively apply data extraction to existing conversations. This powerful feature is essential when you've just created an integration and want to extract data from past conversations, or when you've updated your extraction schema and need to reprocess conversations with the new configuration.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "sample": 20
}
```
Use Cases
Initial Setup: After creating a new extract integration, trigger it on your most recent conversations to immediately populate your analytics and test that your extraction schema works as expected.
Schema Updates: When you've refined your extraction schema and want to apply the improvements to historical conversations. This ensures consistency across all your extracted data.
Data Recovery: If extraction failed for some conversations due to temporary issues, you can reprocess them to capture the missing data.
Analytics Refresh: Update your metrics and charts with newly extracted data after making schema changes that add or modify numeric fields marked for collection.
Processing Options
Sample Recent Conversations: Use the `sample` parameter (default: 20) to specify how many of your most recent conversations to process. This is ideal for quick testing or periodic data updates.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "sample": 50
}
```
Specific Conversations: Provide an array of `conversationIds` to extract data from specific conversations. This is useful when you need to reprocess particular conversations after troubleshooting or schema adjustments.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/trigger
Content-Type: application/json

{
  "conversationIds": [
    "conv_abc123",
    "conv_def456",
    "conv_ghi789"
  ]
}
```
How It Works
When you trigger extraction on historic conversations:
- Conversation Selection: The system identifies conversations based on your criteria (sample size or specific IDs)
- Bot Filtering: If your integration is linked to a specific bot, only conversations from that bot are processed
- Queue Processing: Each conversation is queued for extraction using the same processing pipeline as real-time extraction
- Metadata Update: The conversation metadata is updated with newly extracted data, and any previous extraction results are replaced
- Metrics Collection: If your schema includes numeric fields marked with `collect: true`, new metrics are logged
- Webhook Notification: If configured, your webhook receives the extracted data for each processed conversation
The response includes the number of conversations that were queued for processing:
{ "id": "ext_abc123", "triggered": 20 }json
Important Considerations
- Processing happens asynchronously in the background
- Large batches may take several minutes to complete
- The maximum sample size is 1000 conversations per request
- Each extraction consumes API tokens based on conversation length
- Webhook notifications are sent as conversations are processed
- Previous extraction results for these conversations will be overwritten
Note: If you're testing a new schema, start with a small sample (5-10 conversations) to verify the extraction works as expected before processing larger batches.
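A small sketch of that workflow, reusing the hypothetical base URL and auth from earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Trigger extraction on a small sample and log how many conversations were queued.
async function testSchemaOnSample(extractIntegrationId: string): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/api/v1/integration/extract/${extractIntegrationId}/trigger`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${API_KEY}`,
      },
      body: JSON.stringify({ sample: 5 }), // small sample while validating a schema
    },
  );
  const { triggered } = await res.json();
  console.log(`queued ${triggered} conversations for extraction`);
}
```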
Updating an Extract Integration
Updating an extract integration allows you to refine your data extraction configuration as your needs evolve. You can modify the extraction schema, change webhook settings, adjust bot associations, or update trigger conditions.
```http
POST /api/v1/integration/extract/{extractIntegrationId}/update
Content-Type: application/json

{
  "name": "Enhanced Customer Extractor",
  "schema": {
    "customerName": {
      "type": "string",
      "description": "The customer's full name",
      "required": true
    },
    "email": {
      "type": "string",
      "description": "The customer's email address",
      "required": true
    },
    "orderAmount": {
      "type": "number",
      "description": "The order amount in dollars",
      "collect": true
    }
  }
}
```
Common Update Scenarios
Refining Extraction Schema: As you analyze extracted data, you may discover additional fields to capture or existing fields that need clearer descriptions. Update your schema to improve extraction accuracy.
Adding Numeric Metrics: Add the `collect: true` property to numeric fields in your schema to enable automatic metric tracking. This allows you to monitor trends and analyze quantitative data from conversations.
Changing Webhook Configuration: Update the `request` field to change where extracted data is sent, or remove it entirely if you no longer need webhook notifications.
Adjusting Bot Filtering: Change the `botId` to apply the integration to a different bot, or remove it to process conversations from all bots.
Modifying Trigger Conditions: Update the `trigger` setting to control when extraction occurs (e.g., on conversation end or idle).
Testing Schema Changes
After updating your extraction schema, it's recommended to use the trigger endpoint to reprocess recent conversations and verify that the new schema extracts data as expected before it affects new conversations.
Note: Schema updates only affect future extractions. To apply the new schema to historical conversations, use the trigger endpoint to reprocess them.
Listing Extract Integrations
Retrieving a list of your extract integrations allows you to manage and monitor all your data extraction configurations in one place. This is particularly useful when you have multiple extraction schemas for different use cases or bots.
The list endpoint supports pagination and filtering capabilities, enabling you to efficiently navigate through large numbers of integrations. You can filter by blueprint or use metadata queries to find specific integrations.
```http
GET /api/v1/integration/extract/list
```
Pagination
Use the `cursor` parameter to paginate through results. The response includes cursor information that you can use to fetch the next page of results. The `take` parameter controls how many items to retrieve per page.
Filtering Options
- By Blueprint: Use the `blueprintId` query parameter to retrieve only integrations associated with a specific blueprint
- By Metadata: Use metadata query filters to find integrations with specific metadata properties
- Order: Control the sort order using the `order` parameter (`asc` or `desc`)
Each integration in the response includes the complete schema definition, webhook configuration, and associated bot information, allowing you to review and manage your extraction configurations effectively.
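Putting the pagination and filtering options together, here is a sketch that lists all integrations for one blueprint, newest first; the `{ items, cursor }` response shape and auth details are assumptions consistent with the earlier examples:

```typescript
const BASE_URL = "https://api.example.com"; // hypothetical base URL
const API_KEY = process.env.API_KEY ?? ""; // assumed auth scheme

// Page through extract integrations filtered by blueprint, in descending order.
async function listIntegrationsByBlueprint(blueprintId: string): Promise<unknown[]> {
  const results: unknown[] = [];
  let cursor: string | undefined;
  do {
    const params = new URLSearchParams({ take: "25", order: "desc", blueprintId });
    if (cursor) params.set("cursor", cursor);
    const res = await fetch(`${BASE_URL}/api/v1/integration/extract/list?${params}`, {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });
    const page = await res.json();
    results.push(...page.items);
    cursor = page.cursor; // absent on the last page
  } while (cursor);
  return results;
}
```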