Dataset Files

Files can be used to provide a source of records in your datasets. You can create files, attach them to datasets, and sync them to import records.

Create File

Creating a file is the first step to using it as a data source for your datasets. You can create a file by making a POST request to the following endpoint:

    POST /api/v1/dataset/file/create
    Content-Type: application/json

    {
      "name": "My File",
      "description": "A description of my file"
    }
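As a minimal sketch, the create request above could be assembled in Python with the standard library. The host in `BASE_URL` and the helper name `build_create_file_request` are assumptions for illustration. Authentication headers are omitted because this page does not describe them.

```python
import json
import urllib.request

# Assumption: hypothetical host; replace with your actual API origin.
BASE_URL = "https://api.example.com"


def build_create_file_request(name: str, description: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a file."""
    body = json.dumps({"name": name, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/dataset/file/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_file_request("My File", "A description of my file")
# To send it, add your authentication headers and call urllib.request.urlopen(request).
```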

Uploading File Content

There are multiple ways to upload file content to be used as a data source for your datasets.

Upload via JSON URL or Data URL

You can upload a file by providing an HTTP URL or a data URL in a JSON request body. This method is suitable for smaller files (up to 4.5MB).

    POST /api/v1/file/{fileId}/upload
    Content-Type: application/json

    {
      "file": "https://example.com/path/to/your/file.csv"
    }

or

    POST /api/v1/file/{fileId}/upload
    Content-Type: application/json

    {
      "file": "data:text/csv;base64,SGVhZGVyMSxIZWFkZXIyCkRhdGExLERhdGEyCg=="
    }
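The data URL in the example is the base64 encoding of a two-line CSV. A short sketch of how such a value might be produced in Python follows. The helper name `to_data_url` is hypothetical.

```python
import base64
import json


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL with the given MIME type."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


csv_bytes = b"Header1,Header2\nData1,Data2\n"
# JSON request body for the upload endpoint, using the "file" key shown above.
payload = json.dumps({"file": to_data_url(csv_bytes, "text/csv")})
```

Base64 grows the payload by roughly one third, so a file close to the size limit may be a better fit for one of the other upload methods.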

Upload via Multipart/Form-Data

You can upload a file using multipart/form-data. This method is suitable for files up to 4.5MB.

    POST /api/v1/file/{fileId}/upload
    Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

    ------WebKitFormBoundary7MA4YWxkTrZu0gW
    Content-Disposition: form-data; name="file"; filename="file.csv"
    Content-Type: text/csv

    Header1,Header2
    Data1,Data2
    ------WebKitFormBoundary7MA4YWxkTrZu0gW--
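Most HTTP clients build multipart bodies for you. For illustration only, the sketch below assembles a single-part body by hand so the structure of the request above is visible. The helper name `build_multipart_body` is hypothetical, and the form field name `file` is taken from the example.

```python
import uuid
from typing import Optional, Tuple


def build_multipart_body(
    field: str,
    filename: str,
    content: bytes,
    content_type: str,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a one-file multipart form."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    # The closing delimiter is the boundary followed by two extra hyphens.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


body, content_type_header = build_multipart_body(
    "file", "file.csv", b"Header1,Header2\nData1,Data2\n", "text/csv"
)
```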

Upload via Raw File Stream

You can upload a file by sending the raw file stream in the request body. This method is suitable for files up to 4.5MB.

    POST /api/v1/file/{fileId}/upload
    Content-Type: text/csv

    Header1,Header2
    Data1,Data2
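A raw upload could be prepared as in the sketch below, which also guards against the size limit before any request is built. The host, the helper name, and the reading of "4.5MB" as 4.5 × 1024 × 1024 bytes are all assumptions, since this page does not give an exact byte count.

```python
import urllib.request

# Assumption: hypothetical host; replace with your actual API origin.
BASE_URL = "https://api.example.com"
# Assumption: "4.5MB" is interpreted as 4.5 * 1024 * 1024 bytes.
MAX_INLINE_BYTES = int(4.5 * 1024 * 1024)


def build_raw_upload_request(
    file_id: str, content: bytes, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) a raw-body upload request for a small file."""
    if len(content) > MAX_INLINE_BYTES:
        raise ValueError("content exceeds the inline limit; use a direct upload")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/file/{file_id}/upload",
        data=content,
        headers={"Content-Type": content_type},
        method="POST",
    )


raw_request = build_raw_upload_request(
    "file_abc123", b"Header1,Header2\nData1,Data2\n", "text/csv"
)
```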

Direct-to-Source Uploads

For larger files or more control over the upload process, you can obtain a pre-signed upload request by providing the file metadata in a JSON request body. You can then use the provided upload request to upload the file directly to the storage service.

    POST /api/v1/file/{fileId}/upload
    Content-Type: application/json

    {
      "file": {
        "type": "text/csv",
        "size": 10485760,
        "name": "large_file.csv"
      }
    }

The response will include an uploadRequest object with the necessary details to perform the upload.

    {
      "id": "fileId",
      "uploadRequest": {
        "method": "PUT",
        "url": "https://direct-upload-url/path/to/upload",
        "headers": {
          "Content-Length": "10485760",
          "Content-Type": "text/csv",
          "Content-Disposition": "attachment; filename=large_file.csv"
        }
      }
    }

You can then use this uploadRequest to upload the file directly to the storage service.

    PUT https://direct-upload-url/path/to/upload
    Content-Length: 10485760
    Content-Type: text/csv
    Content-Disposition: attachment; filename=large_file.csv

    <file content>
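The two-step flow can be sketched as follows: describe the file, then replay the returned `uploadRequest` verbatim against the storage service. The helper names are hypothetical, and the sample response mirrors the shape shown above with a small payload in place of a 10MB file.

```python
import json
import urllib.request


def build_upload_metadata(name: str, content: bytes, content_type: str) -> str:
    """JSON body for step 1: describe the file to obtain an uploadRequest."""
    return json.dumps(
        {"file": {"type": content_type, "size": len(content), "name": name}}
    )


def build_direct_upload_request(
    upload_response: dict, content: bytes
) -> urllib.request.Request:
    """Step 2: turn the returned uploadRequest into a request for the storage service."""
    spec = upload_response["uploadRequest"]
    return urllib.request.Request(
        url=spec["url"],
        data=content,
        headers=dict(spec["headers"]),  # use the headers exactly as returned
        method=spec["method"],
    )


content = b"Header1,Header2\nData1,Data2\n"
# Sample response in the documented shape, sized for this small payload.
sample_response = {
    "id": "fileId",
    "uploadRequest": {
        "method": "PUT",
        "url": "https://direct-upload-url/path/to/upload",
        "headers": {
            "Content-Length": str(len(content)),
            "Content-Type": "text/csv",
            "Content-Disposition": "attachment; filename=large_file.csv",
        },
    },
}
put_request = build_direct_upload_request(sample_response, content)
```

Reading the method, URL, and headers from the response rather than hard-coding them keeps the client working if the storage details change.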

Dataset files are the primary way to add content and knowledge to your datasets, enabling AI agents to access and reference specific documents, images, PDFs, text files, and other file types during conversations. Each file attached to a dataset is automatically processed, indexed, and made searchable, allowing the AI to retrieve relevant information when responding to user queries.

Listing Dataset Files

Retrieving the list of files attached to a dataset allows you to inventory all content within a knowledge base, review file metadata, and manage your dataset's content library. The list endpoint provides comprehensive information about each file including its name, description, visibility settings, and timestamps.

To retrieve the files associated with a dataset, send a GET request to the dataset's file list endpoint:

    GET /api/v1/dataset/{datasetId}/file/list

Pagination

The endpoint supports cursor-based pagination for efficiently navigating large file collections:

    GET /api/v1/dataset/{datasetId}/file/list?cursor=eyJpZCI6ImZpbGVfMTIzIn0&take=50

  • cursor: Pagination token from the previous response, enabling you to fetch the next page of results
  • take: Number of files to retrieve per page (adjust based on your needs)
  • order: Sort order, either asc (oldest first) or desc (newest first, default)
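The three parameters above can be combined into a request path as in this sketch. The helper name `build_list_path` is hypothetical, and parameters left unset are simply omitted from the query string.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_path(
    dataset_id: str,
    cursor: Optional[str] = None,
    take: Optional[int] = None,
    order: Optional[str] = None,
) -> str:
    """Build the file list path with optional cursor, take, and order parameters."""
    if order is not None and order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    params = {}
    if cursor is not None:
        params["cursor"] = cursor
    if take is not None:
        params["take"] = take
    if order is not None:
        params["order"] = order
    path = f"/api/v1/dataset/{dataset_id}/file/list"
    return f"{path}?{urlencode(params)}" if params else path


first_page = build_list_path("ds_1", take=50)
next_page = build_list_path("ds_1", cursor="eyJpZCI6ImZpbGVfMTIzIn0", take=50)
```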

Filtering by Metadata

Filter files based on custom metadata fields using deep object notation:

    GET /api/v1/dataset/{datasetId}/file/list?meta[category]=documentation&meta[language]=en

Metadata filtering enables flexible organization and retrieval based on your own categorization schemes, making it easy to find specific types of content within large datasets.
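A small sketch of producing the `meta[key]=value` query string from a dictionary follows. The helper name `build_meta_query` is hypothetical. Brackets are kept unescaped to match the example above; whether the server also accepts percent-encoded brackets is not stated here.

```python
from urllib.parse import urlencode


def build_meta_query(filters: dict) -> str:
    """Encode metadata filters using deep object notation: meta[key]=value."""
    params = {f"meta[{key}]": value for key, value in filters.items()}
    # safe="[]" leaves the brackets literal; values are still URL-encoded.
    return urlencode(params, safe="[]")


query = build_meta_query({"category": "documentation", "language": "en"})
```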

Response Format

The endpoint returns a JSON object whose items array contains the file objects:

    {
      "items": [
        {
          "id": "file_abc123",
          "name": "Product Documentation.pdf",
          "description": "Comprehensive product user guide",
          "visibility": "private",
          "meta": {
            "category": "documentation",
            "version": "2.1"
          },
          "createdAt": "2025-01-10T08:30:00.000Z",
          "updatedAt": "2025-01-15T14:20:00.000Z"
        }
      ]
    }

File Visibility

Each file has a visibility setting that controls access:

  • private: Only accessible to the file owner and explicitly authorized users
  • protected: Accessible to users within the same organization or team
  • public: Publicly accessible (use with caution for sensitive content)

Streaming Response (JSONL)

For real-time processing of large file lists, request JSONL streaming format:

    GET /api/v1/dataset/{datasetId}/file/list
    Accept: application/jsonl

Each line in the response is a separate JSON object:

    {"type":"item","data":{"id":"file_abc123","name":"Document 1.pdf",...}}
    {"type":"item","data":{"id":"file_def456","name":"Document 2.pdf",...}}

This format is ideal for processing large file lists incrementally without waiting for the entire response.
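Incremental processing might look like the sketch below, which parses one line at a time and yields only the `data` payload of `item` events. The helper name is hypothetical. Skipping lines of any other type is an assumption, since only `item` events are shown here.

```python
import json
from typing import Iterable, Iterator


def iter_file_items(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the data object of each JSONL line whose type is "item"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        if event.get("type") == "item":
            yield event["data"]


# Sample lines in the documented shape (fields abbreviated to id and name).
sample_stream = [
    '{"type":"item","data":{"id":"file_abc123","name":"Document 1.pdf"}}',
    '{"type":"item","data":{"id":"file_def456","name":"Document 2.pdf"}}',
]
names = [item["name"] for item in iter_file_items(sample_stream)]
```

Because the function accepts any iterable of lines, it can be fed directly from a streaming HTTP response that is read line by line.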

Important Notes:

  • Only files attached to datasets you own are returned
  • File processing status is not included in the list response; check individual file details for processing state
  • Deleted files are automatically removed from the list
  • The list reflects the current state of file attachments through the DatasetFileAttachment relationship
  • File metadata is flexible and can store arbitrary key-value pairs for custom organization

Attach Dataset File

Add a file to a dataset by creating an attachment between them. Specify the type of attachment.

    POST /api/v1/dataset/{datasetId}/file/{fileId}/attach
    Content-Type: application/json

    {}

Detach Dataset File

Remove a file from a dataset by deleting the attachment between them. You can pass an optional parameter to also delete all records associated with the file in the dataset.

    POST /api/v1/dataset/{datasetId}/file/{fileId}/detach
    Content-Type: application/json

    {
      "deleteRecords": true
    }

Warning: This will permanently delete all records associated with the file in the dataset.
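Because record deletion is permanent, a client may want to make it an explicit opt-in. The sketch below sends an empty body unless `delete_records` is set to true. The host and helper name are assumptions, and the server's behavior when the parameter is omitted is not specified on this page.

```python
import json
import urllib.request

# Assumption: hypothetical host; replace with your actual API origin.
BASE_URL = "https://api.example.com"


def build_detach_request(
    dataset_id: str, file_id: str, delete_records: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a detach request; record deletion is opt-in."""
    body = {"deleteRecords": True} if delete_records else {}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/dataset/{dataset_id}/file/{file_id}/detach",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


keep_records = build_detach_request("ds_1", "file_abc123")
drop_records = build_detach_request("ds_1", "file_abc123", delete_records=True)
```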

Sync a File to a Dataset

Files are not automatically synced to datasets when they are attached or updated. This is to give you control over when the data is imported and to avoid unnecessary processing.

You can trigger a sync of a file to a dataset by making a POST request to the following endpoint:

    POST /api/v1/dataset/{datasetId}/file/{fileId}/sync
    Content-Type: application/json

    {}

The response contains the ID of the file that was synced. The file is processed asynchronously, and you can monitor progress in the dataset event log.
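Since attaching a file does not import its records, a typical client issues the attach and sync calls in sequence. The sketch below builds both requests in that order. The host and helper names are assumptions, and the requests are not sent.

```python
import json
import urllib.request
from typing import List

# Assumption: hypothetical host; replace with your actual API origin.
BASE_URL = "https://api.example.com"


def build_file_action_request(
    dataset_id: str, file_id: str, action: str
) -> urllib.request.Request:
    """Build a POST request with an empty JSON body for attach or sync."""
    if action not in ("attach", "sync"):
        raise ValueError("action must be 'attach' or 'sync'")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/dataset/{dataset_id}/file/{file_id}/{action}",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_attach_and_sync(dataset_id: str, file_id: str) -> List[urllib.request.Request]:
    """Return the attach request followed by the sync request."""
    return [
        build_file_action_request(dataset_id, file_id, "attach"),
        build_file_action_request(dataset_id, file_id, "sync"),
    ]


steps = build_attach_and_sync("ds_1", "file_abc123")
```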