Text Storage
Generator can be configured to store parsed text content in an S3 bucket for archival or external processing purposes.
This feature is controlled by the IMPORT__TEXT_OBJECT_STORE setting, which accepts a JSON object with the following schema:
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | False | Whether Generator should store text in an S3 bucket. |
| prefix | str | | The path prefix to use when storing text in an S3 bucket. |
| format | TextFormat | TextFormat.JSON | The format to use when serializing the text for storage. |
| include_interactions | bool | False | Whether to include interactions in the text. See the include_interactions query parameter of the /content/{content_id}/versions/{version}/text endpoint for more details. |
| s3_location | S3Location (optional) | None | The S3 location to use when storing text in an S3 bucket. |
Properties of S3Location:
| Field | Type | Description |
|---|---|---|
| bucket | str | The name of the S3 bucket where text should be stored. |
| region | str (optional) | AWS region name (e.g., ‘us-east-1’). If omitted, the AWS SDK will use the default configured region. |
| access_key_id | str (optional) | AWS access key ID. If omitted, credentials will be resolved via the standard AWS provider chain. |
| secret_access_key | str (optional) | AWS secret access key. Required if access_key_id is provided manually. |
| session_token | str (optional) | AWS session token for temporary security credentials. |
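Code that reads the stored text back, such as an external processing job, may want to resolve credentials the same way. Below is a minimal Python sketch of that idea. It is not Generator's implementation. The helper name `boto3_client_kwargs` is hypothetical, and the sketch assumes the consuming side uses the boto3 SDK. Omitted optional fields are left out of the returned arguments, so boto3 falls back to its default region and its standard credential provider chain.

```python
from typing import Any, Dict

# Maps S3Location field names (from the table above) to boto3 client keyword arguments.
_FIELD_TO_BOTO3 = {
    "region": "region_name",
    "access_key_id": "aws_access_key_id",
    "secret_access_key": "aws_secret_access_key",
    "session_token": "aws_session_token",
}


def boto3_client_kwargs(s3_location: Dict[str, Any]) -> Dict[str, str]:
    """Translate an S3Location object into keyword arguments for boto3.client("s3", ...).

    Hypothetical helper: optional fields that are missing or empty are omitted,
    so boto3 resolves the region and credentials through its defaults.
    """
    if not s3_location.get("bucket"):
        raise ValueError("s3_location.bucket is required")
    # Mirrors the table: secret_access_key is required when access_key_id is supplied.
    if s3_location.get("access_key_id") and not s3_location.get("secret_access_key"):
        raise ValueError("secret_access_key is required when access_key_id is set")
    return {
        boto3_name: s3_location[field]
        for field, boto3_name in _FIELD_TO_BOTO3.items()
        if s3_location.get(field)
    }
```

For example, `boto3.client("s3", **boto3_client_kwargs(location))` creates a client that uses explicit keys only when the S3Location object provides them. The bucket name is not a client argument. Pass it separately to calls such as `get_object`.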
Values of TextFormat:
| Value |
|---|
| text |
| json |
| vtt |
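If you build or check this setting in your own tooling, the schema above maps onto a small typed model. The following is a minimal sketch using Python dataclasses. The class names mirror the tables, but they are not a published Generator client library. Because the table documents no default for `prefix`, the empty-string default here is an assumption. The sketch expects the bare settings object, meaning the fields listed in the first table. It leaves `s3_location` as a plain dict.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class TextFormat(str, Enum):
    # Values from the TextFormat table.
    TEXT = "text"
    JSON = "json"
    VTT = "vtt"


@dataclass
class TextObjectStoreSettings:
    enabled: bool = False
    prefix: str = ""  # Assumption: no default is documented for prefix.
    format: TextFormat = TextFormat.JSON
    include_interactions: bool = False
    s3_location: Optional[Dict[str, Any]] = None

    @classmethod
    def from_json(cls, raw: str) -> "TextObjectStoreSettings":
        """Parse an IMPORT__TEXT_OBJECT_STORE object, applying the documented defaults."""
        data = json.loads(raw)
        return cls(
            enabled=data.get("enabled", False),
            prefix=data.get("prefix", ""),
            # TextFormat(...) raises ValueError for anything other than text/json/vtt.
            format=TextFormat(data.get("format", TextFormat.JSON.value)),
            include_interactions=data.get("include_interactions", False),
            s3_location=data.get("s3_location"),
        )
```

Parsing `{}` yields the documented defaults: disabled, JSON format, no interactions, and no S3 location.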
By default, Generator automatically determines the S3 object path using a combination of the content’s identifiers (tenant ID, content ID, and version). To specify a custom path, set the object_store_text_path_override property on any import request.
When text storage is enabled, the results property of successful import jobs will include an additional property named text_object_store_uri. This property contains the S3 URI of the object where Generator stored the imported content’s text.
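A downstream consumer usually needs the bucket and key separately, for example to fetch the object with an S3 SDK. The following minimal sketch splits `text_object_store_uri` into those two parts. It assumes only that `results` is a JSON object containing that property. The function names are hypothetical. The split is done by hand rather than with `urllib.parse.urlparse`, so that characters such as `?` or `#` in an object key are not treated as a query string or fragment.

```python
from typing import Any, Dict, Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/key/path" into ("bucket", "key/path")."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must include a bucket and a key: {uri!r}")
    return bucket, key


def text_object_location(results: Dict[str, Any]) -> Tuple[str, str]:
    """Return (bucket, key) for the stored text of a successful import job.

    Hypothetical helper: raises KeyError if the job results carry no
    text_object_store_uri (for example, when text storage is disabled).
    """
    return split_s3_uri(results["text_object_store_uri"])
```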
Examples
Default Path Generation
When no prefix is specified and the import request does not set an override, Generator uses the default path structure based on content identifiers. This approach organizes stored text files hierarchically by tenant, content, and version, making it easy to locate files for specific content items.
Setting Value:
```json
{
  "value": {
    "enabled": true,
    "prefix": "",
    "format": "json",
    "s3_location": {
      "bucket": "my-text-bucket"
    }
  }
}
```
Result: Text is saved to s3://my-text-bucket/{tenant_id}/{content_id}/{version}/text.json, using the actual ID/version values from the content that was imported. For example, if importing content with tenant ID 123, content ID 456, and version 0, the file would be saved to s3://my-text-bucket/123/456/0/text.json.
Custom Prefix
Using a prefix allows you to organize all Generator-imported text files under a specific path within your S3 bucket. This is useful when you want to separate Generator’s text storage from other bucket contents, or when you need to apply bucket policies or lifecycle rules to a specific directory structure.
Setting Value:
```json
{
  "enabled": true,
  "prefix": "text-resources/generator-imports",
  "format": "json",
  "s3_location": {
    "bucket": "my-text-bucket"
  }
}
```
Result: Text is saved to s3://my-text-bucket/text-resources/generator-imports/{tenant_id}/{content_id}/{version}/text.json. The prefix is prepended to the identifier-based path, so all Generator text files will be stored under the text-resources/generator-imports/ directory while still maintaining the hierarchical organization by tenant, content, and version.
Path Override in Import Request
For cases where you need complete control over the S3 object path for a specific import, you can use the object_store_text_path_override property in your import request. This override takes precedence over both the configured prefix and the automatic identifier-based path generation. This is particularly useful when you need to store text at a specific location for integration with external systems, or when you want to use a custom naming convention for a particular import.
Setting Value:
```json
{
  "enabled": true,
  "prefix": "text-resources/generator-imports",
  "format": "json",
  "s3_location": {
    "bucket": "my-text-bucket"
  }
}
```
Import Request:
```json
{
  ...
  "object_store_text_path_override": "generator/text/file.json"
  ...
}
```
Result: Text is saved to s3://my-text-bucket/generator/text/file.json. When an override is specified, Generator uses exactly the path you provide, ignoring both the configured prefix (text-resources/generator-imports) and the automatic identifier-based path structure. The override path is relative to the bucket root, so you have full control over the final object location.
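The three path rules in these examples can be summarized as a single function. This is a minimal sketch that reproduces the documented results. It is not Generator's code, and the function name is hypothetical. Two details are assumptions. First, the file name `text.json` is the only one shown in the examples (all of which use `"format": "json"`), so other formats are left to the `file_name` argument. Second, leading and trailing slashes are trimmed from the prefix. The override is used exactly as given, relative to the bucket root.

```python
from typing import Optional, Union

Identifier = Union[int, str]


def resolve_text_object_uri(
    bucket: str,
    tenant_id: Identifier,
    content_id: Identifier,
    version: Identifier,
    prefix: str = "",
    path_override: Optional[str] = None,
    file_name: str = "text.json",  # Assumption: only text.json appears in the examples.
) -> str:
    """Compute the S3 URI for stored text following the documented precedence rules."""
    if path_override:
        # The override wins over both the prefix and the identifier-based path.
        key = path_override
    else:
        parts = []
        trimmed = prefix.strip("/")  # Assumption: stray slashes in the prefix are ignored.
        if trimmed:
            parts.append(trimmed)
        parts += [str(tenant_id), str(content_id), str(version), file_name]
        key = "/".join(parts)
    return f"s3://{bucket}/{key}"
```

With tenant ID 123, content ID 456, and version 0, the function returns each of the three results shown above. The return value depends on which of `prefix` and `path_override` is supplied.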