The Instill Artifact component is a data component that lets users manipulate files and data in the artifact store and run smart search over them.
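It can carry out the following tasks:
- Upload File
- Upload Files
- Get Files Metadata
- Get Chunks Metadata
- Get File in Markdown
- Match File Status
- Retrieve
- Ask
- Sync Files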
To use the Artifact component on a self-hosted deployment of Instill Core, you need to set up an OpenAI API key.
You can do this by setting the OPENAI_API_KEY environment variable.
Please refer to configuring-the-embedding-feature for details.
Note: on Instill Cloud, you do not need to set up the OpenAI API key.
#Release Stage
Alpha
#Configuration
The component definition and tasks are defined in the definition.json and tasks.json files respectively.
#Supported Tasks
#Upload File
Upload a file to a Catalog and process it into chunks.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_UPLOAD_FILE |
Options (required) | options | object | Choose whether to upload the file to an existing catalog or to create a new catalog. |
The options object must fulfill one of the following schemas:
Existing Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the existing catalog to upload the file to. |
File | file | string | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
File Name | file-name | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Namespace | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "existing catalog". |
Create New Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID for the new catalog you want to create. |
Description | description | string | Description of the catalog. |
File | file | string | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
File Name | file-name | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Namespace | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "create new catalog". |
Tags | tags | array | Tags for the catalog. |
Output | ID | Type | Description |
--- | --- | --- | --- |
File | file | object | Result of uploading the file into the catalog. |
Status | status | boolean | Status of triggering file processing. Returns true if it succeeded. |
Output Objects in Upload File
File
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog that the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
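The two options schemas above can be sketched in Python as follows. This is only an illustration of the documented fields: the function name `build_upload_options` and its parameters are hypothetical, and how the resulting object is passed to the component is not shown here.

```python
import base64

MAX_FILE_NAME_LENGTH = 100  # limit stated in the File Name field note


def build_upload_options(data: bytes, file_name: str, namespace: str,
                         catalog_id: str, new_catalog: bool = False,
                         description: str = "", tags=None) -> dict:
    """Assemble an options object for TASK_UPLOAD_FILE (hypothetical helper)."""
    if len(file_name) > MAX_FILE_NAME_LENGTH:
        raise ValueError("file-name must not exceed 100 characters")
    options = {
        "option": "create new catalog" if new_catalog else "existing catalog",
        "catalog-id": catalog_id,
        "namespace": namespace,
        # The file field expects the Base64 encoded file content.
        "file": base64.b64encode(data).decode("ascii"),
        "file-name": file_name,
    }
    if new_catalog:
        # These fields only appear in the Create New Catalog schema.
        options["description"] = description
        options["tags"] = list(tags or [])
    return options
```

Calling `build_upload_options(b"...", "example.pdf", "instill-ai", "my-catalog")` yields the Existing Catalog variant. Passing `new_catalog=True` adds the description and tags fields.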
#Upload Files
Upload multiple files to a Catalog and process them into chunks.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_UPLOAD_FILES |
Options (required) | options | object | Choose whether to upload the files to an existing catalog or to create a new catalog. |
The options object must fulfill one of the following schemas:
Existing Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the existing catalog to upload the files to. |
File Names | file-names | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Files | files | array | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
Namespace | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "existing catalog". |
Create New Catalog
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID for the new catalog you want to create. |
Description | description | string | Description of the catalog. |
File Names | file-names | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
Files | files | array | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
Namespace | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Option | option | string | Must be "create new catalog". |
Tags | tags | array | Tags for the catalog. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Files | files | array[object] | Metadata of the files in the catalog. |
Status | status | boolean | Status of triggering file processing. Returns true only if all files succeeded. |
Output Objects in Upload Files
Files
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog that the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
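For the multi-file variant, the files and file-names arrays could be built together, as in the sketch below. It assumes that the entry at index i of files belongs to the name at index i of file-names. The tables above do not state this pairing, so treat it as an assumption. The helper name `build_upload_files_arrays` is hypothetical.

```python
import base64
from typing import Dict, List


def build_upload_files_arrays(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Turn {file name: raw bytes} into parallel file-names / files arrays.

    Assumption: the component pairs the two arrays by index.
    """
    names = list(files)
    return {
        "file-names": names,
        "files": [base64.b64encode(files[name]).decode("ascii") for name in names],
    }
```

Building both arrays from one mapping keeps them the same length and in the same order.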
#Get Files Metadata
Get the metadata of the files in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_FILES_METADATA |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to search for files. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Files | files | array[object] | Metadata of the files in the catalog. |
Output Objects in Get Files Metadata
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog that the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
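Since create-time and update-time are ISO 8601 strings, the returned files array can be ordered by time once the strings are parsed. The sketch below assumes timestamps of the form `2024-08-01T10:00:00Z`. The exact precision returned by the component is not documented here, and both helper names are hypothetical.

```python
from datetime import datetime
from typing import List


def parse_iso8601(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    # datetime.fromisoformat only accepts "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def latest_updated(files: List[dict]) -> dict:
    """Return the file entry with the most recent update-time."""
    return max(files, key=lambda entry: parse_iso8601(entry["update-time"]))
```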
#Get Chunks Metadata
Get the metadata of the chunks from a file in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_CHUNKS_METADATA |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to search for files. |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Chunks | chunks | array[object] | Metadata of the file's chunks in the catalog. |
Output Objects in Get Chunks Metadata
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Create Time | create-time | string | The creation time of the chunk in ISO 8601 format. |
End Position | end-position | integer | The end position of the chunk in the file. |
File UID | original-file-uid | string | The unique identifier of the file. |
Retrievable | retrievable | boolean | The retrievable status of the chunk. |
Start Position | start-position | integer | The start position of the chunk in the file. |
Token Count | token-count | integer | The token count of the chunk. |
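As an illustration of how the chunk fields fit together, the following sketch summarizes a chunks array. It only relies on the field IDs listed above. The function name `summarize_chunks` and the keys of the summary it returns are made up for this example.

```python
from typing import List


def summarize_chunks(chunks: List[dict]) -> dict:
    """Summarize chunk metadata returned by TASK_GET_CHUNKS_METADATA."""
    retrievable = [chunk for chunk in chunks if chunk["retrievable"]]
    spans = [chunk["end-position"] - chunk["start-position"] for chunk in chunks]
    return {
        "chunk-count": len(chunks),
        "retrievable-count": len(retrievable),
        # Token total over retrievable chunks only.
        "retrievable-tokens": sum(chunk["token-count"] for chunk in retrievable),
        # Largest distance between start and end position, 0 if no chunks.
        "longest-span": max(spans, default=0),
    }
```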
#Get File in Markdown
Get the file content in Markdown format.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_GET_FILE_IN_MARKDOWN |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to search for files. |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
File UID | original-file-uid | string | The unique identifier of the file. |
Content | content | string | The content of the file in Markdown format. |
Create Time | create-time | string | The creation time of the source file in ISO 8601 format. |
Update Time | update-time | string | The update time of the source file in ISO 8601 format. |
#Match File Status
Check whether processing of the specified file is done.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_MATCH_FILE_STATUS |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to check the file's processing status. |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
File UID (required) | file-uid | string | The unique identifier of the file. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Status | succeeded | boolean | Status of the file processing. Returns true if it succeeded. |
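Because this task reports a single boolean, a caller that needs to wait for processing may poll it. The sketch below shows one generic way to do that. `check_status` is a hypothetical callable that you would implement to run TASK_MATCH_FILE_STATUS and return the succeeded value. The retry count and delay are arbitrary choices, not documented defaults.

```python
import time
from typing import Callable


def wait_until_processed(check_status: Callable[[], bool],
                         attempts: int = 10,
                         delay_seconds: float = 2.0) -> bool:
    """Poll check_status until it returns True or attempts run out."""
    for attempt in range(attempts):
        if check_status():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A False return value only means that processing was not reported as done within the allotted attempts.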
#Retrieve
Search the chunks in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_RETRIEVE |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to search for files. |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Text Prompt (required) | text-prompt | string | The prompt string to search the chunks. |
Top K | top-k | integer | The number of top chunks to return. The range is 1 to 20, and the default is 5. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Chunks | chunks | array[object] | Chunks data from smart search. |
Output Objects in Retrieve
Chunks
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Similarity | similarity-score | number | The similarity score of the chunk. |
Source File Name | source-file-name | string | The name of the source file. |
Text Content | text-content | string | The text content of the chunk. |
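The Retrieve input could be assembled and checked on the client side before it is sent, as in the sketch below. The function name `build_retrieve_input` is hypothetical. The top-k bounds and default are taken from the input table above, and whether the component itself rejects out-of-range values is not stated there.

```python
TOP_K_MIN, TOP_K_MAX, TOP_K_DEFAULT = 1, 20, 5  # values from the Top K row


def build_retrieve_input(catalog_id: str, namespace: str,
                         text_prompt: str, top_k: int = TOP_K_DEFAULT) -> dict:
    """Assemble the input fields for TASK_RETRIEVE (hypothetical helper)."""
    if not TOP_K_MIN <= top_k <= TOP_K_MAX:
        raise ValueError("top-k must be between 1 and 20")
    return {
        "catalog-id": catalog_id,
        "namespace": namespace,
        "text-prompt": text_prompt,
        "top-k": top_k,
    }
```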
#Ask
Answer questions based on the files in the catalog.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_ASK |
Catalog ID (required) | catalog-id | string | ID of the catalog in which to search for files. |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Question (required) | question | string | The question to answer. |
Top K | top-k | integer | The number of top answers to return. The range is 1 to 20, and the default is 5. |
Output | ID | Type | Description |
--- | --- | --- | --- |
Answer | answer | string | Answer data from smart search. |
Chunks (optional) | chunks | array[object] | Chunks data used to answer the question. |
Output Objects in Ask
Chunks
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Chunk UID | chunk-uid | string | The unique identifier of the chunk. |
Similarity | similarity-score | number | The similarity score of the chunk. |
Source File Name | source-file-name | string | The name of the source file. |
Text Content | text-content | string | The text content of the chunk. |
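One way to present the Ask output is to show the answer together with the files its chunks came from. The sketch below does this with the documented field IDs only. The function name `format_answer` and the output layout are illustrative choices.

```python
def format_answer(output: dict) -> str:
    """Render the answer plus the distinct source files of its chunks."""
    lines = [output["answer"]]
    # chunks is marked optional in the output table, so it may be absent.
    chunks = output.get("chunks") or []
    sources = sorted({chunk["source-file-name"] for chunk in chunks})
    if sources:
        lines.append("Sources: " + ", ".join(sources))
    return "\n".join(lines)
```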
#Sync Files
This task synchronizes files from third-party storage to Instill Catalog. New files are uploaded, and updated files are overwritten based on third-party metadata. Files added through other channels, like the Artifact API or additional storage services, will not be removed. Currently, only Google Drive is supported as a third-party storage service.
Input | ID | Type | Description |
--- | --- | --- | --- |
Task ID (required) | task | string | TASK_SYNC_FILES |
Namespace (required) | namespace | string | Fill in your namespace. You can find it in the tab for switching namespaces. |
Catalog ID (required) | catalog-id | string | ID of the catalog to synchronize files into from third-party data storage. |
Third Party Files (required) | third-party-files | array[object] | File contents and metadata from third-party data storage. |
Input Objects in Sync Files
Third Party Files
File contents and metadata from third-party data storage.
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Content | content | string | Base64 encoded content of the binary file without the data:[MIME_TYPE];base64, prefix. |
Created time | created-time | string | Time when the file was created. Format: YYYY-MM-DDTHH:MM:SSZ . |
ID | id | string | Unique ID of the file from third-party data storage. |
MD5 checksum | md5-checksum | string | MD5 checksum of the file. This reflects every change made to the file on the server, even those not visible to the user. |
MIME type | mime-type | string | MIME type of the file. |
Modified time | modified-time | string | Time when the file was last modified. Format: YYYY-MM-DDTHH:MM:SSZ . It will be used to check if the file has been updated. |
Name | name | string | Name of the file from third-party data storage. |
Size | size | integer | Size of the file in bytes. |
Version | version | integer | Version of the file. |
Web Content Link | web-content-link | string | Link for downloading the content of the file in a browser. |
Web View Link | web-view-link | string | Link for opening the file in a relevant third-party data storage editor or viewer in a browser. It will be used to check the source of the file. |
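The content and timestamp fields above have specific formats, which the following sketch tries to capture. In the example recipe this array comes from another component's output, so the helpers here are purely illustrative and all of their names are hypothetical. Fields such as id, md5-checksum, version, and the two links are expected to come from the storage service and are passed through unchanged.

```python
import base64
from datetime import datetime, timezone


def strip_data_uri_prefix(content: str) -> str:
    """Drop a leading data:[MIME_TYPE];base64, prefix if one is present."""
    if content.startswith("data:") and ";base64," in content:
        return content.split(";base64,", 1)[1]
    return content


def format_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_third_party_file(metadata: dict, data: bytes,
                           created: datetime, modified: datetime) -> dict:
    """Combine storage metadata with encoded content and formatted times."""
    entry = dict(metadata)  # id, name, mime-type, md5-checksum, version, links
    entry["content"] = base64.b64encode(data).decode("ascii")
    entry["size"] = len(data)
    entry["created-time"] = format_timestamp(created)
    entry["modified-time"] = format_timestamp(modified)
    return entry
```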
Output | ID | Type | Description |
--- | --- | --- | --- |
Uploaded Files (optional) | uploaded-files | array[object] | Metadata of the uploaded files in the catalog. The metadata here is from Instill Artifact rather than third-party storage. |
Updated Files (optional) | updated_files | array[object] | Files that were updated. The metadata here is from Instill Artifact rather than third-party storage. |
Failure Files (optional) | failure-files | array[object] | Files that failed to upload or overwrite. The metadata here is from third-party storage. |
Error Messages (optional) | error-messages | array[string] | Error messages for files that failed to upload or overwrite. |
Status (optional) | status | boolean | Status of triggering file processing. Returns true if it succeeded. |
Output Objects in Sync Files
Uploaded Files
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog that the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
Updated Files
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Catalog ID | catalog-id | string | ID of the catalog that the file was uploaded to. |
Create Time | create-time | string | Creation time of the file in ISO 8601 format. |
File Name | file-name | string | Name of the file. |
Type | file-type | string | Type of the file. |
File UID | file-uid | string | Unique identifier of the file. |
Size | size | number | Size of the file in bytes. |
Update Time | update-time | string | Update time of the file in ISO 8601 format. |
Failure Files
Field | Field ID | Type | Note |
--- | --- | --- | --- |
Content | content | string | Base64 encoded content of the binary file without the data:[MIME_TYPE];base64, prefix. |
Created time | created-time | string | Time when the file was created. Format: YYYY-MM-DDTHH:MM:SSZ . |
ID | id | string | Unique ID of the file from third-party data storage. |
MD5 checksum | md5-checksum | string | MD5 checksum of the file. This reflects every change made to the file on the server, even those not visible to the user. |
MIME type | mime-type | string | MIME type of the file. |
Modified time | modified-time | string | Time when the file was last modified. Format: YYYY-MM-DDTHH:MM:SSZ . It will be used to check if the file has been updated. |
Name | name | string | Name of the file from third-party data storage. |
Size | size | integer | Size of the file in bytes. |
Version | version | integer | Version of the file. |
Web Content Link | web-content-link | string | Link for downloading the content of the file in a browser. |
Web View Link | web-view-link | string | Link for opening the file in a relevant third-party data storage editor or viewer in a browser. It will be used to check the source of the file. |
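Note that the uploaded and updated entries carry Instill Artifact metadata (file-name), while the failure entries carry third-party metadata (name). A small sketch of reading the Sync Files output with that difference in mind is shown below. The function name `summarize_sync` is hypothetical, and the output keys are used exactly as spelled in the output table above.

```python
def summarize_sync(output: dict) -> dict:
    """Collect file names per outcome from a TASK_SYNC_FILES output."""
    # Every output field is marked optional, so each lookup falls back to [].
    uploaded = output.get("uploaded-files") or []
    updated = output.get("updated_files") or []  # key as listed in the table
    failed = output.get("failure-files") or []
    return {
        "uploaded": [entry["file-name"] for entry in uploaded],
        "updated": [entry["file-name"] for entry in updated],
        # Failure entries use the third-party schema, hence "name".
        "failed": [entry["name"] for entry in failed],
        "errors": list(output.get("error-messages") or []),
    }
```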
#Example Recipes
Recipe for the Ask your Catalog pipeline.
catalog-id: ${variable.catalog_name}
namespace: ${variable.namespace}
question: ${variable.question}
description: The name of your catalog i.e. "instill-ai"
description: The namespace of your catalog i.e. "instill-ai"
description: The question to ask your catalog i.e. "What is Instill AI doing?", "What is Artifact?"
value: ${artifact-0.output.answer}
Sync files from Google Drive to Instill Catalog.
shared-link: ${variable.folder-link}
refresh-token: ${secret.refresh-token-gd}
namespace: ${variable.namespace}
catalog-id: ${variable.catalog}
third-party-files: ${read-folder.output.files}