Instill Artifact

The Instill Artifact component is a data component that lets users manage and smart-search files and data in the artifact store. It can carry out the following tasks:

- Upload File
- Upload Files
- Get Files Metadata
- Get Chunks Metadata
- Get File in Markdown
- Match File Status
- Retrieve
- Ask
- Sync Files

To use the Artifact component with a self-hosted deployment of Instill Core, you need to set up an OpenAI API key by setting the `OPENAI_API_KEY` environment variable. Please refer to configuring-the-embedding-feature for details. On Instill Cloud, you do not need to set up the OpenAI API key.

#Release Stage

Alpha

#Configuration

The component definition and tasks are defined in the definition.json and tasks.json files respectively.

#Supported Tasks

#Upload File

Upload a file to a catalog and process it into chunks.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILE` |
| Options (required) | `options` | object | Choose whether to upload the file to an existing catalog or to create a new catalog. |

The options Object

Options

`options` must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog, as entered in Catalog. |
| File | `file` | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Namespace | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be `"existing catalog"`. |

Create New Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | Catalog ID for the new catalog you want to create. |
| Description | `description` | string | Description of the catalog. |
| File | `file` | string | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Namespace | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be `"create new catalog"`. |
| Tags | `tags` | array | Tags for the catalog. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| File | `file` | object | Result of uploading the file into the catalog. |
| Status | `status` | boolean | Whether file processing was triggered; `true` if it succeeded. |

Output Objects in Upload File

File

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
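
As a minimal sketch of how the `options` object for this task could be assembled outside the pipeline editor, the Python helper below Base64-encodes the file content and enforces the 100-character file-name limit from the field notes. The function name `build_upload_file_options` and its parameters are hypothetical and are not part of Instill Artifact; only the field IDs and the two `option` values come from the schemas above.

```python
import base64

MAX_FILE_NAME_LENGTH = 100  # limit stated in the `file-name` field note


def build_upload_file_options(namespace, catalog_id, file_name, file_bytes,
                              create_new=False, description="", tags=None):
    """Return an `options` dict for TASK_UPLOAD_FILE (hypothetical helper)."""
    if len(file_name) > MAX_FILE_NAME_LENGTH:
        raise ValueError("file-name must be at most 100 characters")
    options = {
        "option": "create new catalog" if create_new else "existing catalog",
        "namespace": namespace,
        "catalog-id": catalog_id,
        "file-name": file_name,
        # The `file` field carries the content as a Base64 string.
        "file": base64.b64encode(file_bytes).decode("ascii"),
    }
    if create_new:
        # These two fields only appear in the "Create New Catalog" schema.
        options["description"] = description
        options["tags"] = list(tags or [])
    return options


if __name__ == "__main__":
    # Illustrative values only.
    print(build_upload_file_options("instill-ai", "my-catalog",
                                    "example.pdf", b"%PDF-1.7"))
```

Keeping the catalog-creation fields out of the "existing catalog" variant means the resulting object matches exactly one of the two schemas.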

#Upload Files

Upload multiple files to a catalog and process them into chunks.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILES` |
| Options (required) | `options` | object | Choose whether to upload the files to an existing catalog or to create a new catalog. |

The options Object

Options

`options` must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog, as entered in Catalog. |
| File Names | `file-names` | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Files | `files` | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
| Namespace | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be `"existing catalog"`. |

Create New Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | Catalog ID for the new catalog you want to create. |
| Description | `description` | string | Description of the catalog. |
| File Names | `file-names` | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Files | `files` | array | Base64-encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
| Namespace | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be `"create new catalog"`. |
| Tags | `tags` | array | Tags for the catalog. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Files | `files` | array[object] | Metadata of the files in the catalog. |
| Status | `status` | boolean | Whether file processing was triggered; `true` only if it succeeded for all files. |

Output Objects in Upload Files

Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
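
For the multi-file variant, the two arrays have to stay aligned. The sketch below builds `file-names` and `files` from the same list of local paths so that the entries line up by index. That `file-names[i]` describes `files[i]` is an assumption inferred from the schema, not something stated above, and `build_upload_files_options` is a hypothetical helper name.

```python
import base64
from pathlib import Path


def build_upload_files_options(namespace, catalog_id, paths):
    """Return an `options` dict for TASK_UPLOAD_FILES (existing catalog)."""
    paths = [Path(p) for p in paths]
    return {
        "option": "existing catalog",
        "namespace": namespace,
        "catalog-id": catalog_id,
        # Both arrays are derived from `paths`, so index i in one
        # always refers to the same file as index i in the other.
        "file-names": [p.name for p in paths],
        "files": [base64.b64encode(p.read_bytes()).decode("ascii")
                  for p in paths],
    }
```

Deriving both arrays from one source list avoids the names and contents drifting out of order when files are added or removed.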

#Get Files Metadata

Get the metadata of the files in the catalog.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_FILES_METADATA` |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog whose files you want to look up. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Files | `files` | array[object] | Metadata of the files in the catalog. |

Output Objects in Get Files Metadata

Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |

#Get Chunks Metadata

Get the metadata of the chunks of a file in the catalog.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_CHUNKS_METADATA` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Chunks | `chunks` | array[object] | Metadata of the file's chunks in the catalog. |

Output Objects in Get Chunks Metadata

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Create Time | `create-time` | string | The creation time of the chunk in ISO 8601 format. |
| End Position | `end-position` | integer | The end position of the chunk in the file. |
| File UID | `original-file-uid` | string | The unique identifier of the file. |
| Retrievable | `retrievable` | boolean | The retrievable status of the chunk. |
| Start Position | `start-position` | integer | The start position of the chunk in the file. |
| Token Count | `token-count` | integer | The token count of the chunk. |
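
To illustrate how the chunk metadata fields might be consumed downstream, the following sketch summarises a `chunks` array: how many chunks are retrievable, how many tokens they hold, and which span of the file they cover. `summarize_chunks` is a hypothetical helper and the sample values in the usage example are made up; only the field IDs come from the table above.

```python
def summarize_chunks(chunks):
    """Summarise the `chunks` output of TASK_GET_CHUNKS_METADATA."""
    retrievable = [c for c in chunks if c["retrievable"]]
    return {
        "chunk-count": len(chunks),
        "retrievable-count": len(retrievable),
        # Token total over retrievable chunks only.
        "retrievable-tokens": sum(c["token-count"] for c in retrievable),
        # Earliest start and latest end position, or None without chunks.
        "span": (min(c["start-position"] for c in chunks),
                 max(c["end-position"] for c in chunks)) if chunks else None,
    }


if __name__ == "__main__":
    # Illustrative data, not real component output.
    sample = [
        {"chunk-uid": "a", "retrievable": True, "token-count": 120,
         "start-position": 0, "end-position": 499},
        {"chunk-uid": "b", "retrievable": False, "token-count": 80,
         "start-position": 500, "end-position": 899},
    ]
    print(summarize_chunks(sample))
```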

#Get File in Markdown

Get the file content in Markdown format.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_FILE_IN_MARKDOWN` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| File UID | `original-file-uid` | string | The unique identifier of the file. |
| Content | `content` | string | The content of the file in Markdown format. |
| Create Time | `create-time` | string | The creation time of the source file in ISO 8601 format. |
| Update Time | `update-time` | string | The update time of the source file in ISO 8601 format. |

#Match File Status

Check whether processing of the specified file is done.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_MATCH_FILE_STATUS` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file whose processing status you want to check. |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Status | `succeeded` | boolean | The status of the file processing; `true` if it succeeded. |
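
Because this task reports a single boolean, a caller that needs to wait for processing to finish would typically poll it. The sketch below shows one generic way to do that; it is not part of Instill Artifact. `check_status` is a hypothetical callable that you would implement yourself (for example by triggering a pipeline that runs `TASK_MATCH_FILE_STATUS` and returning its `succeeded` output), and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_processed(check_status, timeout_s=60.0, interval_s=2.0,
                         sleep=time.sleep):
    """Poll `check_status()` until it returns True or the timeout elapses.

    Returns True if processing was reported as done, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_status():
            return True
        if time.monotonic() >= deadline:
            return False
        # `sleep` is injectable so the loop can be tested without waiting.
        sleep(interval_s)
```

The status is always checked at least once, even with a zero timeout, so a file that is already processed is reported immediately.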

#Retrieve

Search the chunks in the catalog.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_RETRIEVE` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to search. |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Text Prompt (required) | `text-prompt` | string | The prompt string used to search the chunks. |
| Top K | `top-k` | integer | The number of top chunks to return. Must be between 1 and 20; the default is 5. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Chunks | `chunks` | array[object] | Chunks data from smart search. |

Output Objects in Retrieve

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |
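
The `top-k` range and the shape of the returned chunks can be sketched in a few lines of Python. The helpers below validate `top-k` against the documented 1 to 20 range before building the task input, and order returned chunks by `similarity-score`. Both function names and the `min_score` threshold are hypothetical; whether the component already returns chunks in sorted order is not stated above, so the sort is only a defensive assumption.

```python
def build_retrieve_input(namespace, catalog_id, text_prompt, top_k=5):
    """Return the input dict for TASK_RETRIEVE (hypothetical helper)."""
    # Documented range for top-k is 1 to 20, with a default of 5.
    if not isinstance(top_k, int) or not 1 <= top_k <= 20:
        raise ValueError("top-k must be an integer between 1 and 20")
    return {
        "namespace": namespace,
        "catalog-id": catalog_id,
        "text-prompt": text_prompt,
        "top-k": top_k,
    }


def best_chunks(chunks, min_score=0.0):
    """Return chunks at or above `min_score`, highest similarity first."""
    kept = [c for c in chunks if c["similarity-score"] >= min_score]
    return sorted(kept, key=lambda c: c["similarity-score"], reverse=True)
```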

#Ask

Answer a question based on the files in the catalog.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_ASK` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to search. |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Question (required) | `question` | string | The question to answer. |
| Top K | `top-k` | integer | The number of top answers to return. Must be between 1 and 20; the default is 5. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Answer | `answer` | string | Answers data from smart search. |
| Chunks (optional) | `chunks` | array[object] | Chunks data used to answer the question. |

Output Objects in Ask

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |

#Sync Files

This task synchronizes files from third-party storage to Instill Catalog. New files are uploaded, and updated files are overwritten based on third-party metadata. Files added through other channels, like the Artifact API or additional storage services, will not be removed. Currently, only Google Drive is supported as a third-party storage service.

| Input | ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_SYNC_FILES` |
| Namespace (required) | `namespace` | string | Fill in your namespace. You can find it in the namespace-switching tab. |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to synchronize files into from third-party data storage. |
| Third Party Files (required) | `third-party-files` | array[object] | File contents and metadata from third-party data storage. |

Input Objects in Sync Files

Third Party Files

File contents and metadata from third-party data storage.

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Content | `content` | string | Base64-encoded content of the binary file, without the `data:[MIME_TYPE];base64,` prefix. |
| Created time | `created-time` | string | Time when the file was created. Format: `YYYY-MM-DDTHH:MM:SSZ`. |
| ID | `id` | string | Unique ID of the file from third-party data storage. |
| MD5 checksum | `md5-checksum` | string | MD5 checksum of the file. This reflects every change made to the file on the server, even those not visible to the user. |
| MIME type | `mime-type` | string | MIME type of the file. |
| Modified time | `modified-time` | string | Time when the file was last modified. Format: `YYYY-MM-DDTHH:MM:SSZ`. It is used to check whether the file has been updated. |
| Name | `name` | string | Name of the file from third-party data storage. |
| Size | `size` | integer | Size of the file in bytes. |
| Version | `version` | integer | Version of the file. |
| Web Content Link | `web-content-link` | string | Link for downloading the content of the file in a browser. |
| Web View Link | `web-view-link` | string | Link for opening the file in a relevant third-party data storage editor or viewer in a browser. It is used to check the source of the file. |

| Output | ID | Type | Description |
| --- | --- | --- | --- |
| Uploaded Files (optional) | `uploaded-files` | array[object] | Metadata of the files in the catalog. The metadata here is from Instill Artifact rather than third-party storage. |
| Updated Files (optional) | `updated_files` | array[object] | Files that were updated. The metadata here is from Instill Artifact rather than third-party storage. |
| Failure Files (optional) | `failure-files` | array[object] | Files that failed to upload or overwrite. The metadata here is from third-party storage. |
| Error Messages (optional) | `error-messages` | array[string] | Error messages for files that failed to upload or overwrite. |
| Status (optional) | `status` | boolean | Whether file processing was triggered; `true` if it succeeded. |

Output Objects in Sync Files

Uploaded Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |

Updated Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |

Failure Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Content | `content` | string | Base64-encoded content of the binary file, without the `data:[MIME_TYPE];base64,` prefix. |
| Created time | `created-time` | string | Time when the file was created. Format: `YYYY-MM-DDTHH:MM:SSZ`. |
| ID | `id` | string | Unique ID of the file from third-party data storage. |
| MD5 checksum | `md5-checksum` | string | MD5 checksum of the file. This reflects every change made to the file on the server, even those not visible to the user. |
| MIME type | `mime-type` | string | MIME type of the file. |
| Modified time | `modified-time` | string | Time when the file was last modified. Format: `YYYY-MM-DDTHH:MM:SSZ`. It is used to check whether the file has been updated. |
| Name | `name` | string | Name of the file from third-party data storage. |
| Size | `size` | integer | Size of the file in bytes. |
| Version | `version` | integer | Version of the file. |
| Web Content Link | `web-content-link` | string | Link for downloading the content of the file in a browser. |
| Web View Link | `web-view-link` | string | Link for opening the file in a relevant third-party data storage editor or viewer in a browser. It is used to check the source of the file. |
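
In a pipeline, the `third-party-files` array normally comes straight from another component's output, but the field formats can be made concrete with a small sketch. The helpers below strip a `data:[MIME_TYPE];base64,` prefix, format timestamps as `YYYY-MM-DDTHH:MM:SSZ`, and assemble one entry with its Base64 content, MD5 checksum, and size. All function names are hypothetical, the entry deliberately omits `version` and the two link fields, and computing the checksum locally is an assumption for illustration (third-party storage normally supplies it).

```python
import base64
import hashlib
from datetime import datetime, timezone


def strip_data_uri_prefix(content):
    """Drop a leading `data:[MIME_TYPE];base64,` prefix if present."""
    if content.startswith("data:") and ";base64," in content:
        return content.split(";base64,", 1)[1]
    return content


def format_time(dt):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ (UTC)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_third_party_file(file_id, name, mime_type, data, created, modified):
    """Build one partial `third-party-files` entry from raw bytes."""
    return {
        "id": file_id,
        "name": name,
        "mime-type": mime_type,
        "content": base64.b64encode(data).decode("ascii"),
        "md5-checksum": hashlib.md5(data).hexdigest(),
        "size": len(data),  # size in bytes
        "created-time": format_time(created),
        "modified-time": format_time(modified),
    }
```

Passing timezone-aware datetimes matters here: `astimezone` would otherwise interpret a naive value as local time before converting it to UTC.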

#Example Recipes

Recipe for the Ask your Catalog pipeline.


    version: v1beta
    component:
      artifact-0:
        type: instill-artifact
        task: TASK_ASK
        input:
          catalog-id: ${variable.catalog_name}
          namespace: ${variable.namespace}
          question: ${variable.question}
          top-k: 5
    variable:
      catalog_name:
        title: catalog-name
        description: The name of your catalog e.g. "instill-ai"
        format: string
      namespace:
        title: namespace
        description: The namespace of your catalog e.g. "instill-ai"
        format: string
      question:
        title: question
        description: The question to ask your catalog e.g. "What is Instill AI doing?", "What is Artifact?"
        format: string
    output:
      answer:
        title: answer
        value: ${artifact-0.output.answer}

Sync files from Google Drive to Instill Catalog.


    # VDP Version
    version: v1beta
    variable:
      namespace:
        title: Namespace
        format: string
      catalog:
        title: Catalog
        format: string
      folder-link:
        title: Folder Link
        format: string
    component:
      read-folder:
        type: google-drive
        input:
          shared-link: ${variable.folder-link}
          read-content: true
        setup:
          refresh-token: ${secret.refresh-token-gd}
        task: TASK_READ_FOLDER
      sync:
        type: instill-artifact
        input:
          namespace: ${variable.namespace}
          catalog-id: ${variable.catalog}
          third-party-files: ${read-folder.output.files}
        task: TASK_SYNC_FILES
    output:
      sync-result:
        title: Sync Result
        value: ${sync.output}