Creating Knowledge Bases

Knowledge bases store and organize your documents for AI-powered retrieval. Knowledge base, hallucination-free chat, and file management are the three pillars of Discover. Each knowledge base serves as a knowledge source, parsing files uploaded from your local machine into real ‘knowledge’ for future AI chats.

Creating a Knowledge Base

With multiple knowledge bases, you can build more flexible, diversified question answering. To create your first knowledge base:

Click on Knowledge Base in the left sidebar.
Click Create Knowledge Base.
Enter a name for your knowledge base (e.g., “Data Model”, “API Definition”).
Click OK to confirm.

Configuring Your Knowledge Base

A proper configuration of your knowledge base is crucial for future AI chats. Choosing the wrong embedding model or chunk method can cause unexpected semantic loss or mismatched answers.

On the Configuration page:

Modify the KB permission to allow team members to modify the KB.
Select the embedding model (e.g., gemini-embedding-01).
If required, modify the Chunk Method and recommended chunk size.
Add a proper delimiter (e.g., ##) according to your document for better chunk creation.
Configure Page Rank settings.
Set Auto-Keyword count (e.g., 5). This configuration allows to automatically extract N keywords for each chunk to increase their ranking for queries containing those keywords.

Selecting a Chunk Method

Discover offers multiple chunking templates to facilitate chunking files of different layouts and ensure semantic integrity. The following table describes each supported chunk template:

Template	Description	File Format
General	Files are consecutively chunked based on a preset chunk token number.	DOCX, XLSX, XLS, PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML
Q&A	Parses question-and-answer pairs.	XLSX, XLS, CSV/TXT
Resume	Optimized for resume/CV documents.	DOCX, PDF, TXT
Manual	Manual chunking with user control.	PDF
Table	Optimized for tabular data.	XLSX, XLS, CSV/TXT
Paper	Optimized for academic papers.	PDF
Book	Optimized for book-length documents.	DOCX, PDF, TXT
Laws	Optimized for legal documents.	DOCX, PDF, TXT
Presentation	Optimized for slide presentations.	PDF, PPTX
Picture	Processes image files.	JPEG, JPG, PNG, TIF, GIF
One	Each document is chunked in its entirety (as one).	DOCX, XLSX, XLS, PDF, TXT
Tag	The knowledge base functions as a tag set for others.	XLSX, CSV/TXT

You can also change a file’s chunk method on the Datasets page after uploading.

The selected chunking method description appears on the right side of the Configuration Page.

knowledge base chunking

Selecting an Embedding Model

An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks. This ensures all files in a specific knowledge base are compared in the same embedding space.

Uploading Documents

Click on Dataset at the top to upload files.
Select Add File, and Local File to upload a file from your device.
Select supported file types (PDF, DOCX, TXT, CSV, images, etc.) from your device.
Once uploaded, you can click the parse button to process the document and divide it into chunkings. See Parsing Files for more details.
Monitor the parsing progress — you can intervene if needed by adjusting settings like chunking methods.

Parsing Files

File parsing is a crucial step in knowledge base configuration. It involves chunking files based on file layout and building embedding and full-text (keyword) indexes on those chunks.

Click the play button next to UNSTART to start file parsing.
Click the red-cross icon and then refresh if your file parsing stalls for a long time.
Discover allows you to use a different chunk method for a particular file, offering flexibility beyond the default method.
You can enable or disable individual files for finer control over knowledge base-based AI chats.

Parsing Files

Intervening with Parsing Results

Discover features visibility and explainability, allowing you to view and intervene in chunking results:

Click on a file that has completed parsing to view the chunking results (you are taken to the Chunk page).
Hover over each snapshot for a quick view of each chunk.
Double-click the chunked texts to add keywords or make manual changes where necessary.

Running Retrieval Tests

Discover uses multiple recall of both full-text search and vector search in its chats. Prior to setting up an AI chat, consider adjusting the following parameters:

Similarity threshold: Chunks with similarities below the threshold will be filtered. Defaults to 0.2.
Vector similarity weight: The percentage by which vector similarity contributes to the overall score. Defaults to 0.3.

In Retrieval testing, enter a test question in Test text to double-check if your configurations work. Discover responds with truthful citations.

Searching for a Knowledge Base

The search feature supports knowledge base search by name.

Searching KB

Deleting a Knowledge Base

You are allowed to delete a knowledge base. Hover your mouse over the three dots of the intended knowledge base card and the Delete option appears.

Deleting KB