
Creating Knowledge Bases

Knowledge bases store and organize your documents for AI-powered retrieval. Knowledge bases, hallucination-free chat, and file management are the three pillars of Discover. Each knowledge base serves as a knowledge source: files you upload from your local machine are parsed into ‘knowledge’ that future AI chats can draw on.

Using multiple knowledge bases lets you build more flexible and diverse question-answering setups. To create your first knowledge base:

  1. Click on Knowledge Base in the left sidebar.

  2. Click Create Knowledge Base.

  3. Enter a name for your knowledge base (e.g., “Data Model”, “API Definition”).

  4. Click OK to confirm.


Proper configuration of your knowledge base is crucial for future AI chats. Choosing the wrong embedding model or chunk method can cause unexpected semantic loss or mismatched answers.

On the Configuration page:

  1. Set the knowledge base permission if team members need to be able to modify it.

  2. Select the embedding model (e.g., gemini-embedding-01).

  3. If required, modify the Chunk Method and recommended chunk size.

  4. Add a delimiter that matches your document's structure (e.g., ##) so that chunks break at natural boundaries.


  5. Configure Page Rank settings.

  6. Set the Auto-Keyword count (e.g., 5). This setting automatically extracts N keywords from each chunk, which raises the chunk's ranking for queries containing those keywords.

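To make the Auto-Keyword setting more concrete, here is a minimal sketch of extracting N keywords per chunk. How Discover actually selects keywords is not described here; this example uses plain term frequency as a stand-in, and the `extract_keywords` function and `STOP_WORDS` list are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "and", "are", "for", "into", "with", "that", "this"}


def extract_keywords(chunk: str, n: int = 5) -> list[str]:
    """Return up to n of the most frequent terms in a chunk.

    Term frequency is only a stand-in for whatever method the product uses.
    """
    words = re.findall(r"[a-z0-9]+", chunk.lower())
    # Ignore stop words and very short tokens.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]


chunk = "The embedding model converts chunks into embeddings. Chunks are ranked by embedding similarity."
top_two = extract_keywords(chunk, n=2)
```

A larger N attaches more keywords to each chunk, which can help keyword queries match but may also add noise, so a small value such as the example's 5 is a reasonable starting point.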

Discover offers multiple chunking templates to facilitate chunking files of different layouts and ensure semantic integrity. The following table describes each supported chunk template:

| Template | Description | File Format |
|---|---|---|
| General | Files are consecutively chunked based on a preset chunk token number. | DOCX, XLSX, XLS, PPT, PDF, TXT, JPEG, JPG, PNG, TIF, GIF, CSV, JSON, EML, HTML |
| Q&A | Parses question-and-answer pairs. | XLSX, XLS, CSV/TXT |
| Resume | Optimized for resume/CV documents. | DOCX, PDF, TXT |
| Manual | Manual chunking with user control. | PDF |
| Table | Optimized for tabular data. | XLSX, XLS, CSV/TXT |
| Paper | Optimized for academic papers. | PDF |
| Book | Optimized for book-length documents. | DOCX, PDF, TXT |
| Laws | Optimized for legal documents. | DOCX, PDF, TXT |
| Presentation | Optimized for slide presentations. | PDF, PPTX |
| Picture | Processes image files. | JPEG, JPG, PNG, TIF, GIF |
| One | Each document is chunked in its entirety (as one). | DOCX, XLSX, XLS, PDF, TXT |
| Tag | The knowledge base functions as a tag set for others. | XLSX, CSV/TXT |
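As a rough illustration of how the General template's "preset chunk token number" and a delimiter such as ## might interact, the sketch below splits text on the delimiter and then packs consecutive segments into chunks up to a token budget. This is an assumption about the general technique, not Discover's implementation; `chunk_text` is a hypothetical function, and whitespace-separated words stand in for real tokens.

```python
def chunk_text(text: str, delimiter: str = "##", max_tokens: int = 128) -> list[str]:
    """Split on the delimiter, then pack consecutive segments into chunks.

    A chunk is closed as soon as adding the next segment would exceed max_tokens.
    A single segment longer than max_tokens is kept whole in this simple sketch.
    """
    segments = [s.strip() for s in text.split(delimiter) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for segment in segments:
        n_tokens = len(segment.split())  # crude token count: whitespace-separated words
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(segment)
        current_tokens += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = "## Intro one two ## Setup three four five ## Usage six"
small_chunks = chunk_text(doc, max_tokens=5)
large_chunks = chunk_text(doc, max_tokens=7)
```

With a budget of 5, each of the three sections becomes its own chunk; with a budget of 7, the first two sections are merged. A delimiter that matches the document's headings keeps each chunk aligned with a section rather than cutting through one.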

You can also change a file’s chunk method on the Datasets page after uploading.

A description of the selected chunk method appears on the right side of the Configuration page.


An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all existing chunks. This ensures all files in a specific knowledge base are compared in the same embedding space.
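The rule above can be pictured as a simple guard. The following is a minimal in-memory sketch, not Discover's API: the `KnowledgeBase` class, its fields, and the model names are all hypothetical and exist only to show why the switch is blocked while chunks exist.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory model of a knowledge base (illustration only)."""

    name: str
    embedding_model: str
    chunks: list[str] = field(default_factory=list)

    def set_embedding_model(self, model: str) -> None:
        # Vectors produced by different models are not comparable,
        # so the switch is refused while any chunks exist.
        if self.chunks and model != self.embedding_model:
            raise ValueError("Delete all existing chunks before changing the embedding model.")
        self.embedding_model = model

    def delete_all_chunks(self) -> None:
        self.chunks.clear()


kb = KnowledgeBase(name="API Definition", embedding_model="model-a", chunks=["chunk 1"])
try:
    kb.set_embedding_model("model-b")
    switched_with_chunks = True
except ValueError:
    switched_with_chunks = False

kb.delete_all_chunks()
kb.set_embedding_model("model-b")  # allowed once the knowledge base has no chunks
```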

  1. Click on Dataset at the top to upload files.

  2. Select Add File, then Local File to upload a file from your device.

  3. Choose a file of a supported type (PDF, DOCX, TXT, CSV, images, etc.).

  4. Once the file is uploaded, click the parse button to process the document and divide it into chunks. See Parsing Files for more details.

  5. Monitor the parsing progress. You can intervene if needed, for example by changing the chunk method.


File parsing is a crucial step in knowledge base configuration. It involves chunking files based on file layout and building embedding and full-text (keyword) indexes on those chunks.

  • Click the play button next to UNSTART to start file parsing.
  • If file parsing stalls for a long time, click the red-cross icon to cancel it, then refresh.
  • Discover allows you to use a different chunk method for a particular file, offering flexibility beyond the default method.
  • You can enable or disable individual files for finer control over knowledge base-based AI chats.

Parsing Files

Discover features visibility and explainability, allowing you to view and intervene in chunking results:

  1. Click on a file that has completed parsing to view the chunking results (you are taken to the Chunk page).

  2. Hover over each snapshot for a quick view of each chunk.

  3. Double-click a chunk's text to add keywords or make manual changes where necessary.

Discover combines full-text search and vector search (multiple recall) to retrieve chunks for its chats. Before setting up an AI chat, consider adjusting the following parameters:

  • Similarity threshold: Chunks with similarities below the threshold will be filtered. Defaults to 0.2.
  • Vector similarity weight: The percentage by which vector similarity contributes to the overall score. Defaults to 0.3.
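The two parameters above can be read as a weighted blend followed by a cutoff. The sketch below assumes that the remaining weight (1 minus the vector similarity weight) goes to the full-text (keyword) similarity; that split, the function names, and the sample scores are assumptions made for illustration rather than a description of Discover's internals.

```python
def hybrid_score(keyword_sim: float, vector_sim: float, vector_weight: float = 0.3) -> float:
    """Blend keyword and vector similarity; the rest of the weight goes to keywords (assumed)."""
    return (1.0 - vector_weight) * keyword_sim + vector_weight * vector_sim


def filter_chunks(
    candidates: list[tuple[str, float, float]],
    similarity_threshold: float = 0.2,
    vector_weight: float = 0.3,
) -> list[tuple[str, float]]:
    """Score (name, keyword_sim, vector_sim) tuples, drop low scores, sort best first."""
    scored = [(name, hybrid_score(k, v, vector_weight)) for name, k, v in candidates]
    kept = [(name, score) for name, score in scored if score >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity values for three chunks.
candidates = [("chunk-a", 0.5, 0.9), ("chunk-b", 0.1, 0.2), ("chunk-c", 0.2, 0.4)]
results = filter_chunks(candidates)
```

With the defaults, chunk-b scores about 0.13 and is filtered out, while chunk-a (about 0.62) and chunk-c (about 0.26) are kept. Raising the threshold returns fewer but more relevant chunks; raising the vector weight favors semantic matches over exact keyword matches.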

In Retrieval testing, enter a test question in Test text to check that your configuration works as expected. Discover responds with truthful citations.

The search feature supports knowledge base search by name.


To delete a knowledge base, hover over the three dots on its card and click the Delete option that appears.
