DocChat User Guide

1️⃣ Overview & How It Works

What is DocChat?

DocChat is an intelligent document chat assistant powered by advanced AI technology. It allows you to have natural conversations about your documents, asking questions and receiving accurate, cited answers based on your document collection.

How Does It Work?

DocChat uses a technology called Retrieval-Augmented Generation (RAG):

Document Indexing: Your documents are processed and stored in a searchable format with intelligent chunking and embedding
Multi-Language Query Expansion: Your question is automatically expanded into multiple query variants across different languages (English, Dutch, French)
Intelligent Search: The system searches your documents using advanced similarity matching to find the most relevant information
AI Response Generation: A large language model (LLM) reads the relevant document sections and generates a comprehensive answer in your selected language
Citation & References: All answers include references to specific document sections [Block X] so you can verify the information

💡 Key Benefit: Unlike generic AI chatbots, DocChat answers are always grounded in your actual documents, making responses accurate, verifiable, and trustworthy.

Supported Document Types

📄 PDF (.pdf)
📝 Word Documents (.doc, .docx)
📊 PowerPoint (.ppt, .pptx)
📈 Excel (.xls, .xlsx)
📋 Text Files (.txt, .md)

2️⃣ Getting Started

Step 1: Index Your Documents

Before asking questions, your documents need to be indexed:

Click the "🔄 Index Documents" button in the sidebar
Wait for the indexing process to complete (progress bar will show status)
Once complete, the document count will update at the top

✅ Tip: Documents are automatically indexed on application startup if configured. You only need to re-index when new documents are added.

Step 2: Upload Additional Documents (Optional)

To add individual documents:

Click "Choose File" under "Upload Document"
Select your file
Click "Upload"
The document will be automatically indexed

Step 3: Select Your Settings

LLM Model: Choose your preferred AI model (GPT-4o recommended for best results)
Output Language: Select the language for AI responses
Chat Mode: Choose between Basic RAG, Extensive, or Full Reading

Step 4: Ask Your Question

Type your question in the chat input and press Enter or click Send!

3️⃣ Operating Modes

DocChat offers three operating modes, each optimized for different use cases:

🎯 Basic RAG

Best for: Quick questions

Speed: Fast (10-30 seconds)

How it works: Searches for relevant chunks and generates an answer based on the top matches.

Use when: You need quick answers to specific questions

📚 Extensive

Best for: Detailed analysis

Speed: Medium (1-3 minutes)

How it works: Retrieves full documents, preprocesses them with a small LLM, then generates comprehensive answers.

Use when: You need thorough, detailed information from multiple documents

📖 Full Reading

Best for: Complete overview

Speed: Slow (5-10 minutes)

How it works: Reads ALL documents in your collection (or selected sources) and provides a comprehensive synthesis.

Use when: You need a complete understanding across your entire document collection

⚠️ Mode Selection Tip: Start with Basic RAG for most questions. Use Extensive when Basic doesn't provide enough detail. Reserve Full Reading for rare cases when you need a complete overview.

Comparing Modes

Feature	Basic RAG	Extensive	Full Reading
Speed	⚡ Fast	⚡⚡ Medium	⚡⚡⚡ Slow
Documents Searched	Top 5-15 chunks	Top 10-20 docs	ALL documents
Detail Level	Focused	Detailed	Comprehensive
Best Use Case	Quick facts	Analysis	Overview

4️⃣ UI Features & Options

Settings Section

🤖 LLM Model

Choose the AI model for generating responses:

GPT-4o (OpenAI) (Recommended): Latest model with excellent reasoning - best balance of quality and speed
GPT-4o Mini (OpenAI): Faster, more cost-effective model
Claude Sonnet 4.5 (Anthropic): Superior analytical capabilities
Gemini 2.0 Flash (Google): Enhanced reasoning with very large context window
GPT-5 Models (OpenAI - Experimental): Next generation models available for testing (not production-ready)

⚠️ Note: GPT-5 models (GPT-5, GPT-5 Mini, GPT-5 Pro) are experimental and may occasionally return empty responses. Use GPT-4o for production work.

🌍 Output Language

Select the language for AI responses:

English, Nederlands, Français, Deutsch, Español, Italiano, Português
The system will search documents in all configured languages but respond in your selected language
Selection persists across sessions

✅ Use Reranking

When enabled, search results are reranked using a cross-encoder model for improved relevance. Recommended: Keep enabled for better accuracy.

📝 Include Chat History

When enabled, the AI remembers previous messages in the conversation and can answer follow-up questions with context. Recommended: Keep enabled for natural conversations.

🔍 Enable Web Search Augmentation

When enabled, if no relevant documents are found, the system will search the web for additional information. Use with caution as web results are not from your document collection.

📄 Full Document Mode

Skip preprocessing step and send full documents to the main LLM. Useful when you want complete document content without summarization.

🎯 Reference Relevance Filter

When enabled, filters out low-relevance document references based on the threshold score. Helps focus on the most relevant sources.

📊 Top K

Number of document chunks/documents to retrieve. Higher values retrieve more information but may increase processing time. Default: 5 for Basic, 10 for Extensive.

Source Selection

🗂️ Select Sources

Restrict your search to specific documents or folders:

Click "🗂️ Select Sources"
Browse the document tree
Check/uncheck documents or folders
Use "Select All" or "Clear All" for quick selection
Click "Apply" to confirm

✅ Pro Tip: Selecting specific sources significantly improves speed and relevance when you know which documents contain your answer.

Manual Keywords

🔎 Manual Keywords

Add specific terms to enhance your search:

Enter comma-separated keywords: safety, protocol, procedure
Keywords are used to filter and rank search results
Particularly useful for technical terms or specific concepts
Works across all operating modes

Example Usage:

Query: "What are the requirements?"
Keywords: calibration, equipment, validation
Result: System focuses on calibration-related requirements

Custom Instructions

📝 Custom Instructions

Provide specific guidance for the AI:

Click "📝 Custom Instructions" button
Enter instructions in markdown format
Examples: "Focus on safety aspects", "Provide step-by-step procedures", "Compare different approaches"
Instructions apply to the current conversation

5️⃣ Best Practices for Formulating Queries

✅ Good Query Practices

1. Be Specific and Clear

❌ Vague:

"Tell me about safety"

✅ Better:

"What are the safety procedures for handling chemical waste in the laboratory?"

2. Use Natural Language

Write questions as you would ask a colleague:

✅ "How do I calibrate the pH meter?"
✅ "What are the storage requirements for reagents?"
✅ "Who is responsible for equipment maintenance?"

3. Provide Context When Needed

Example:

"In the context of biosafety level 2 procedures, what PPE is required for handling biological samples?"

4. Use Follow-Up Questions

With chat history enabled, you can ask follow-up questions:

First: "What are the waste disposal procedures?"
Then: "How often should waste containers be emptied?"
Then: "Who is responsible for this?"

5. Leverage Language Flexibility

Ask questions in any language - the system will search all documents:

"Wat zijn de kalibratieprocedures?" (Dutch)
"Quelles sont les procédures de sécurité?" (French)
"What are the safety procedures?" (English)

🎯 Query Optimization Tips

Combine with Source Selection

If you know which documents contain the answer, select them first:

Select relevant source documents/folders
Ask your question
Get faster, more focused results

Use Keywords for Technical Topics

For specialized domains, add technical keywords:

Query: "What are the requirements?"
Keywords: GLP, validation, ISO
Result: Focuses on GLP validation requirements

Choose the Right Mode

Question Type	Recommended Mode
Simple fact: "What is the expiry date?"	Basic RAG
Procedure: "How do I perform calibration?"	Extensive
Overview: "Summarize all safety policies"	Full Reading
Comparison: "Compare methods A and B"	Extensive

⚠️ What to Avoid

❌ Overly broad questions: "Tell me everything"
❌ Multiple unrelated questions in one query
❌ Questions about information not in your documents
❌ Expecting real-time or external information
❌ Assuming the AI remembers previous conversations (unless chat history is enabled)

6️⃣ Important Caveats & User Responsibility

⚠️ CRITICAL DISCLAIMER: DocChat is an AI-powered assistant tool. While it provides intelligent and helpful responses, users bear full responsibility for verifying and validating all information before making decisions or taking actions based on AI responses.

🤖 AI Limitations

1. AI Can Make Mistakes

Large Language Models (LLMs) can occasionally:

Misinterpret document content
Hallucinate information not present in documents
Miss context that affects interpretation
Combine information from different sources incorrectly

Your Responsibility: Always verify AI responses by checking the cited document blocks [Block X] and reviewing the original source documents.

2. Document Interpretation Limits

AI may struggle with complex tables, diagrams, or visual content
Scanned documents (images in PDFs) may not be fully readable without OCR
Context from headers, footers, or formatting may be lost
Mathematical formulas or specialized notation may be misinterpreted

3. Language Translation Considerations

Translations between languages may lose nuance or technical precision
Technical terms may not translate directly
Always verify critical information in the original document language

📋 User Responsibilities

✅ Required User Actions

Verify Citations: Check the [Block X] references in responses
Review Source Documents: Read the original documents for critical decisions
Cross-Check Information: Validate important information from multiple sources
Apply Domain Expertise: Use your professional judgment to evaluate AI responses
Report Issues: If you notice inaccuracies, report them to improve the system

⚠️ Critical Use Cases

For high-stakes decisions involving:

🏥 Safety protocols and procedures
⚖️ Regulatory compliance
🔬 Scientific or medical procedures
📜 Legal or contractual obligations
💰 Financial decisions

ALWAYS review original source documents and consult with qualified professionals. Do not rely solely on AI-generated responses for critical decisions.

🔒 Data Privacy & Security

Document Handling

Documents are processed and stored in the vector database
Queries and responses are sent to external LLM providers (OpenAI, Anthropic, Google)
Ensure you have proper authorization to use documents with the system
Do not upload confidential or sensitive information without proper security review

Session Data

Chat history is stored in session storage
Sessions can be cleared using the "Clear History" button
Close your browser or clear cookies to end sessions

⚙️ System Limitations

Performance Considerations

Response Time: Varies by mode (10 seconds to 10 minutes)
Token Limits: Very long documents may be truncated or summarized
Concurrent Users: Heavy usage may slow response times
Model Availability: Dependent on external API availability

Document Processing Limits

Maximum file size: 50 MB per document
Some complex formatting may not be preserved
Scanned documents require OCR (not always perfect)
Very large document collections may take time to index

✅ Best Practices for Responsible Use

The "Trust but Verify" Approach:

Use DocChat to quickly find relevant information
Check the cited block references [Block X]
Review the original source documents
Apply your professional expertise and judgment
Validate critical information through proper channels

When to Use DocChat

✅ Quick information lookup
✅ Understanding document content
✅ Finding relevant sections in large document sets
✅ Getting overviews and summaries
✅ Identifying which documents contain specific information

When NOT to Rely Solely on DocChat

❌ Critical safety decisions
❌ Regulatory compliance verification
❌ Legal interpretations
❌ Medical or scientific procedures without verification
❌ Financial or contractual decisions

Remember: DocChat is a powerful assistant tool, not a replacement for human expertise, judgment, and verification. Use it to enhance your work, not to bypass necessary due diligence.

📞 Need Help?

For additional support or to report issues:

Check the original source documents when in doubt
Use the "Clear History" button to reset conversations
Re-index documents if you notice missing or outdated information
Contact your system administrator for technical issues

Version Information: DocChat uses state-of-the-art RAG technology with multi-language support, powered by OpenAI models (GPT-4o, GPT-4o Mini, GPT-5 experimental), Anthropic Claude Sonnet 4.5, and Google Gemini 2.0 Flash.

📚 DocChat User Guide

📑 Table of Contents