1οΈβ£ Overview & How It Works
What is DocChat?
DocChat is an intelligent document chat assistant powered by advanced AI technology. It allows you to have natural conversations about your documents, asking questions and receiving accurate, cited answers based on your document collection.
How Does It Work?
DocChat uses a technology called Retrieval-Augmented Generation (RAG):
- Document Indexing: Your documents are processed and stored in a searchable format with intelligent chunking and embedding
- Multi-Language Query Expansion: Your question is automatically expanded into multiple query variants across different languages (English, Dutch, French)
- Intelligent Search: The system searches your documents using advanced similarity matching to find the most relevant information
- AI Response Generation: A large language model (LLM) reads the relevant document sections and generates a comprehensive answer in your selected language
- Citation & References: All answers include references to specific document sections [Block X] so you can verify the information
π‘ Key Benefit: Unlike generic AI chatbots, DocChat answers are always grounded in your actual documents, making responses accurate, verifiable, and trustworthy.
Supported Document Types
- π PDF (.pdf)
- π Word Documents (.doc, .docx)
- π PowerPoint (.ppt, .pptx)
- π Excel (.xls, .xlsx)
- π Text Files (.txt, .md)
2οΈβ£ Getting Started
Step 1: Index Your Documents
Before asking questions, your documents need to be indexed:
- Click the "π Index Documents" button in the sidebar
- Wait for the indexing process to complete (progress bar will show status)
- Once complete, the document count will update at the top
β
Tip: Documents are automatically indexed on application startup if configured. You only need to re-index when new documents are added.
Step 2: Upload Additional Documents (Optional)
To add individual documents:
- Click "Choose File" under "Upload Document"
- Select your file
- Click "Upload"
- The document will be automatically indexed
Step 3: Select Your Settings
- LLM Model: Choose your preferred AI model (GPT-4o recommended for best results)
- Output Language: Select the language for AI responses
- Chat Mode: Choose between Basic RAG, Extensive, or Full Reading
Step 4: Ask Your Question
Type your question in the chat input and press Enter or click Send!
3οΈβ£ Operating Modes
DocChat offers three operating modes, each optimized for different use cases:
π― Basic RAG
Best for: Quick questions
Speed: Fast (10-30 seconds)
How it works: Searches for relevant chunks and generates an answer based on the top matches.
Use when: You need quick answers to specific questions
π Extensive
Best for: Detailed analysis
Speed: Medium (1-3 minutes)
How it works: Retrieves full documents, preprocesses them with a small LLM, then generates comprehensive answers.
Use when: You need thorough, detailed information from multiple documents
π Full Reading
Best for: Complete overview
Speed: Slow (5-10 minutes)
How it works: Reads ALL documents in your collection (or selected sources) and provides a comprehensive synthesis.
Use when: You need a complete understanding across your entire document collection
β οΈ Mode Selection Tip: Start with Basic RAG for most questions. Use Extensive when Basic doesn't provide enough detail. Reserve Full Reading for rare cases when you need a complete overview.
Comparing Modes
| Feature |
Basic RAG |
Extensive |
Full Reading |
| Speed |
β‘ Fast |
β‘β‘ Medium |
β‘β‘β‘ Slow |
| Documents Searched |
Top 5-15 chunks |
Top 10-20 docs |
ALL documents |
| Detail Level |
Focused |
Detailed |
Comprehensive |
| Best Use Case |
Quick facts |
Analysis |
Overview |
4οΈβ£ UI Features & Options
Settings Section
π€ LLM Model
Choose the AI model for generating responses:
- GPT-4o (OpenAI) (Recommended): Latest model with excellent reasoning - best balance of quality and speed
- GPT-4o Mini (OpenAI): Faster, more cost-effective model
- Claude Sonnet 4.5 (Anthropic): Superior analytical capabilities
- Gemini 2.0 Flash (Google): Enhanced reasoning with very large context window
- GPT-5 Models (OpenAI - Experimental): Next generation models available for testing (not production-ready)
β οΈ Note: GPT-5 models (GPT-5, GPT-5 Mini, GPT-5 Pro) are experimental and may occasionally return empty responses. Use GPT-4o for production work.
π Output Language
Select the language for AI responses:
- English, Nederlands, FranΓ§ais, Deutsch, EspaΓ±ol, Italiano, PortuguΓͺs
- The system will search documents in all configured languages but respond in your selected language
- Selection persists across sessions
β
Use Reranking
When enabled, search results are reranked using a cross-encoder model for improved relevance. Recommended: Keep enabled for better accuracy.
π Include Chat History
When enabled, the AI remembers previous messages in the conversation and can answer follow-up questions with context. Recommended: Keep enabled for natural conversations.
π Enable Web Search Augmentation
When enabled, if no relevant documents are found, the system will search the web for additional information. Use with caution as web results are not from your document collection.
π Full Document Mode
Skip preprocessing step and send full documents to the main LLM. Useful when you want complete document content without summarization.
π― Reference Relevance Filter
When enabled, filters out low-relevance document references based on the threshold score. Helps focus on the most relevant sources.
π Top K
Number of document chunks/documents to retrieve. Higher values retrieve more information but may increase processing time. Default: 5 for Basic, 10 for Extensive.
Source Selection
ποΈ Select Sources
Restrict your search to specific documents or folders:
- Click "ποΈ Select Sources"
- Browse the document tree
- Check/uncheck documents or folders
- Use "Select All" or "Clear All" for quick selection
- Click "Apply" to confirm
β
Pro Tip: Selecting specific sources significantly improves speed and relevance when you know which documents contain your answer.
Manual Keywords
π Manual Keywords
Add specific terms to enhance your search:
- Enter comma-separated keywords:
safety, protocol, procedure
- Keywords are used to filter and rank search results
- Particularly useful for technical terms or specific concepts
- Works across all operating modes
Example Usage:
Query: "What are the requirements?"
Keywords: calibration, equipment, validation
Result: System focuses on calibration-related requirements
Custom Instructions
π Custom Instructions
Provide specific guidance for the AI:
- Click "π Custom Instructions" button
- Enter instructions in markdown format
- Examples: "Focus on safety aspects", "Provide step-by-step procedures", "Compare different approaches"
- Instructions apply to the current conversation
5οΈβ£ Best Practices for Formulating Queries
β
Good Query Practices
1. Be Specific and Clear
β Vague:
"Tell me about safety"
β
Better:
"What are the safety procedures for handling chemical waste in the laboratory?"
2. Use Natural Language
Write questions as you would ask a colleague:
- β
"How do I calibrate the pH meter?"
- β
"What are the storage requirements for reagents?"
- β
"Who is responsible for equipment maintenance?"
3. Provide Context When Needed
Example:
"In the context of biosafety level 2 procedures, what PPE is required for handling biological samples?"
4. Use Follow-Up Questions
With chat history enabled, you can ask follow-up questions:
First: "What are the waste disposal procedures?"
Then: "How often should waste containers be emptied?"
Then: "Who is responsible for this?"
5. Leverage Language Flexibility
Ask questions in any language - the system will search all documents:
- "Wat zijn de kalibratieprocedures?" (Dutch)
- "Quelles sont les procΓ©dures de sΓ©curitΓ©?" (French)
- "What are the safety procedures?" (English)
π― Query Optimization Tips
Combine with Source Selection
If you know which documents contain the answer, select them first:
- Select relevant source documents/folders
- Ask your question
- Get faster, more focused results
Use Keywords for Technical Topics
For specialized domains, add technical keywords:
Query: "What are the requirements?"
Keywords: GLP, validation, ISO
Result: Focuses on GLP validation requirements
Choose the Right Mode
| Question Type |
Recommended Mode |
| Simple fact: "What is the expiry date?" |
Basic RAG |
| Procedure: "How do I perform calibration?" |
Extensive |
| Overview: "Summarize all safety policies" |
Full Reading |
| Comparison: "Compare methods A and B" |
Extensive |
β οΈ What to Avoid
- β Overly broad questions: "Tell me everything"
- β Multiple unrelated questions in one query
- β Questions about information not in your documents
- β Expecting real-time or external information
- β Assuming the AI remembers previous conversations (unless chat history is enabled)
6οΈβ£ Important Caveats & User Responsibility
β οΈ CRITICAL DISCLAIMER: DocChat is an AI-powered assistant tool. While it provides intelligent and helpful responses, users bear full responsibility for verifying and validating all information before making decisions or taking actions based on AI responses.
π€ AI Limitations
1. AI Can Make Mistakes
Large Language Models (LLMs) can occasionally:
- Misinterpret document content
- Hallucinate information not present in documents
- Miss context that affects interpretation
- Combine information from different sources incorrectly
Your Responsibility: Always verify AI responses by checking the cited document blocks [Block X] and reviewing the original source documents.
2. Document Interpretation Limits
- AI may struggle with complex tables, diagrams, or visual content
- Scanned documents (images in PDFs) may not be fully readable without OCR
- Context from headers, footers, or formatting may be lost
- Mathematical formulas or specialized notation may be misinterpreted
3. Language Translation Considerations
- Translations between languages may lose nuance or technical precision
- Technical terms may not translate directly
- Always verify critical information in the original document language
π User Responsibilities
β
Required User Actions
- Verify Citations: Check the [Block X] references in responses
- Review Source Documents: Read the original documents for critical decisions
- Cross-Check Information: Validate important information from multiple sources
- Apply Domain Expertise: Use your professional judgment to evaluate AI responses
- Report Issues: If you notice inaccuracies, report them to improve the system
β οΈ Critical Use Cases
For high-stakes decisions involving:
- π₯ Safety protocols and procedures
- βοΈ Regulatory compliance
- π¬ Scientific or medical procedures
- π Legal or contractual obligations
- π° Financial decisions
ALWAYS review original source documents and consult with qualified professionals. Do not rely solely on AI-generated responses for critical decisions.
π Data Privacy & Security
Document Handling
- Documents are processed and stored in the vector database
- Queries and responses are sent to external LLM providers (OpenAI, Anthropic, Google)
- Ensure you have proper authorization to use documents with the system
- Do not upload confidential or sensitive information without proper security review
Session Data
- Chat history is stored in session storage
- Sessions can be cleared using the "Clear History" button
- Close your browser or clear cookies to end sessions
βοΈ System Limitations
Performance Considerations
- Response Time: Varies by mode (10 seconds to 10 minutes)
- Token Limits: Very long documents may be truncated or summarized
- Concurrent Users: Heavy usage may slow response times
- Model Availability: Dependent on external API availability
Document Processing Limits
- Maximum file size: 50 MB per document
- Some complex formatting may not be preserved
- Scanned documents require OCR (not always perfect)
- Very large document collections may take time to index
β
Best Practices for Responsible Use
The "Trust but Verify" Approach:
- Use DocChat to quickly find relevant information
- Check the cited block references [Block X]
- Review the original source documents
- Apply your professional expertise and judgment
- Validate critical information through proper channels
When to Use DocChat
- β
Quick information lookup
- β
Understanding document content
- β
Finding relevant sections in large document sets
- β
Getting overviews and summaries
- β
Identifying which documents contain specific information
When NOT to Rely Solely on DocChat
- β Critical safety decisions
- β Regulatory compliance verification
- β Legal interpretations
- β Medical or scientific procedures without verification
- β Financial or contractual decisions
Remember: DocChat is a powerful assistant tool, not a replacement for human expertise, judgment, and verification. Use it to enhance your work, not to bypass necessary due diligence.
π Need Help?
For additional support or to report issues:
- Check the original source documents when in doubt
- Use the "Clear History" button to reset conversations
- Re-index documents if you notice missing or outdated information
- Contact your system administrator for technical issues
Version Information: DocChat uses state-of-the-art RAG technology with multi-language support, powered by OpenAI models (GPT-4o, GPT-4o Mini, GPT-5 experimental), Anthropic Claude Sonnet 4.5, and Google Gemini 2.0 Flash.