PDF Knowledge Search Without RAG — Validation of Agentforce × Multimodal × Custom Objects

March 23, 2026

Introduction

“Can PDF knowledge base search and Q&A be implemented without RAG (Retrieval Augmented Generation)?”

Prompted by this question from a client, we ran a validation that combines Salesforce Agentforce with Multimodal AI. This post shares the results of that validation and the implementation approach.

1. Background and Challenges

Conventional Approach: Intelligent Context / Data Library

Salesforce Intelligent Context and data library features work by indexing documents and retrieving relevant content through semantic search. However, this approach has the following challenges.

• Accuracy in interpreting PDFs that contain complex tables, charts, and graphs
• The cost of building and operating vector search infrastructure
• The complexity of designing and tuning RAG pipelines

Validation Question: “Can we achieve equivalent or better quality without RAG?”

To answer this question, we validated an approach that combines PDF analysis using Multimodal AI (Gemini 2.5 Pro) with custom objects and tag-based search.

2. Validation Approach

2.1 Why Multimodal?

With conventional text extraction or OCR alone, it is difficult to accurately preserve the structure and numeric relationships of PDFs that contain complex tables, charts, and graphs.

By leveraging the multimodal capability of Gemini 2.5 Pro, we enabled the model to “look at” and understand PDFs, then structure them as follows.

• Tables → Converted into HTML <table>
• Charts and graphs → Data points extracted and stored as HTML tables
• Headings and paragraphs → Structured into hierarchical semantic HTML (<h1> to <h6>, <p>, <section>)

As a result, we confirmed that it is possible to accumulate knowledge in a way that preserves PDF structure and numeric information without relying on Intelligent Context or a data library.
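To make the target format concrete, here is a minimal Python sketch of what "data points stored as an HTML table" could look like. In the validation the model itself produces this HTML; the helper name `rows_to_html_table` and the sample values are hypothetical and only illustrate the output shape.

```python
from html import escape

def rows_to_html_table(headers, rows):
    """Render a header list and row lists as a minimal HTML <table> string."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

# Hypothetical data points read off a bar chart (illustrative values only).
chart_html = rows_to_html_table(
    ["Month", "Defect rate (%)"],
    [["Jan", 1.2], ["Feb", 0.9]],
)
```

Keeping each data point in its own cell is what lets a later prompt read the numbers back without guessing at column alignment.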

2.2 Overall Architecture

[PDF File]
    ↓ Processed asynchronously by batch (PDFExtractBatch / PDFExtractScheduler)
    ↓ Extracted and structured with Multimodal (Gemini 2.5 Pro)
[PDFExtract Prompt Template]
    ↓ Chunk split and automatic tag generation
[PDF_Knowledge__c] Custom Object
    - Chunk1 to 10 (HTML format)
    - Tag (for search)
    - FileName, FilePath (source information)
    ↓
[PDFTag__c] Master of all tags
    ↓
[User Question] → Agentforce Topic Instruction
    ↓ Calls the appropriate action
[Evaluate_PDF_Tags] LLM evaluates the relevance between the question and tags
    ↓ Identifies relevant tags
[PDF_Knowledge__c Search] LIKE search by tags
    ↓ Retrieves matching chunks
[Search_PDF_Knowledge] Generates response with sources
    ↓
[HTML Response]

3. Implementation Highlights

3.1 PDF Extraction (PDFExtract)

Input: PDF files in Salesforce Files (ContentDocument)

Process: Uses multimodal processing to structure tables, charts, and graphs into HTML, then splits them into up to 10 chunks

Output: JSON (chunks, tags)

Stored In: PDF_Knowledge__c, PDFTag__c

We confirmed that even for PDFs containing complex diagrams and tables, extraction can preserve both numerical values and structural relationships.
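The storage step can be sketched as a mapping from the extraction JSON onto one flat record. This is a Python illustration, not the Apex used in the validation. The field API names (`Chunk1__c` to `Chunk10__c`, `FileName__c`, `FilePath__c`), the quoted comma-separated tag format, and the sample file path are assumptions for illustration.

```python
import json

MAX_CHUNKS = 10  # the architecture describes Chunk1 to Chunk10 on PDF_Knowledge__c

def to_knowledge_record(extract_json, file_name, file_path):
    """Map PDFExtract output (chunks, tags) onto a flat dict of field values."""
    data = json.loads(extract_json)
    chunks = data["chunks"]
    if len(chunks) > MAX_CHUNKS:
        raise ValueError(f"expected at most {MAX_CHUNKS} chunks, got {len(chunks)}")
    record = {
        "FileName__c": file_name,   # assumed API name
        "FilePath__c": file_path,   # assumed API name
        # Assumed format: quoted, comma-separated tags that a LIKE search can match.
        "TagSearch__c": ",".join(f'"{t}"' for t in data["tags"]),
    }
    for i, chunk in enumerate(chunks, start=1):
        record[f"Chunk{i}__c"] = chunk  # assumed API names Chunk1__c..Chunk10__c
    return record

record = to_knowledge_record(
    '{"chunks": ["<section>A</section>", "<section>B</section>"],'
    ' "tags": ["Safety Standards", "Procedure"]}',
    "Safety Standards Guideline.pdf",
    "/hypothetical/path",
)
```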

3.2 Tag-Based Search (As an Alternative to RAG)

RAG typically retrieves semantically similar documents through vector search. In this validation, we instead adopted the following.

• Automatic tag generation: The LLM generates tags during PDF extraction
• Relevance evaluation between questions and tags: The Evaluate_PDF_Tags Prompt Template uses an LLM to evaluate the relationship between the user’s question and all tags
• Filtering by tags: Relevant tags are used to perform a LIKE search on PDF_Knowledge__c and retrieve matching chunks

By incorporating an expert perspective for each domain into the prompts, we improve the stability of tag evaluation.
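The tag-filtering step can be sketched as building an OR-joined LIKE condition. This Python sketch only illustrates the string that results; the helper names are hypothetical, and in Apex bind variables are generally the safer way to pass user-influenced values into a query.

```python
def escape_like(value):
    """Escape characters that are special in a SOQL string literal or LIKE pattern."""
    # Backslash goes first so the escapes added afterwards are not doubled.
    for ch in ("\\", "'", "%", "_"):
        value = value.replace(ch, "\\" + ch)
    return value

def build_tag_filter(tags, field="TagSearch__c"):
    """Build an OR-joined LIKE condition, repeating the field name for each tag."""
    if not tags:
        raise ValueError("at least one tag is required")
    return " OR ".join(f"{field} LIKE '%{escape_like(t)}%'" for t in tags)

where_clause = build_tag_filter(["Safety Standards", "Procedure"])
```

Because the filter is a substring match, a tag such as "Procedure" would also match a longer tag that contains it; storing tags in a delimited form is one way to tighten that.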

3.3 Integration with Agentforce

Topic Instruction: Instructs the system to call the appropriate action depending on the type of question

GenAiFunction: Defines the Search PDF Knowledge action

Invocable Apex: SearchPDFKnowledge orchestrates tag evaluation → search → answer generation
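The orchestration order (tag evaluation → search → answer generation) can be sketched in Python as below. This is not the Apex class from the validation: the three callables are hypothetical stand-ins for the two prompt templates and the query, and the fallback messages are assumptions.

```python
from typing import Callable, Dict, List

def search_pdf_knowledge(
    question: str,
    all_tags: List[str],
    evaluate_tags: Callable[[str, List[str]], List[str]],
    find_chunks: Callable[[List[str]], List[Dict[str, str]]],
    generate_answer: Callable[[str, List[Dict[str, str]]], str],
) -> str:
    """Run tag evaluation, then search, then answer generation, in that order."""
    related = evaluate_tags(question, all_tags)
    if not related:
        return "No related tags were found for this question."
    results = find_chunks(related)
    if not results:
        return "No matching knowledge was found."
    return generate_answer(question, results)

# Hypothetical stand-ins: a keyword match instead of the LLM, a fixed record
# instead of the query, and string formatting instead of answer generation.
def fake_evaluate(question, tags):
    return [t for t in tags if t.split()[0].lower() in question.lower()]

def fake_find(tags):
    if "Safety Standards" not in tags:
        return []
    return [{"chunk": "<p>Confirm protective equipment is worn.</p>",
             "source": "Safety Standards Guideline.pdf"}]

def fake_answer(question, results):
    return f"{results[0]['chunk']} Source: {results[0]['source']}"

answer = search_pdf_knowledge(
    "Please explain the safety procedures",
    ["Product Specifications", "Safety Standards", "Procedure"],
    fake_evaluate, fake_find, fake_answer,
)
```

Passing the three steps in as callables keeps the control flow testable without calling a model.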

4. Validation Scenario Examples

In this validation, we tested the end-to-end flow with a scenario based on manufacturing product manuals and safety standards.

4.1 Example of Registered PDF

4.2 Example of Extracted Tags (All_Tags_Str)

"Product Specifications","Handling Instructions","Safety Standards","Protective Equipment","Procedure","Quality Control",
"Inspection Standards","Tolerance Values","Checklist","Record Format","Illustration","Flowchart"

4.3 Mapping of Example Questions and Related Tags

4.4 Example Flow (Question Example: “Please explain the safety procedures”)

1. The user asks the Agent a question
→ “Please explain the safety procedures”

2. Topic Instruction interprets the question
→ Calls the Search PDF Knowledge action

3. Evaluate_PDF_Tags evaluates the tags
→ Input: Search_Term="Please explain the safety procedures", All_Tags_Str="Product Specifications","Safety Standards",...
→ Output: relatedTags=["Safety Standards","Procedure","Checklist"]

4. Search PDF_Knowledge__c by tags
→ TagSearch__c LIKE '%Safety Standards%' OR TagSearch__c LIKE '%Procedure%' OR TagSearch__c LIKE '%Checklist%'

5. Retrieve matching chunks
→ Matching section from Safety Standards Guideline.pdf

6. Search_PDF_Knowledge generates the answer
→ “The safety procedures are as follows: 1) Confirm wearing protective equipment 2) ...”
→ Source: Safety Standards Guideline.pdf

5. Prompt Template Examples

Below are examples of each prompt’s role and sample descriptions tailored to the manufacturing scenario.

5.1 PDFExtract (PDF Extraction and Structuring)

Role: Uses Multimodal AI to analyze the PDF and perform HTML structuring, chunk splitting, and tag generation.

Main Instructions (Excerpt):

- Tables → Convert to HTML <table>
- Charts and graphs → Store data points as HTML tables
- Wrap sections with <section> or <div class='section'>
- Generate 3 to 15 tags from the document content (e.g. "Product Specifications","Safety Standards","Tolerance Values")
- Output in JSON format (chunks, tags), and escape " as \" within strings
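Because the extraction output is model-generated, it helps to check it against the limits in these instructions before storing it. The Python sketch below assumes the JSON keys are `chunks` and `tags`, as in the output description. The function name, the sample content, and the requirement of at least one chunk are hypothetical.

```python
import json

def validate_extract_output(raw):
    """Parse PDFExtract JSON and check the limits stated in the prompt instructions."""
    data = json.loads(raw)  # raises json.JSONDecodeError if quotes were not escaped
    chunks, tags = data["chunks"], data["tags"]
    if not 1 <= len(chunks) <= 10:
        raise ValueError(f"expected 1 to 10 chunks, got {len(chunks)}")
    if not 3 <= len(tags) <= 15:
        raise ValueError(f"expected 3 to 15 tags, got {len(tags)}")
    return data

# json.dumps applies the escaping the prompt asks for: " inside a string becomes \".
sample = json.dumps({
    "chunks": ['<div class="section"><p>Illustrative content</p></div>'],
    "tags": ["Product Specifications", "Safety Standards", "Tolerance Values"],
})
validated = validate_extract_output(sample)
```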

5.2 Evaluate_PDF_Tags (Tag Relevance Evaluation)

Role: Evaluates the relationship between the user’s question and all tags, and identifies relevant tags.

Input: `Search_Term`, `All_Tags_Str`

Main Instructions (Excerpt):

ROLE: As a document specialist, pick up all tags that connect to the knowledge needed to answer the question.

EVALUATION:
- Direct match: The question’s keyword is included in a tag
- Partial match: A tag is a compound term that contains the concept
- Semantic relevance: A tag belongs to the same domain
- Means/Method: If the question asks “how,” include tags related to procedures and checklists

DOMAIN HINTS (Manufacturing / Quality Control example):
- Specifications / Tolerance values → Product Specifications, Tolerance Values, Handling Instructions, Inspection Standards
- Procedure / Method → Safety Standards, Procedure, Checklist, Record Format
- Protection / Safety → Protective Equipment, Safety Standards

Output Example:

{"relatedTags": ["Safety Standards", "Procedure", "Checklist"]}
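One way to consume this output is to parse it and keep only tags that exist in the tag master, so that a tag the model invents never reaches the search step. The Python sketch below assumes All_Tags_Str is a quoted, comma-separated list as in section 4.2, with no unescaped quotes inside tags. The helper names are hypothetical.

```python
import json

def parse_all_tags(all_tags_str):
    """Parse '"A","B",...' (quoted, comma-separated) into a list of tag strings."""
    return json.loads("[" + all_tags_str + "]")

def known_related_tags(llm_output, all_tags):
    """Keep only relatedTags that exist in the tag master, preserving order."""
    master = set(all_tags)
    related = json.loads(llm_output).get("relatedTags", [])
    return [t for t in related if t in master]

all_tags = parse_all_tags('"Product Specifications","Safety Standards","Procedure","Checklist"')
related = known_related_tags(
    '{"relatedTags": ["Safety Standards", "Procedure", "Checklist", "Made Up"]}',
    all_tags,
)
```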

5.3 Search_PDF_Knowledge (Answer Generation)

Role: Refers to the knowledge obtained by the search and generates an answer with source attribution.

Input: `Search_Term`, `Search_Results` (JSON of knowledge + sources)

Main Instructions (Excerpt):

- Refer only to the knowledge in the search results; do not fabricate information
- Always clearly indicate the source (e.g. Source: Internal Manual / Safety Standards Guideline.pdf)
- Return the answer in concise HTML format
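The `Search_Results` input can be sketched as a JSON list that pairs each record's chunks with its source file. The key names `knowledge` and `source` and the field API names below are assumptions for illustration. The source only states that the input is JSON containing knowledge and sources.

```python
import json

def build_search_results(records):
    """Serialize matched records as a JSON list of knowledge/source pairs."""
    items = []
    for rec in records:
        # Collect Chunk1__c..Chunk10__c in order, skipping empty fields.
        chunks = [rec.get(f"Chunk{i}__c") for i in range(1, 11)]
        items.append({
            "knowledge": "".join(c for c in chunks if c),
            "source": rec["FileName__c"],
        })
    # ensure_ascii=False keeps non-ASCII text (for example Japanese) readable.
    return json.dumps(items, ensure_ascii=False)

payload = build_search_results([
    {"FileName__c": "Safety Standards Guideline.pdf",
     "Chunk1__c": "<p>A</p>", "Chunk2__c": "<p>B</p>"},
])
```

Carrying the file name next to each block of knowledge is what allows the answer prompt to cite a source without guessing.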

6. Validation Results and Benefits

6.1 What Worked Well

• Interpretation of complex PDFs: Multimodal AI can structure PDFs containing tables, charts, and graphs with high accuracy
• Implementation without RAG: Question answering is possible without a vector database or RAG pipeline
• Clear source attribution: File names and paths can be included in responses as evidence
• Flexibility with custom objects: The data model and search logic can be adjusted to fit company requirements

6.2 Expected Benefits

7. Pros, Cons, and Scope of Application

7.1 Advantages of This Approach (Without RAG)

7.2 Disadvantages and Constraints

7.3 Guideline for Scope of Application

Conclusion: If the scope is limited to internal manuals, product documents, regulations, and similar materials, this approach is sufficiently practical without RAG. Once cross-domain or large-scale knowledge search becomes necessary, it is reasonable to consider moving to RAG or hybrid search.

8. Gemini 2.5 Pro Full Context Capability — Context Width Beyond RAG

Gemini 2.5 Pro, which was used in this validation, has a context window large enough to take file content directly into the prompt as-is. This differs from RAG, which injects only retrieved fragments into the prompt.

8.1 Specifications (When Using Prompt Templates)

8.2 Approximation in Japanese — How Many Characters Can Be Injected?

The conversion between token count and character count varies depending on the language and writing system. For Japanese, it is generally estimated that 1 token ≈ 2 to 3 characters (mixed kanji and kana text).

* Excluding output tokens, about 980,000 tokens can be used for input. The above is a theoretical upper-bound estimate.

As a practical guideline, for Japanese PDFs or text, roughly 1.5 to 2 million characters can be passed as context in a single prompt call. In terms of ordinary business documents (about 500 to 800 characters per page), this corresponds to approximately 2,000 to 4,000 pages.
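The arithmetic behind these figures can be checked with a short Python calculation. All inputs are the numbers stated in the text; the variable names are illustrative.

```python
INPUT_TOKENS = 980_000        # input budget stated in the text
CHARS_PER_TOKEN = (2, 3)      # rule of thumb for Japanese stated in the text

# Theoretical ceiling: tokens multiplied by characters per token.
ceiling = tuple(INPUT_TOKENS * c for c in CHARS_PER_TOKEN)

# Practical guideline from the text, converted to pages.
practical_chars = (1_500_000, 2_000_000)
chars_per_page = (500, 800)
min_pages = practical_chars[0] // chars_per_page[1]  # smaller budget, denser pages
max_pages = practical_chars[1] // chars_per_page[0]  # larger budget, sparser pages

print(ceiling)               # (1960000, 2940000)
print(min_pages, max_pages)  # 1875 4000
```

The practical guideline of 1.5 to 2 million characters sits at or below the low end of the theoretical ceiling, and the page range rounds to the 2,000 to 4,000 pages quoted above.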

8.3 Difference from RAG — What Full Context Means

The key point is that when the knowledge volume fits within roughly 1 million characters, injecting the relevant content directly into the prompt (the Full Context approach) can be simpler than RAG’s “search → retrieve fragments” flow and less likely to lose information. In this validation, we adopted a hybrid approach: PDFs are chunked and stored in custom objects, filtered by tags, and only the relevant chunks are passed into the prompt.

9. Future Expansion Ideas

• Hierarchical tagging and synonym mapping
• Pre-generation of chunk summaries and abstracts
• Consideration of hybrid search (tags + keywords)
• Extension to other document formats (Word, Excel)

Summary

To answer the question, “Can this be achieved without RAG?”, we validated that a combination of multimodal PDF analysis, custom objects, and tag-based search can achieve equal or better quality.

When handling PDFs that contain complex tables, charts, and graphs, structuring with Multimodal AI is effective, and even with a simple architecture that does not rely on RAG, practical knowledge search and answer generation can be achieved.

Reference Links / Tech Stack

• Salesforce Agentforce
• Einstein Prompt Builder (PDFExtract, Evaluate_PDF_Tags, Search_PDF_Knowledge)
• Gemini 2.5 Pro (Multimodal PDF Analysis)
• Custom Objects: PDF_Knowledge__c, PDFTag__c

This blog is intended to share validation results. When applying this approach in a production environment, we recommend evaluation and tuning according to your specific environment.

