Research Paper: Qlik Answers Groundedness & Quality Evaluation
- Igor Alcantara
- Jan 26

ABSTRACT
Retrieval-Augmented Generation (RAG) has become a dominant approach for question answering over large collections of unstructured documents. The UDA (Unstructured Document Analysis) benchmark provides a rigorous suite for evaluating RAG systems on real-world long documents. This paper presents a large-scale, faithfulness-oriented evaluation of Qlik Answers on a UDA subset composed of unstructured PDF reports. A corpus of 136 documents (19,959 pages) and 11,842 question–answer (QA) pairs is used to build a dedicated knowledge base and to drive a row-level evaluation. Five primary metrics (faithfulness, correctness, relevancy, answerability, and attribution) are computed via an LLM-as-a-judge framework, with a sample of 2,000 QA pairs evaluated by human reviewers to verify judge reliability. Qlik Answers attains a faithfulness score of 0.97, correctness of 0.98, relevancy of 0.96, answerability of 1.00, and attribution of 0.99, yielding an overall average of 0.98. Comparison with public ranges reported for leading RAG stacks such as Gemini, ChatGPT 4 RAG, Claude 3, Llama-Index RAG, and Vectara OpenRAG indicates that Qlik Answers delivers competitive or superior groundedness and source attribution under this evaluation methodology.
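For readers curious how a row-level, LLM-as-a-judge evaluation of this kind is typically wired together, the sketch below illustrates the general pattern: score every QA pair on each metric, average per metric over the corpus, then average the metric means into an overall score (e.g. (0.97 + 0.98 + 0.96 + 1.00 + 0.99) / 5 = 0.98). This is not the paper's implementation; the function and field names (judge_row, evaluate, question/answer/sources/reference) are placeholders for illustration, and the judge is stubbed where a real rubric-driven LLM call would go.

```python
from statistics import mean

# The five metrics reported in the paper.
METRICS = ["faithfulness", "correctness", "relevancy", "answerability", "attribution"]


def judge_row(question: str, answer: str, sources: list[str], reference: str) -> dict[str, float]:
    """Placeholder judge: returns one score in [0, 1] per metric for a single QA pair.
    A real implementation would prompt an LLM with a rubric comparing the system
    answer against the retrieved sources and the reference answer."""
    return {m: 1.0 for m in METRICS}  # stubbed so the sketch runs end to end


def evaluate(rows: list[dict]) -> dict[str, float]:
    """Row-level evaluation: score every QA pair, then average each metric over the corpus."""
    per_row = [judge_row(r["question"], r["answer"], r["sources"], r["reference"]) for r in rows]
    return {m: mean(scores[m] for scores in per_row) for m in METRICS}


if __name__ == "__main__":
    sample = [{"question": "Q", "answer": "A", "sources": ["S"], "reference": "R"}]
    metric_means = evaluate(sample)
    overall = mean(metric_means.values())  # overall average across the five metric means
    print(metric_means, round(overall, 2))
```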
Download the research paper for free below:



