
Parquet vs QVD: A Performance and Size Comparison for Qlik Developers

Writer: Mark Costa

When working with large datasets in Qlik, developers often rely on QVD files for optimized storage and fast data retrieval. However, with the growing adoption of open data formats and data capacity subscriptions, Parquet has emerged as a strong alternative. In this article, we’ll dive into a comparison of Parquet and QVD files, focusing on two key aspects: file size and load performance.


Understanding the File Formats


QVD (QlikView Data) Files

  • Proprietary format optimized for Qlik data storage

  • Offers fast data retrieval within Qlik applications

  • Supports optimized load for incremental data processing

  • Supports embedded metadata


Parquet Files

  • Open-source, columnar storage format used widely in big data ecosystems

  • Efficient data compression and encoding, reducing file sizes

  • Supported by various data platforms and processing engines

  • Can be compressed using different algorithms such as Snappy, Gzip, and Brotli

  • Pairs well with S3 Table Buckets for scalable and cost-effective cloud storage


Test Data

To ensure a fair and representative comparison, we used eight tables extracted from the Snowflake sample database SNOWFLAKE_SAMPLE_DATA under the schema TPCH_SF10. The dataset includes:

  • Largest Table: 60 million records

  • Smallest Table: 5 records

  • Column Count: Ranges from a maximum of 16 to a minimum of 3

  • Reload Tests: 16,640, run at different times in a Qlik Cloud tenant


Sample Tables used in this analysis

This dataset provides a diverse range of record sizes and column counts, allowing us to evaluate both file formats under various data complexities.


All the tables were saved in Qlik Cloud as QVD and as Parquet, with one Parquet file for each of the seven compression options tested: uncompressed, snappy, gzip, lz4, brotli, zstd, and lz4_hadoop. Snappy is Qlik's default compression codec for Parquet.


File Size Comparison

One of the most striking differences between Parquet and QVD files is the size. Using Brotli compression, Parquet files achieve a significantly smaller footprint compared to QVD files. Our benchmark tests indicate:

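  • Parquet (Brotli Compression): up to a 121% percentage difference in size versus equivalent QVD files, which works out to roughly 75% smaller

  • Parquet (Snappy Compression): up to an 81% percentage difference in size versus QVD files, roughly 58% smaller by the same calculation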
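
The figures in this list are percentage differences rather than plain reductions, as the author notes in the comments. The sketch below shows one way to convert between the two. It assumes the symmetric definition (the gap divided by the average of the two sizes), which is consistent with the 121% and 75% figures mentioned in the comments; the function name is hypothetical and not taken from the author's apps.

```python
def percent_smaller(percentage_difference: float) -> float:
    """Convert a symmetric percentage difference into a plain 'percent smaller'.

    Assumption: percentage_difference = (a - b) / ((a + b) / 2) * 100,
    where a is the larger file size and b is the smaller one.
    """
    d = percentage_difference / 100.0
    if not 0 <= d < 2:
        raise ValueError("a symmetric percentage difference lies in [0, 200)")
    ratio = (2 - d) / (2 + d)  # b / a, the smaller size as a fraction of the larger
    return (1 - ratio) * 100


print(f"{percent_smaller(121):.1f}% smaller")  # 75.4% smaller
print(f"{percent_smaller(81):.1f}% smaller")   # 57.7% smaller
```

Read this way, a value above 100% is possible because the denominator is the average of the two sizes, not the QVD size.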


File Size Comparison (in bytes)
File Size Comparison against QVD

In most of the scenarios tested, QVD files were larger than every Parquet variant except uncompressed Parquet. The number of columns, their data types, and the density and cardinality of the data within them all influence file size.
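
To illustrate why cardinality matters for compressed size, here is a minimal sketch using only Python's standard library. It does not write Parquet or QVD files: `zlib` (DEFLATE, the algorithm behind Gzip) stands in for a Parquet codec, and the column values are generated for illustration rather than taken from the test tables.

```python
import random
import zlib


def raw_and_compressed_size(values):
    """Return (raw_bytes, compressed_bytes) for one column stored contiguously."""
    raw = "\n".join(values).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 6))


rng = random.Random(42)  # fixed seed so the run is repeatable
n_rows = 100_000

# Low cardinality: four distinct values repeated many times (illustrative values).
low_card = [rng.choice(["AIR", "RAIL", "SHIP", "TRUCK"]) for _ in range(n_rows)]
# High cardinality: almost every value is distinct.
high_card = [f"{rng.randrange(10**9):09d}" for _ in range(n_rows)]

low_raw, low_comp = raw_and_compressed_size(low_card)
high_raw, high_comp = raw_and_compressed_size(high_card)

print(f"low cardinality:  {low_comp / low_raw:.1%} of raw size after compression")
print(f"high cardinality: {high_comp / high_raw:.1%} of raw size after compression")
```

The low-cardinality column shrinks to a much smaller fraction of its raw size, which is the same effect that makes columnar formats compress repetitive columns so well.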


This reduction in size can lead to lower storage costs and improved data transfer speeds, especially in cloud-based environments.


Load Performance in Qlik

To evaluate load performance, we tested multiple load operations or "modes", including Full Load (loading all rows and columns), aggregation, filtering with a WHERE clause, and reducing the number of loaded columns. These different scenarios allowed us to assess how each format handles varied data retrieval and transformation tasks.


While QVD files are designed for optimal performance within Qlik, Parquet files (depending on compression and structure) can often outperform QVD in load times. Our tests measured the time taken to load large datasets into Qlik Sense, and the results showed:


  • Parquet: Faster in almost every scenario

  • QVD Files: Faster only for a full load of all rows and columns of large datasets


Reload Duration Comparison (Median) - LINEITEM Table ~60M rows
Reload Duration Comparison Against QVD - LINEITEM Table ~60M rows

Reload Duration Comparison (Median) - ORDERS Table 15M rows
Reload Duration Comparison Against QVD - ORDERS Table 15M rows

For datasets with fewer than a million rows, the load durations are nearly identical, making the time difference insignificant.
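
The charts above report a median reload duration per format and a comparison against QVD. A minimal sketch of how such figures could be derived from raw reload timings follows. The function names and the sample timings are made up for illustration; they are not the author's expressions or results from this benchmark.

```python
from statistics import median


def median_by_format(runs):
    """Group (file_format, seconds) pairs and return the median duration per format."""
    grouped = {}
    for file_format, seconds in runs:
        grouped.setdefault(file_format, []).append(seconds)
    return {fmt: median(values) for fmt, values in grouped.items()}


def percent_vs_baseline(medians, baseline="qvd"):
    """Express each median as a percent change relative to the baseline format."""
    base = medians[baseline]
    return {fmt: (value - base) / base * 100 for fmt, value in medians.items()}


# Placeholder timings in seconds, NOT measurements from the article.
example_runs = [
    ("qvd", 10.0), ("qvd", 12.0), ("qvd", 11.0),
    ("parquet_snappy", 8.0), ("parquet_snappy", 9.0), ("parquet_snappy", 40.0),
]

medians = median_by_format(example_runs)
relative = percent_vs_baseline(medians)
print(medians)   # {'qvd': 11.0, 'parquet_snappy': 9.0}
print(relative)  # negative values mean faster than QVD
```

The median is a reasonable choice for reloads run at different times on a shared tenant, because a single slow run (the 40-second placeholder here) does not distort the comparison.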


Final Thoughts

QVDs hold a special place for every Qlik Developer and will be difficult to part with. They are highly reliable and significantly faster than most flat file formats. However, Parquet files, particularly with Brotli compression, present compelling benefits in terms of size and often performance. Parquet is a formidable option for modern, scalable data architectures and can greatly reduce your Capacity and Data Storage costs.


Qlik's recent acquisition of Upsolver clearly signals that Parquet files represent the future for Qlik developers, as the company increasingly adopts open formats and cloud-based data processing. Also, utilizing Parquet files within Iceberg architectures, such as AWS S3 Table Buckets, can dramatically decrease reload times.


Are you using Parquet in your Qlik workflows? Let’s discuss in the comments below!

 
 
 

4 comments


Chris Fountain
Mar 6

Thank you for this important analysis. However, how can something be 121% smaller than something else?

Mark Costa
Mar 9 (reply)

Thank you, Chris. That is actually the Percentage Difference. I’ve noticed before that this might not be the most effective calculation to display here. Using '75% smaller' would likely be a clearer and more straightforward way to represent the difference.

I'm currently working on version 2 of this article, which will include this new expression. I’ll also be sharing all the apps, Load Scripts, and expressions on GitHub for easy access, further studies and customizations.


Rob Wunderlich
Feb 17

Hello, what platform did you run these tests on? Parquet files achieve their smaller size through compression, which uses CPU. My own testing indicates that load time depends on whether the system has more I/O or CPU available.

Mark Costa
Feb 17 (reply)

Hi Rob, first of all, great to have you here! We used only Qlik Cloud to load and store the data. We didn't use any on-prem process during the tests.


© 2024 Data Voyagers
