Parquet vs QVD: A Performance and Size Comparison for Qlik Developers

Mark Costa
Feb 17

When working with large datasets in Qlik, developers often rely on QVD files for optimized storage and fast data retrieval. However, with the growing adoption of open data formats and data capacity subscriptions, Parquet has emerged as a strong alternative. In this article, we’ll dive into a comparison of Parquet and QVD files, focusing on two key aspects: file size and load performance.

Understanding the File Formats

QVD (QlikView Data) Files

Proprietary format optimized for Qlik data storage
Offers fast data retrieval within Qlik applications
Supports optimized load for incremental data processing
Supports embedded metadata

Parquet Files

Open-source, columnar storage format used widely in big data ecosystems
Efficient data compression and encoding, reducing file sizes
Supported by various data platforms and processing engines
Can be compressed using different algorithms such as Snappy, Gzip, and Brotli
Works well with S3 Table Buckets for scalable and cost-effective cloud storage

Test Data

To ensure a fair and representative comparison, we used eight tables extracted from the Snowflake sample database SNOWFLAKE_SAMPLE_DATA under the schema TPCH_SF10. The dataset includes:

Largest Table: 60 million records
Smallest Table: 5 records
Column Count: Ranges from a maximum of 16 to a minimum of 3
Reload Tests: 16,640, run at different times in a Qlik Cloud tenant

This dataset provides a diverse range of record sizes and column counts, allowing us to evaluate both file formats under various data complexities.

All the tables were saved in Qlik Cloud in the formats of QVD and Parquet, utilizing the seven compression codecs supported by Parquet: uncompressed, snappy, gzip, lz4, brotli, zstd, and lz4_hadoop. Snappy is the default compression codec for Qlik.

File Size Comparison

One of the most striking differences between Parquet and QVD files is the size. Using Brotli compression, Parquet files achieve a significantly smaller footprint compared to QVD files. Our benchmark tests indicate:

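Parquet (Brotli Compression): up to a 121% percentage difference from equivalent QVD files, which corresponds to roughly 75% smaller
Parquet (Snappy Compression): up to an 81% percentage difference from QVD files, which corresponds to roughly 58% smaller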
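
As the author notes in the comments, the 121% and 81% figures are percentage differences, not plain reductions. The sketch below converts one into the other, assuming the usual symmetric definition (the gap divided by the average of the two sizes). The function name is hypothetical and the author's exact expression has not been published here.

```python
def percent_smaller(percentage_difference: float) -> float:
    """Convert a symmetric percentage difference into a 'percent smaller' figure.

    Assumed definition (not confirmed by the article):
        percentage_difference = (big - small) / ((big + small) / 2) * 100
    """
    d = percentage_difference / 100.0
    if not 0 <= d < 2:
        raise ValueError("a symmetric percentage difference lies in [0, 200)")
    # Solving the definition for small / big gives (2 - d) / (2 + d).
    size_ratio = (2 - d) / (2 + d)
    return (1 - size_ratio) * 100.0


for label, diff in [("Brotli", 121), ("Snappy", 81)]:
    print(f"{label}: {diff}% difference = {percent_smaller(diff):.1f}% smaller")
```

Under that assumption, a 121% percentage difference is about 75% smaller, which matches the figure the author gives in the comments.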

In the majority of scenarios tested, QVD files were the largest, exceeded in size only by uncompressed Parquet files. Factors such as the number of columns, the column types, and the density and cardinality of the data within those columns can influence file sizes.

This reduction in size can lead to lower storage costs and improved data transfer speeds, especially in cloud-based environments.
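
The sketch below illustrates how cardinality affects compressed size. It is a simplification: it uses Python's built-in zlib (the DEFLATE algorithm behind gzip) on plain text columns, not Parquet's own encodings. The column values are made up for illustration.

```python
import random
import zlib


def compression_ratio(values):
    """Return compressed size divided by raw size for one column of strings."""
    raw = "\n".join(values).encode("utf-8")
    return len(zlib.compress(raw, 6)) / len(raw)


rng = random.Random(42)  # fixed seed so repeated runs use the same data
row_count = 50_000

# Low cardinality: a handful of labels repeated many times (illustrative values).
labels = ["AIR", "MAIL", "RAIL", "SHIP", "TRUCK"]
low_cardinality = [rng.choice(labels) for _ in range(row_count)]

# High cardinality: random nine-digit keys, so almost every value is distinct.
high_cardinality = [f"{rng.randrange(10**9):09d}" for _ in range(row_count)]

low_ratio = compression_ratio(low_cardinality)
high_ratio = compression_ratio(high_cardinality)
print(f"low cardinality column:  {low_ratio:.2f} of raw size")
print(f"high cardinality column: {high_ratio:.2f} of raw size")
```

The low-cardinality column shrinks to a much smaller fraction of its raw size, which is one reason tables with many repeated values tend to benefit most from compressed formats.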

Load Performance in Qlik

To evaluate load performance, we tested multiple load operations or "modes", including Full Load (loading all rows and columns), aggregation, filtering with a WHERE clause, and reducing the number of loaded columns. These different scenarios allowed us to assess how each format handles varied data retrieval and transformation tasks.

While QVD files are designed for optimal performance within Qlik, Parquet files (depending on compression and structure) can often outperform QVD in load times. Our tests measured the time taken to load large datasets into Qlik Sense, and the results showed:

Parquet: Faster in almost every scenario
QVD Files: Faster only for full loads of all rows and columns of large datasets

Reload Duration Comparison (Median) - LINEITEM Table ~60M rows

Reload Duration Comparison Against QVD - LINEITEM Table ~60M rows

Reload Duration Comparison (Median) - ORDERS Table 15M rows

Reload Duration Comparison Against QVD - ORDERS Table 15M rows
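
The charts above report median reload durations and durations relative to QVD. How the author computed them is not stated, so the sketch below is only one way to summarise repeated reload timings. The durations are hypothetical placeholders, not measurements from these tests, and the function name is an assumption.

```python
from statistics import median

# Hypothetical reload durations in seconds. These are NOT results from the article.
durations = {
    "qvd": [10.0, 12.0, 11.0],
    "parquet_snappy": [8.0, 9.0, 10.0],
}


def median_vs_baseline(samples_by_format, baseline="qvd"):
    """Return {format: (median_seconds, percent_change_vs_baseline)}."""
    baseline_median = median(samples_by_format[baseline])
    summary = {}
    for fmt, samples in samples_by_format.items():
        fmt_median = median(samples)
        # A negative change means the format reloaded faster than the baseline.
        change = (fmt_median - baseline_median) / baseline_median * 100.0
        summary[fmt] = (fmt_median, change)
    return summary


summary = median_vs_baseline(durations)
for fmt, (med, change) in summary.items():
    print(f"{fmt}: median {med:.1f}s, {change:+.1f}% vs QVD")
```

The median is less sensitive than the mean to an occasional slow reload, which matters when tests run at different times on a shared cloud tenant.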

For datasets with fewer than a million rows, the load durations are nearly identical, making the time difference insignificant.

Final Thoughts

QVDs hold a special place for every Qlik Developer and will be difficult to part with. They are highly reliable and significantly faster than most flat file formats. However, Parquet files, particularly with Brotli compression, present compelling benefits in terms of size and often performance. Parquet is a formidable option for modern, scalable data architectures and can greatly reduce your Capacity and Data Storage costs.

Qlik's recent acquisition of Upsolver clearly signals that Parquet files represent the future for Qlik developers, as the company increasingly adopts open formats and cloud-based data processing. Also, utilizing Parquet files within Iceberg architectures, such as AWS S3 Table Buckets, can dramatically decrease reload times.

Are you using Parquet in your Qlik workflows? Let’s discuss in the comments below!

4 Comments

Chris Fountain

Mar 06

Thank you for this important analysis. However, how can something be 121% smaller than something else?

Mark Costa

Mar 09

Thank you, Chris. That is actually the Percentage Difference. I’ve noticed before that this might not be the most effective calculation to display here. Using '75% smaller' would likely be a clearer and more straightforward way to represent the difference.

I'm currently working on version 2 of this article, which will include this new expression. I’ll also be sharing all the apps, Load Scripts, and expressions on GitHub for easy access, further studies and customizations.

Rob Wunderlich

Feb 17

Hello, what platform did you run these tests on? Parquet files achieve the smaller size through compression, which uses CPU. My own testing indicates that load time depends on whether the system has more IO or CPU available.

Hi Rob, first of all, great to have you here! We used only Qlik Cloud to load and store the data. We didn't use any on-prem process during the tests.