The compression ratio is drastically different depending on the file type. The shorter the bars, the smaller the compressed file is. The colour of the bar denotes format/command. The y-axis shows the percentage size difference between the original file and the compressed file. The x-axis shows the different file types. A smaller value means higher compression. 7z and xz seems to be pretty slow taking more than 2 mins for all file types.įile size comparison between original file and compressed file for each format/command for each file type. lbzip2, pbzip2 and pigz are also extremely fast for all file types taking around 15 secs or so. lz4 seems to be the fastest for all file types. The shorter the bars, the faster the command is. The y-axis shows the total time taken (in seconds) for compression plus decompression. Total time used by each format/command for each file type. By running on 8 cores, the methods were free to utilize the 8 available cores if they could. Most of these commands allow the user to set the level of compression but this was not changed. All commands used the default settings/switches. Similarly, pigz is the multi-threaded version of gzip.Īll commands were run on a Scientific Linux cluster with 8 cores and a total of 24GB of RAM. lbzip2 and pbzip2 are multi-threaded versions of bzip2. The 7za command by default compresses to the 7z format but also allows exporting to bzip2, gzip and zip. For decompression, the same commands were used except for zip where unzip was used. Seven different compression formats were tested: 7z, bzip2, gzip, lrzip, lz4, xz and zip using ten different compression commands: 7za, bzip2, lbzip2, pbzip2, gzip, pigz, lrzip, xz and zip. For clarity, fastq files are text files containing next generation sequencing data and tiff stacks are used for image analysis using ImageJ, for example. Some properties of the files: fastq file (403 MB, 1.56 million reads), mp3 tar archive (390 MB, a tar archive composed of four tar archives each with 6 mp3 tracks of size 10MB to 32MB), mp4 file (340 MB), text file (400MB, created using ( base64 /dev/urandom | head -c 419430400 > text.txt) and tiff stack (404MB, 1380 frames, 640 x 480 px, sequence of zebrafish larvae swimming in a microtitre plate). bzip2 compression using the command lbzip2 and pbzip2 comes out as the winner due to high compression ratio, speed and multi-threading capabilities.įive different data files were tested: a fastq text file, mp3 tar archive, an mp4 movie file, a randomly generated text file and a tiff image stack. Seven different compression formats (7z, bzip2, gzip, lrzip, lz4, xz and zip) are tested using ten different compression commands (7za, bzip2, lbzip2, lrzip, lz4, pbzip2, gzip, pigz, xz and zip) on five different file types (fastq, mp3 tar archive, mp4 movie file, random text file and a tiff stack) for compression ratio and time.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |