Monday, June 3, 2019

Types Of Data Compression Computer Science Essay

Data compression has come of age in the last 20 years. Both the quantity and the quality of the body of literature in this field provide ample proof of this. There are many known methods for data compression. They are based on different ideas, are suitable for different types of data, and produce different results, but they are all based on the same principle, namely that they compress data by removing redundancies from the original data in the source file. This report discusses the different types of data compression, the advantages of data compression and the procedures of data compression.

2.0 DATA COMPRESSION

Data compression is important in this age because of the amount of data that is transferred within a given network. It makes the transfer of data relatively easy [1]. This part explains and compares lossy and lossless compression techniques.

2.1 LOSSLESS DATA COMPRESSION

Lossless data compression makes use of data compression algorithms that allow the exact original data to be reconstructed from the compressed data. This can be contrasted with lossy data compression, which does not allow the exact original data to be reconstructed from the compressed data. Lossless data compression is used in many applications [2]. Lossless compression is used when it is vital that the original and the decompressed data be identical, or when no assumption can be made on whether certain deviations are uncritical.

Most lossless compression programs implement two kinds of algorithms: one which generates a statistical model for the input data, and another which maps the input data to bit strings using this model in such a way that probable (e.g. frequently encountered) data will produce shorter output than improbable data. Often, only the former algorithm is named, while the second is implied (through common use, standardization etc.)
or unspecified [3].

2.2 LOSSY DATA COMPRESSION

A lossy data compression technique is one where compressing data and then decompressing it retrieves data that may be different from the original, but is close enough to be useful in some way. There are two basic lossy compression schemes. First is lossy transform codecs, where samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded [4]. Second is lossy predictive codecs, where previous and/or subsequent decoded data is used to predict the current sound sample or image frame. In some systems the two methods are combined, with transform codecs being used to compress the error signals generated by the predictive stage.

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application [4]. Lossless compression schemes are reversible, so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression. In practice, lossy data compression will also reach a point where compressing again no longer helps, although an extremely lossy algorithm, one which for example always removes the last byte of a file, will always compress a file further, up to the point where it is empty [5].

2.3 LOSSLESS vs. LOSSY DATA COMPRESSION

Lossless and lossy data compression are two methods used to compress data. Each technique has its individual uses. A comparison between the two techniques can be summarised as follows [4-5]:

The lossless technique keeps the source as it is during compression, while a change from the original source is expected in the lossy technique, although the result remains very close to the original.

The lossless technique is a reversible process, which means that the original data can be reconstructed.
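This reversibility is easy to demonstrate with Python's standard zlib module (which implements the DEFLATE algorithm; it is used here purely as an illustration of lossless round-tripping, not as part of the techniques discussed in this essay):

```python
import zlib

# Highly redundant input: lossless compression exploits the repetition.
original = b"abcabcabc" * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# The round trip is exact: every byte of the original is recovered,
# and the compressed form is far smaller than the input.
assert restored == original
assert len(compressed) < len(original)
```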
However, the lossy technique is irreversible, due to the loss of some data during compression.

The lossless technique produces larger compressed files compared with the lossy technique.

The lossy technique is mostly used for images and sound.

3.0 DATA COMPRESSION TECHNIQUES

Data compression means storing data in a way which requires less space than usual. Generally, it is the saving of space through a reduction in data size [6]. This section explains the Huffman coding and Lempel-Ziv-Welch (LZW) compression techniques.

3.1 HUFFMAN CODING

Huffman coding is an entropy encoding method used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes" [4].

Huffman coding implements a special method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a prefix-free code; that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols [5].

The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and, optionally, a link to a parent node which makes it easy to read off the code (in reverse) starting from a leaf node.
Internal nodes contain a symbol weight, links to two child nodes and the optional link to a parent node. The process essentially starts with the leaf nodes containing the probabilities of the symbols they represent. Then a new node whose children are the two nodes with the smallest probabilities is created, such that the new node's probability is equal to the sum of the children's probabilities. With the two nodes merged into one (and thus no longer considered individually), and with the new node now under consideration, the procedure is repeated until only one node remains: the Huffman tree [4].

The simplest construction algorithm uses a priority queue, where the node with the lowest probability is given the highest priority [5]:

1. Create a leaf node for each symbol and add it to the priority queue.

2. While there is more than one node in the queue:

Remove the two nodes of highest priority (lowest probability) from the queue.

Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.

Add the new node to the queue.

3. The remaining node is the root node and the tree is complete [7].

Figure (1).

3.2 LEMPEL-ZIV-WELCH (LZW) COMPRESSION

Lempel-Ziv-Welch (LZW) is a data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as a development of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is designed to be fast to implement but is not usually optimal, because it performs only limited analysis of the data.

LZW can also be called a substitutional or dictionary-based encoding algorithm. The algorithm builds a data dictionary (also called a translation table or string table) of data occurring in an uncompressed data stream. Patterns of data (substrings) are identified in the data stream and are matched to entries in the dictionary.
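The priority-queue construction of section 3.1 can be sketched in Python using the standard heapq module (a minimal illustration; the sample text and the tuple layout of the heap entries are choices made for this example, not part of the algorithm's definition):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table {symbol: bit string} for the given text."""
    freq = Counter(text)
    # Each heap entry: (weight, tiebreaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Remove the two nodes of lowest probability from the queue ...
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # ... and merge them under a new internal node whose weight is their sum.
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    # The remaining node is the root of the Huffman tree.
    _, _, root = heap[0]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf node: record its bit string
            codes[node] = prefix or "0"  # edge case: a one-symbol alphabet
    walk(root, "")
    return codes

codes = huffman_codes("this is an example of a huffman tree")
```

Frequent symbols (such as the space character in this sample) receive shorter bit strings than rare ones, and no code is a prefix of another.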
If the substring is not present in the dictionary, a code phrase is created based on the data content of the substring, and it is stored in the dictionary. The phrase is then written to the compressed output stream [8]. When a reoccurrence of a substring is found in the data, the phrase of the substring already stored in the dictionary is written to the output. Because the phrase value has a physical size that is smaller than the substring it represents, data compression is achieved.

Decoding LZW data is the reverse of encoding. The decompressor reads a code from the stream and adds the corresponding entry to the data dictionary if it is not already there. The code is then translated into the string it represents and is written to the uncompressed output stream [8]. LZW goes beyond most dictionary-based compressors in that it is not necessary to store the dictionary alongside the data in order to decode the LZW data stream. This can save quite a bit of space when storing the LZW-encoded data [9].

TIFF, among other file formats, applies this method to graphic files. In TIFF, the pixel data is packed into bytes before being presented to LZW, so an LZW source byte might be a pixel value, part of a pixel value, or several pixel values, depending on the image's bit depth and number of colour channels.

GIF requires each LZW input symbol to be a pixel value. Because GIF allows 1- to 8-bit deep images, there are between 2 and 256 LZW input symbols in GIF, and the LZW dictionary is initialized accordingly. It does not matter how the pixels might have been packed into storage; LZW will deal with them as a sequence of symbols [9].

The TIFF approach does not work very well for odd-size pixels, because packing the pixels into bytes creates byte sequences that do not match the original pixel sequences, and any patterns in the pixels are obscured.
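The encode/decode cycle described earlier in this section can be sketched in Python. This is a generic, simplified LZW over text with an unbounded dictionary of integer codes; real formats such as GIF and TIFF add details (variable code widths, dictionary resets) that are omitted here:

```python
def lzw_encode(data):
    """Encode a string as a list of integer codes."""
    # Initialize the dictionary with all single-character strings.
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            output.append(dictionary[current])       # emit code for longest match
            dictionary[candidate] = len(dictionary)  # store the new substring
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decode(codes):
    """Decode a list of integer codes back into the original string."""
    dictionary = {i: chr(i) for i in range(256)}
    result = dictionary[codes[0]]
    previous = result
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Special case: the code refers to the entry being built right now.
            entry = previous + previous[0]
        result += entry
        # Rebuild the same dictionary entry the encoder created.
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return result

codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
assert lzw_decode(codes) == "TOBEORNOTTOBEORTOBEORNOT"
```

Note that the decoder rebuilds the dictionary from the code stream alone, which is exactly why, as stated above, the dictionary does not need to be stored with the compressed data.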
If pixel boundaries and byte boundaries agree (e.g., two 4-bit pixels per byte, or one 16-bit pixel every two bytes), then TIFF's method works well [10].

The GIF approach works better for odd-size bit depths, but it is difficult to extend it to more than eight bits per pixel because the LZW dictionary must become very large to achieve useful compression on large input alphabets.

If variable-width codes are implemented, the encoder and decoder must be careful to change the width at the same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in the stream [11].

4.0 CONCLUSION

In conclusion, because one cannot hope to compress everything, all compression algorithms must assume that there is some bias in the input messages, so that some inputs are more likely than others, i.e. that there will always be some unbalanced probability distribution over the possible messages. Most compression algorithms base this bias on the structure of the messages, i.e. an assumption that repeated characters are more likely than random characters, or that large white patches occur in typical images. Compression is therefore all about probability.
