Sunday, July 11, 2010

Article: Compressing and Decompressing Data Using Java APIs

Decompressing and Extracting Data from a ZIP file


The java.util.zip package provides classes for data compression and decompression. Decompressing a ZIP file is a matter of reading data from an input stream, and the package provides the ZipInputStream class for that purpose. A ZipInputStream is created just like any other input stream. For example, the following segment of code creates an input stream for reading data from a ZIP file:

FileInputStream fis = new FileInputStream("figs.zip");
ZipInputStream zin = new ZipInputStream(new BufferedInputStream(fis));

Once a ZIP input stream is opened, you can read the zip entries using the getNextEntry method which returns a ZipEntry object. If the end-of-file is reached, getNextEntry returns null:

ZipEntry entry;
while((entry = zin.getNextEntry()) != null) {
    // extract data
    // open output streams
}
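
As a minimal sketch of the loop above, the following class collects the names of all entries in a ZIP stream without extracting anything. The class name ZipEntryLister and the method listEntries are hypothetical names chosen for this illustration; the try-with-resources statement assumes Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the original article.
class ZipEntryLister {

    /** Returns the entry names in the order they appear in the stream. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the ZIP stream even if reading fails
        try (ZipInputStream zin = new ZipInputStream(new BufferedInputStream(in))) {
            ZipEntry entry;
            // getNextEntry skips any unread data of the previous entry
            // and returns null once there are no more entries
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Because the stream is read sequentially, listing the names this way still walks through the whole archive.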


Now, it is time to set up a decompressed output stream, which can be done as follows:

int BUFFER = 2048;
FileOutputStream fos = new FileOutputStream(entry.getName());
BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);


Note: In this segment of code we have used the BufferedOutputStream instead of the ZipOutputStream. The ZipOutputStream and the GZIPOutputStream use internal buffer sizes of 512. The use of the BufferedOutputStream is only justified when the size of the buffer is much larger than 512 (in this example it is set to 2048). While the ZipOutputStream doesn't allow you to set the buffer size, the GZIPOutputStream lets you specify the internal buffer size as a constructor argument.
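
To illustrate the last point of the note, here is a small sketch that passes a buffer size to the GZIPOutputStream and GZIPInputStream constructors. The class name GzipBufferDemo and the value 8192 are assumptions made for this example only; the sketch does not measure whether a larger buffer actually helps for your data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original article.
class GzipBufferDemo {
    // Illustrative buffer size; the default internal size is 512
    static final int BUFFER = 8192;

    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // The second constructor argument sets the internal buffer size
        try (GZIPOutputStream gz = new GZIPOutputStream(bos, BUFFER)) {
            gz.write(input, 0, input.length);
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER)) {
            byte[] data = new byte[BUFFER];
            int count;
            while ((count = gz.read(data, 0, data.length)) != -1) {
                bos.write(data, 0, count);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some text to compress".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println("original bytes: " + original.length);
        System.out.println("restored bytes: " + restored.length);
    }
}
```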

In this segment of code, a file output stream is created using the entry's name, which can be retrieved using the entry.getName method. Source zipped data is then read and written to the decompressed stream:

int count;
byte data[] = new byte[BUFFER];
while ((count = zin.read(data, 0, BUFFER)) != -1) {
    dest.write(data, 0, count);
}

And finally, close the input and output streams:

dest.flush();
dest.close();
zin.close();

The source program in Code Sample 1 shows how to decompress and extract files from a ZIP archive. To test this sample, compile the class and run it by passing a compressed file in ZIP format:

prompt> java UnZip somefile.zip

Note that somefile.zip could be a ZIP archive created using any ZIP-compatible tool, such as WinZip.

Code Sample 1: UnZip.java

import java.io.*;
import java.util.zip.*;

public class UnZip {
    // static, so that it can be referenced from the static main method
    static final int BUFFER = 2048;
    public static void main (String argv[]) {
        try {
            BufferedOutputStream dest = null;
            FileInputStream fis = new FileInputStream(argv[0]);
            ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
            ZipEntry entry;
            while((entry = zis.getNextEntry()) != null) {
                System.out.println("Extracting: " +entry);
                int count;
                byte data[] = new byte[BUFFER];
                // write the files to the disk
                FileOutputStream fos = new FileOutputStream(entry.getName());
                dest = new BufferedOutputStream(fos, BUFFER);
                while ((count = zis.read(data, 0, BUFFER)) != -1) {
                    dest.write(data, 0, count);
                }
                dest.flush();
                dest.close();
            }
            zis.close();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
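
Code Sample 1 writes each entry to whatever path entry.getName() returns, so it fails on directory entries and on entries inside subdirectories that do not exist yet, and an archive with names such as ../x could write outside the intended location. The following is one possible hardening, offered as a sketch rather than a complete solution. The class name SafeUnZip, the method extract, and the targetDir parameter are hypothetical names introduced here, and the code assumes Java 7 or later for java.nio.file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical variant of UnZip, not part of the original article.
class SafeUnZip {

    static void extract(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(in))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Resolve the entry name against the target directory and
                // reject anything that would end up outside of it
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException(
                        "Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    // Create missing parent directories before writing
                    Files.createDirectories(out.getParent());
                    // Copies up to the end of the current entry; zis stays open
                    Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 1 ? args[1] : ".");
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            extract(in, target);
        }
    }
}
```

This sketch does not limit the size or number of extracted entries, which a program handling untrusted archives would probably also want to do.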


It is important to note that the ZipInputStream class reads ZIP files sequentially. The class ZipFile, however, reads the contents of a ZIP file using a random access file internally so that the entries of the ZIP file do not have to be read sequentially.

Note: Another fundamental difference between ZipInputStream and ZipFile is caching. Zip entries are not cached when the file is read using a combination of ZipInputStream and FileInputStream. However, if the file is opened using ZipFile(fileName), it is cached internally, so if ZipFile(fileName) is called again the file is opened only once and the cached value is used on the second open. If you work on UNIX, it is worth noting that all zip files opened using ZipFile are memory mapped (an implementation detail that may differ between JDK releases), and therefore the performance of ZipFile is superior to ZipInputStream. If the contents of the same zip file, however, are to be frequently changed and reloaded during program execution, then using ZipInputStream is preferred.
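
The practical consequence of random access is that a single entry can be looked up by name without reading the entries before it. The sketch below shows this with ZipFile.getEntry; the class name ZipEntryReader and the method readEntry are hypothetical, and returning null for a missing entry is simply a choice made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the original article.
class ZipEntryReader {
    static final int BUFFER = 2048;

    /** Returns the bytes of the named entry, or null if it does not exist. */
    static byte[] readEntry(File zip, String name) throws IOException {
        try (ZipFile zipfile = new ZipFile(zip)) {
            // Direct lookup by name; no need to iterate over other entries
            ZipEntry entry = zipfile.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream is = zipfile.getInputStream(entry)) {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                byte[] data = new byte[BUFFER];
                int count;
                while ((count = is.read(data, 0, BUFFER)) != -1) {
                    bos.write(data, 0, count);
                }
                return bos.toByteArray();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = readEntry(new File(args[0]), args[1]);
        if (content == null) {
            System.out.println("No such entry: " + args[1]);
        } else {
            System.out.println(args[1] + ": " + content.length + " bytes");
        }
    }
}
```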

This is how a ZIP file can be decompressed using the ZipFile class:

1. Create a ZipFile object by specifying the ZIP file to be read either as a String filename or as a File object:

ZipFile zipfile = new ZipFile("figs.zip");

2. Use the entries method, which returns an Enumeration object, to loop through all the ZipEntry objects of the file:

Enumeration e = zipfile.entries();
while(e.hasMoreElements()) {
    entry = (ZipEntry) e.nextElement();
    // read contents and save them
}

3. Read the contents of a specific ZipEntry within the ZIP file by passing the ZipEntry to getInputStream, which will return an InputStream object from which you can read the entry's contents:

is = new BufferedInputStream(zipfile.getInputStream(entry));



4. Retrieve the entry's filename and create an output stream to save it:

byte data[] = new byte[BUFFER];
FileOutputStream fos = new FileOutputStream(entry.getName());
dest = new BufferedOutputStream(fos, BUFFER);
while ((count = is.read(data, 0, BUFFER)) != -1) {
    dest.write(data, 0, count);
}


5. Finally, close all input and output streams:

dest.flush();
dest.close();
is.close();

The complete source program is shown in Code Sample 2. Again, to test this class, compile it and run it by passing a file in a ZIP format as an argument:

prompt> java UnZip2 somefile.zip

Code Sample 2: UnZip2.java

import java.io.*;
import java.util.*;
import java.util.zip.*;

public class UnZip2 {
    static final int BUFFER = 2048;
    public static void main (String argv[]) {
        try {
            BufferedOutputStream dest = null;
            BufferedInputStream is = null;
            ZipEntry entry;
            ZipFile zipfile = new ZipFile(argv[0]);
            // typed enumeration, so no cast is needed below
            Enumeration<? extends ZipEntry> e = zipfile.entries();
            while(e.hasMoreElements()) {
                entry = e.nextElement();
                System.out.println("Extracting: " +entry);
                is = new BufferedInputStream(zipfile.getInputStream(entry));
                int count;
                byte data[] = new byte[BUFFER];
                FileOutputStream fos = new FileOutputStream(entry.getName());
                dest = new BufferedOutputStream(fos, BUFFER);
                while ((count = is.read(data, 0, BUFFER)) != -1) {
                    dest.write(data, 0, count);
                }
                dest.flush();
                dest.close();
                is.close();
            }
            // release the underlying file once all entries are extracted
            zipfile.close();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}


Compressing and Archiving Data in a ZIP File

The ZipOutputStream can be used to compress data to a ZIP file. The ZipOutputStream writes data to an output stream in a ZIP format. There are a number of steps involved in creating a ZIP file.

1. The first step is to create a ZipOutputStream object, to which we pass the output stream of the file we wish to write to. Here is how you create a ZIP file entitled "myfigs.zip":

FileOutputStream dest = new FileOutputStream("myfigs.zip");
ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));

2. Once the target zip output stream is created, the next step is to open the source data files. In this example, the source data files are the files in the current directory. The list method of the File class is used to get the names of the files in the current directory:

File f = new File(".");
String files[] = f.list();
for (int i=0; i<files.length; i++) {
    System.out.println("Adding: "+files[i]);
    FileInputStream fi = new FileInputStream(files[i]);
    BufferedInputStream origin = new BufferedInputStream(fi, BUFFER);
    // create zip entry
    // add entries to ZIP file
}


Note: This code sample is capable of compressing all files in the current directory. It doesn't handle subdirectories. As an exercise, you may want to modify Code Sample 3 to handle subdirectories.


3. Create a zip entry for each file that is read:

ZipEntry entry = new ZipEntry(files[i]);

4. Before you can write data to the ZIP output stream, you must first put the zip entry object using the putNextEntry method:

out.putNextEntry(entry);

5. Write the data to the ZIP file:

int count;
while((count = origin.read(data, 0, BUFFER)) != -1) {
    out.write(data, 0, count);
}

6. Finally, you close the input and output streams:

origin.close();
out.close();
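
The steps above can be sketched as a single method that closes its streams automatically, even when an exception is thrown part of the way through. The class name ZipSteps, the method zipFiles, and its parameters are hypothetical names for this illustration; the code assumes Java 7 or later and stores each file under its plain file name, so two sources with the same name would be rejected as duplicate entries.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original article.
class ZipSteps {
    static final int BUFFER = 2048;

    static void zipFiles(Path zipPath, List<Path> sources) throws IOException {
        byte[] data = new byte[BUFFER];
        // Step 1: create the ZIP output stream
        try (ZipOutputStream out = new ZipOutputStream(
                 new BufferedOutputStream(Files.newOutputStream(zipPath)))) {
            for (Path source : sources) {
                // Step 2: open the source file
                try (InputStream origin =
                         new BufferedInputStream(Files.newInputStream(source), BUFFER)) {
                    // Steps 3 and 4: create the entry and put it on the stream
                    out.putNextEntry(new ZipEntry(source.getFileName().toString()));
                    // Step 5: write the data
                    int count;
                    while ((count = origin.read(data, 0, BUFFER)) != -1) {
                        out.write(data, 0, count);
                    }
                    out.closeEntry();
                }
            }
        } // Step 6: both streams are closed by try-with-resources
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipSteps out.zip file1 file2 ...
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        zipFiles(Paths.get(args[0]), sources);
    }
}
```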

The complete source program is shown in Code Sample 3.

Code Sample 3: Zip.java

import java.io.*;
import java.util.zip.*;

public class Zip {
    static final int BUFFER = 2048;
    public static void main (String argv[]) {
        try {
            BufferedInputStream origin = null;
            FileOutputStream dest = new FileOutputStream("c:\\zip\\myfigs.zip");
            ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
            //out.setMethod(ZipOutputStream.DEFLATED);
            byte data[] = new byte[BUFFER];
            // get a list of files from current directory
            File f = new File(".");
            String files[] = f.list();

            for (int i=0; i<files.length; i++) {
                System.out.println("Adding: "+files[i]);
                FileInputStream fi = new FileInputStream(files[i]);
                origin = new BufferedInputStream(fi, BUFFER);
                ZipEntry entry = new ZipEntry(files[i]);
                out.putNextEntry(entry);
                int count;
                while((count = origin.read(data, 0, BUFFER)) != -1) {
                    out.write(data, 0, count);
                }
                origin.close();
            }
            out.close();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
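
For the subdirectory exercise mentioned earlier, one possible approach is to walk the directory tree and use each file's path relative to the source directory as the entry name. This is only a sketch: the class name ZipTree and the method zipDirectory are hypothetical, the code assumes Java 8 or later for Files.walk, and it adds regular files only, so empty directories are not recorded in the archive.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original article.
class ZipTree {

    static void zipDirectory(Path sourceDir, Path zipPath) throws IOException {
        Path root = sourceDir.toAbsolutePath().normalize();
        Path zipAbs = zipPath.toAbsolutePath().normalize();

        // Collect the files first, skipping the archive itself in case
        // it is located inside the directory being zipped
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> !p.equals(zipAbs))
                        .sorted()
                        .collect(Collectors.toList());
        }

        try (ZipOutputStream out = new ZipOutputStream(
                 new BufferedOutputStream(Files.newOutputStream(zipAbs)))) {
            for (Path file : files) {
                // ZIP entry names use '/' as separator on every platform
                String name = root.relativize(file).toString()
                                  .replace(File.separatorChar, '/');
                out.putNextEntry(new ZipEntry(name));
                Files.copy(file, out);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipTree sourceDir out.zip
        zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```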