October 06, 2011 at 01:42 PM | categories: glusterfs

I haven't found anything about this on the web yet, and it cost me some data and time, so I'm issuing a warning here: using zip to pack and compress files on a glusterfs partition can corrupt the resulting archive.

me@machine: ~/glusterfs/ziptest$ dd if=/dev/zero count=1 of=test.dat
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000357 seconds, 1.4 MB/s
me@machine: ~/glusterfs/ziptest$ zip test.dat
  adding: test.dat (deflated 98%)
me@machine: ~/glusterfs/ziptest$ ls -lh
total 16K
-rw-r--r-- 1 kml kml 512 Oct  6 13:47 test.dat
-rw-r--r-- 1 kml kml  67 Oct  6 13:58
me@machine: ~/glusterfs/ziptest$ unzip 
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
  unzip:  cannot find zipfile directory in one of or, and cannot find, period.

Although the error only shows up when unzipping, it is the archive itself that is corrupted: copying it to another, non-glusterfs partition does not help, and the error still occurs there. A file zipped on a different partition and then copied to glusterfs, however, unzips without problems.
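
If you want to check an archive without running unzip on it, here is a rough sketch that looks for the end-of-central-directory signature the error message complains about. The signature bytes (PK followed by 0x05 0x06) and the maximum record size come from the zip format; the function name has_eocd is made up for this example, and I'm assuming a grep that accepts -a (GNU and BSD grep do). Finding the signature does not prove the archive is healthy, only that this particular marker is present.

```shell
# has_eocd FILE
# Hypothetical helper: succeeds if FILE contains the zip
# end-of-central-directory signature (bytes "PK" 0x05 0x06) near its end.
has_eocd() {
    # The signature as a literal string for grep -F.
    sig=$(printf 'PK\005\006')
    # The record is 22 bytes plus a comment of at most 65535 bytes,
    # so it has to start within the last 65557 bytes of the file.
    tail -c 65557 "$1" | LC_ALL=C grep -a -F "$sig" > /dev/null
}
```

Called as has_eocd followed by the archive name, it returns success when the signature is there and failure otherwise, so it can sit in an if statement right after the zip step. For a complete integrity check, unzip -t on the archive is still the better tool.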

I haven't studied the cause of this corruption, but I presume it is connected with the central directory file header. The glusterfs setup in this case uses a simple distributed configuration, so it is not an issue with striping, although I haven't looked into any other configuration options. My personal solution was to abandon the application that required zip, and to use tar with gzip or bzip2 instead.
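
For what it's worth, the tar replacement can be sketched roughly like this. The function name pack_and_verify and the way it takes its arguments are my own invention for illustration, and it assumes a tar that understands the -z option (GNU and BSD tar do). It packs a directory, then checks the result twice before you rely on it.

```shell
# pack_and_verify SRC_DIR ARCHIVE
# Hypothetical helper: pack SRC_DIR into a gzip-compressed tarball
# and verify the result before trusting it.
pack_and_verify() {
    src_dir=$1
    archive=$2
    # Store paths relative to the parent directory of SRC_DIR.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Check the integrity of the gzip stream.
    gzip -t "$archive" || return 1
    # Make sure tar can read the complete member list.
    tar -tzf "$archive" > /dev/null
}
```

Something like pack_and_verify ~/glusterfs/ziptest ~/glusterfs/ziptest.tar.gz would then fail loudly if the archive cannot be read back, instead of surprising you at unpack time.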

