why zfs and I aren’t on speaking terms

Posted by Thomas Sat, 20 Mar 2010 13:07:48 +0000

Per Cole’s request:

I’m not exactly sure what happened (’cause it happened just last night), but I ~lost a whole bunch of data. Granted, it was a scratch disk, and granted, I have a backup from the middle of January, but that’s not the point. So zfs has snapshots, right? Well, that’s what I have backups of: the snapshots. There were like 4 snapshots. I have perfect replicas of my first two snapshots. I even have perfect incremental snapshots between each. But I can’t seem to get my backup to “take” the incrementals. That was part of the issue.

Now yesterday, likely due to this scratch drive going bad (not superbad, just retrying sector reads a lot), the box kernel panicked. I saw it just before I went to work. It rebooted and came up fine. I went to work. I’m pretty fed up with the filer not being in a perfect state, so I sit down after work to beat it into submission. First item on the agenda is getting a perfect copy of the scratch disk, or at least figuring out what my good backups are missing. Well, I can’t even read the darn thing. It says something about corrupt metadata. At this point I have very few options. Basically they amount to a `zpool export` and `zpool import`. And I’m pretty sure that once I export it, I won’t be able to get it back, ever again. So, feeling that I have no other recourse, I export, and guess what happens: I can’t re-import it. Well fooey.

I basically take my lumps, have a decent backup from January, recover from elsewhere the data I know I am missing, and curse zfs for being so smart that I have like zero recourse to debug w/o begging Sun engineers on a mailing list for how to recover this drive. Since the o/s that I’m running is quite old, it is also possible that newer versions of OpenSolaris would be better at importing this disk. I tried to upgrade my install of Nexenta last night, to no avail. Neither dist-upgrade nor a fresh install worked for me. So I’m stuck for a bit.
Maybe over the course of the next few months, I might be able to recover it. I’m going to try not to worry about it too much. It’s just a super pita that it’s such a huge black box. Even if I, say, wanted to ddrescue the whole disk, I have no idea how to import the copy into the os. And there are tools (like zdb, etc) that in theory you might be able to recover it with, but that’s all voodoo to me. Anyhoo, that’s the gist of it.

I lost another 1TB disk the other week. I still haven’t put in all 5 of the 1TB disks that I bought some time ago, which is the main goal of this weekend. I got another disk at Microcenter. I did a replacement last night, so I have 3 more to go. Given that each one takes 6 to 8 hours (depending on how much data is on the stripe), and that i/o basically comes to a crawl while it’s going on, it’s going to be a long weekend.
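
For anyone wondering what “a full snapshot plus incrementals” looks like in practice, here is a rough sketch. It only builds the command strings and doesn’t run anything. The dataset names (`scratch/data`, `backup/data`), the snapshot names, and the helper `replication_commands` are all made up for illustration and are not my actual setup. `zfs send`, `zfs send -i` and `zfs receive` are the real commands. One common reason a receive refuses an incremental is that the target changed since the previous snapshot; `zfs receive -F` rolls the target back first, though I can’t say that was my problem.

```python
def replication_commands(source, target, snapshots):
    """Build the shell commands that replicate `snapshots` (oldest first)
    from the dataset `source` to the dataset `target`.

    All names passed in are hypothetical; nothing is executed here.
    """
    if not snapshots:
        return []

    # The oldest snapshot has to go over as a full stream.
    commands = [f"zfs send {source}@{snapshots[0]} | zfs receive {target}"]

    # Every later snapshot is sent as an increment against the one before it,
    # so the target must already hold that previous snapshot.
    for prev, curr in zip(snapshots, snapshots[1:]):
        commands.append(
            f"zfs send -i {source}@{prev} {source}@{curr} | zfs receive {target}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical dataset and snapshot names.
    for cmd in replication_commands(
        "scratch/data", "backup/data", ["snap1", "snap2", "snap3", "snap4"]
    ):
        print(cmd)
```

The chain matters: each increment only applies on top of the snapshot before it, so if one in the middle won’t go in, nothing after it will either.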

Posted in Technology | 1 Comment


  1. KB said on March 21, 2010 @ 11:44 am:

    I read the heading and wondered WHO zfs was!

