Understanding ZFS : Checksum
Posted on February 6, 2019 • 3 minutes • 477 words
Ever wondered what kind of checksum ZFS uses to check for bit rot? Probably not, but it turns out you can change the algorithm it uses. However, like most settings, the defaults were chosen by smart people, so changing it might not do you any favors.
As it turns out, the default checksum is Fletcher’s checksum (the fletcher4 variant). This algorithm is comparable to CRC error detection, but reportedly outperforms it by nearly 20 times per byte, so it looks really speedy.
This algorithm can actually use some of the SIMD instruction set extensions found on more modern Intel CPUs. (source) To see whether your CPU has them, check your CPU flags. On most distros this can be done using:
grep flags /proc/cpuinfo
or
lscpu | grep -i flags
Flags of interest are sse2, ssse3, avx2 and avx512f. This is well outside my area of expertise, but it seems these optimizations perform differently from one CPU to the next, so the newest one is not always the fastest. That is why ZFS runs a mini-benchmark just after its kernel module is loaded to determine which implementation is best. Even nicer, the results are stored in /proc, so you can check your CPU’s scores in /proc/spl/kstat/zfs/fletcher_4_bench
On a recent server (Intel Xeon Silver 4110), avx512f is picked for native data and avx2 for byteswapped data:
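To make the flag check a bit more targeted, here is a minimal sketch that prints only the four flags mentioned above when they are present. The function name `simd_flags` is made up for this example, and it assumes an x86 style cpuinfo file with a `flags` line.

```shell
# List which of the flags of interest appear in a cpuinfo-style file.
# simd_flags is a hypothetical helper name; the file defaults to /proc/cpuinfo.
simd_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, drop the label before the colon,
    # put one flag per line, and keep only exact matches.
    grep -m1 '^flags' "$cpuinfo" \
        | cut -d: -f2 \
        | tr ' ' '\n' \
        | grep -x -E 'sse2|ssse3|avx2|avx512f'
}
```

Running `simd_flags` without an argument reads /proc/cpuinfo. The exact match (`-x`) matters here, otherwise `sse` or `avx` would also match the longer names.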
cat /proc/spl/kstat/zfs/fletcher_4_bench
0 0 0x01 -1 0 1054412431662536 2941001241716525
implementation   native         byteswap
scalar           3455422459     2778248102
superscalar      4626459244     3440503186
superscalar4     4008342143     3352064854
sse2             7888619803     4445292423
ssse3            7891663628     7030719031
avx2             12054042156    10904840790
avx512f          19645275791    6985259129
fastest          avx512f        avx2
On an ancient test machine (Intel Pentium Dual CPU E2220), on the other hand, superscalar is picked over “newer” flags such as sse2:
cat /proc/spl/kstat/zfs/fletcher_4_bench
0 0 0x01 -1 0 3970297751 15299241313174195
implementation   native         byteswap
scalar           3690080571     2332985088
superscalar      4357932898     2616994380
superscalar4     3849236054     2487645215
sse2             2809499310     2262775111
ssse3            2809413022     2308094121
fastest          superscalar    superscalar
Pretty nice work, all done behind the scenes in a transparent way.
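If you only want the verdict rather than the whole table, the benchmark file shown above is easy to pick apart. This is a rough sketch that assumes the three column layout from the two examples; the function names `fletcher4_fastest` and `fletcher4_rank` are invented for this post.

```shell
# Both helpers default to the kstat path from the article; a different
# file can be passed as the first argument.

# Print the implementations ZFS picked for native and byteswapped data.
fletcher4_fastest() {
    bench="${1:-/proc/spl/kstat/zfs/fletcher_4_bench}"
    # The "fastest" row names the winner of each column.
    awk '$1 == "fastest" { print "native=" $2, "byteswap=" $3 }' "$bench"
}

# List the implementations by their native score, highest first.
fletcher4_rank() {
    bench="${1:-/proc/spl/kstat/zfs/fletcher_4_bench}"
    # Keep only three-column rows whose second column is a number,
    # which skips the kstat header, the column titles and the "fastest" row.
    awk 'NF == 3 && $2 ~ /^[0-9]+$/ { print $2, $1 }' "$bench" | sort -rn
}
```

Fed the Xeon output above, `fletcher4_fastest` prints `native=avx512f byteswap=avx2`.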
Back to the stuff we can actually play with: changing the algorithm. It turns out you can change it to SHA256, which is required to run deduplication, and that’s about it. According to the wiki at the time of writing, the other alternatives (fletcher2, SHA512, skein and edon-r) are either deprecated or not implemented in zfsonlinux; newer releases may support more of them, so check the zfs man page for your version. You can also disable the checksum for a certain dataset or for an entire pool, but then the question is why you would even choose ZFS.
Changing the checksum is done like most properties (as per the docs):
zfs set checksum=sha256 pool_name/dataset_name
zfs set checksum=fletcher4 pool_name/dataset_name
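To avoid typos and to confirm the change actually landed, the two commands above can be wrapped in a small helper that reads the property back afterwards. This is only a sketch: `set_checksum` and the `ZFS` override variable are hypothetical names, and the allowed values are limited to the two discussed in this post.

```shell
# Set the checksum property on a dataset, then print the value ZFS reports.
# ZFS is a hypothetical override for the zfs binary (handy for dry runs).
set_checksum() {
    algo="$1"
    dataset="$2"
    case "$algo" in
        fletcher4|sha256) ;;
        *) echo "not handled by this sketch: $algo" >&2; return 1 ;;
    esac
    # -H drops the header line, -o value prints only the property value.
    "${ZFS:-zfs}" set "checksum=$algo" "$dataset" &&
    "${ZFS:-zfs}" get -H -o value checksum "$dataset"
}
```

For example, `set_checksum sha256 pool_name/dataset_name`. Keep in mind that, as far as I know, a new checksum setting only applies to blocks written after the change; existing data keeps the checksum it was written with.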
So while it’s really interesting to know what is going on behind the scenes, I doubt many people should play with this unless they know what they are doing. In which case this article is not aimed at you 😉
As always, relevant information, and basically the source of this post, can be found on the GitHub wiki of zfsonlinux, here.