Fli
10-22-2020, 04:32 PM
Do NOT use ZFS in these cases:
1. You want to use ZFS on a single external USB drive (worst case: after an unclean dismount the data gets corrupted and you have to recreate the whole dataset).
2. You want to use ZFS on a single drive and have no external drive for backups (why? when the zpool is not cleanly dismounted/exported, some data can get corrupted permanently and ZFS has no mirror drive from which it could automatically recover valid data, unless you add a second drive of the same type and size for redundancy).
3. You do not have a few hours to learn the basics of ZFS management (though the most basic things are covered on this page).
The majority of the following commands work on all Linux distributions, although the first part of the tutorial uses Arch/Manjaro Linux packages and the pacman package manager. On Ubuntu I was able to set up ZFS with "sudo apt install zfsutils-linux". On other distributions you need to find out whether packages for ZFS (and its kernel module) are available.
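Whatever the distribution, once the packages are installed this is a quick sanity check (generic commands, nothing distribution-specific):
zfs version        # prints the userland version and, if the module is loaded, the kernel module version
lsmod | grep zfs   # shows whether the zfs module is currently loaded
modinfo zfs        # errors out if no zfs module exists for the running kernel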
Update and upgrade the system, then reboot (in case a new kernel was installed since the last reboot)
A)
sudo pacman -S linux-latest-zfs zfs-utils zfs-dkms
reboot
sudo /sbin/modprobe zfs
If modprobe does not work, remove the package again ("sudo pacman -R linux-latest-zfs") and try method B:
B)
Discover installed kernels:
uname -r
pacman -Q | grep "^linux"
and install the zfs packages matching those kernels:
sudo pacman -Ss zfs|grep -i linux
sudo pacman -S linux123-zfs   (replace "linux123" with the package matching your kernel, e.g. linux58-zfs for kernel 5.8)
pamac install zfs-dkms
reboot
# load zfs support into the kernel (the module was not loaded on 5.8.16-2-MANJARO after reboot, but once loaded by the following command it persisted across reboots)
sudo /sbin/modprobe zfs
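If on your system the module does not come back automatically after a reboot (it did on my Manjaro install, as noted above), a common generic fix is to tell systemd to load it at boot:
echo zfs | sudo tee /etc/modules-load.d/zfs.conf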
===================================
Open these two manual pages and search them for the parameters used in the following commands:
https://zfsonlinux.org/manpages/0.8.1/man8/zpool.8.html
https://zfsonlinux.org/manpages/0.8.1/man8/zfs.8.html
sudo smartctl -a /dev/sdb|grep -i "sector size"
Sector Sizes: 512 bytes logical, 4096 bytes physical
(smartctl is in package "smartmontools")
It was suggested here https://forum.proxmox.com/threads/how-can-i-set-the-correct-ashift-on-zfs.58242/post-268384 to use the parameter ashift=12 in the following "zpool create" command for drives with a 4096-byte physical sector size, and ashift=13 for 8K physical sectors. If ashift is not defined, ZFS autodetects the sector size; I do not know how reliable that autodetection is.
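To check the sector sizes of all drives at once, and to verify after creation which ashift the pool actually got (zdb reads the default pool cache file, so this assumes the pool was imported normally):
lsblk -o NAME,SIZE,PHY-SEC,LOG-SEC
sudo zdb -C poolname | grep ashift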
# create a pool named "poolname" on an HDD of choice (use a disk that holds no important data, or it will be lost; unmount the drive first, e.g. using gparted)
A) single drive (find the disk ID with "ls -l /dev/disk/by-id/"):
sudo zpool create -o ashift=12 -o feature@async_destroy=enabled -o feature@empty_bpobj=enabled -o feature@lz4_compress=enabled poolname /dev/disk/by-id/ID-HERE
or the same command, but the pool is created across 2 physical drives (ideally of the same size, otherwise the mirror is limited to the capacity of the smaller drive), where the second drive provides redundancy (recommended: it reduces the risk of irreversible data corruption and roughly doubles read performance):
B) two-drive mirror (find the disk IDs with "ls -l /dev/disk/by-id/"):
sudo zpool create -o ashift=12 -o feature@async_destroy=enabled -o feature@empty_bpobj=enabled -o feature@lz4_compress=enabled poolname mirror /dev/disk/by-id/DRIVE1-ID-HERE /dev/disk/by-id/DRIVE2-ID-HERE
(for 4 drives as two mirrored pairs: zpool create poolname mirror drive1id drive2id mirror drive3id drive4id)
Regarding the recordsize parameter used below: it was suggested in places like https://blog.programster.org/zfs-record-size , https://jrs-s.net/2019/04/03/on-zfs-recordsize/ and https://www.reddit.com/r/zfs/comments/8l20f5/zfs_record_size_is_smaller_really_better/ that for a drive holding large media files the record size is better increased from the default 128K to 512K, so I did that for my multimedia dataset. Note that the zfs manual page linked above says this value is only a suggestion and ZFS automatically adjusts block sizes according to usage patterns. The articles also say the record size should match the size of the typical I/O operation within the dataset, which may differ from the file sizes themselves. "zpool iostat -r" shows the distribution of operation sizes; if the zpool is a single drive, "sudo iostat -axh 3 /dev/zpooldrivename" can also be used, checking the "rareq-sz" (average read request size) column.
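For reference, recordsize is an ordinary dataset property, so it can also be changed later; the new value only applies to newly written data (the dataset name matches the example created below):
sudo zfs set recordsize=512K poolname/data
zfs get recordsize poolname/data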
Create two datasets, one encrypted and one not:
sudo zfs create -o compression=lz4 -o checksum=skein -o atime=off -o xattr=sa -o encryption=on -o keyformat=passphrase -o mountpoint=/e poolname/enc
sudo zfs create -o compression=lz4 -o checksum=skein -o atime=off -o xattr=sa -o encryption=off -o recordsize=512K -o mountpoint=/d poolname/data
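To confirm the encryption settings and key status of the encrypted dataset (keystatus shows "available" once the key is loaded):
zfs get encryption,keyformat,keystatus poolname/enc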
fix permissions:
sudo chown -R $(whoami):$(whoami) /poolname /e /d
gracefully unmount/export the pools (I think this is necessary; otherwise the pool can end up marked as suspended and a computer restart is needed):
sudo zpool export -a
mount the pools:
sudo zpool import -a
(if that fails, you have to import manually: list the disk names (ls -l /dev/disk/by-id/), then: sudo zpool import -a -d /dev/disk/by-id/yourdisk1name-part1 -d /dev/disk/by-id/yourdisk2name-part1 )
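Running "zpool import" without a pool name only lists what is available for import and under which device names, which helps to see why an import fails:
sudo zpool import
sudo zpool import -d /dev/disk/by-id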
If a pool contains encrypted datasets, an additional command is needed (the -l parameter prompts for the passphrase; without it, mounting complains "encryption key not loaded"):
sudo zfs mount -a -l
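An equivalent two-step variant, if you prefer to load the keys first and mount afterwards:
sudo zfs load-key -a   # prompts for the passphrases of all encrypted datasets
sudo zfs mount -a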
pool activity statistics:
zpool iostat -vlq
zpool iostat -r (request size histograms)
zpool iostat -w (wait/latency histograms)
intent log statistics:
cat /proc/spl/kstat/zfs/zil
change mountpoint of some dataset within the pool:
sudo mkdir /new;sudo zfs set mountpoint=/new poolname/datasetname
rename/move a dataset (useful, for example, when destroying fails with the error "cannot destroy: filesystem has children"):
sudo zfs rename poolname/dataset/subdataset poolname/subdatasetnew
attach a new drive (if the existing one is a non-redundant single drive, the result will be a mirror - something like RAID1 with improved read performance, 1-drive fault tolerance and data self-healing; if the existing drive is already part of a mirror, the result will be a three-way mirror):
zpool attach poolname existingdrive newdrive
For detach, remove and replace, see the manual page (man zpool) or https://zfsonlinux.org/manpages/0.8.1/man8/zpool.8.html
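After an attach or replace, the resilvering progress and pool health can be watched with:
zpool status -v poolname
watch -n 10 zpool status poolname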
create snapshot:
zfs snapshot -r poolname@snapshot1
List snapshots:
zfs list -t snapshot
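Example of a dated recursive snapshot and a rollback (names are just examples; rollback refuses to go past a more recent snapshot unless -r is added, which destroys the newer snapshots):
sudo zfs snapshot -r poolname@$(date +%F)
sudo zfs rollback poolname/data@2020-10-22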
Destroy (delete) all snapshots (no prompt) - note: if you sync pools using zfs send/receive or syncoid, deleting their snapshots breaks incremental syncing:
for one pool: sudo zfs list -H -o name -t snapshot -r POOLNAME|sudo xargs -n1 zfs destroy
all pools: sudo zfs list -H -o name -t snapshot|xargs -n1 sudo zfs destroy
destroy (delete) dataset (no prompt):
sudo zfs destroy poolname/enc
destroy (delete) whole pool (no prompt):
sudo zpool destroy poolname
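Before any of the destroy commands above, a dry run shows what would be deleted: -n (dry run) and -v (verbose) work for zfs destroy on datasets and snapshots (there is no dry run for zpool destroy):
sudo zfs destroy -rnv poolname/enc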
========
If you are OK with HDD activity increasing at times when regular activity is low or none, consider enabling automatic scrubbing (a kind of online "fsck" that checks all data and can even repair it on replicated vdevs (mirror/raidz)). The following sets up a monthly systemd timer (a manual scrub example follows step 3):
1. echo -e "[Unit]\nDescription=Monthly zpool scrub on %i\n\n[Timer]\nOnCalendar=monthly\nAccuracySec=1h\nPersistent=true\n\n[Install]\nWantedBy=multi-user.target" | sudo tee /etc/systemd/system/[email protected]
2. echo -e "[Unit]\nDescription=zpool scrub on %i\n\n[Service]\nNice=19\nIOSchedulingClass=idle\nKillSignal=SIGINT\nExecStart=/usr/bin/zpool scrub %i\n\n[Install]\nWantedBy=multi-user.target" | sudo tee /etc/systemd/system/[email protected]
3. sudo systemctl enable [email protected]
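Note that "systemctl enable" alone only activates the timer at the next boot (add --now to start it immediately). To check the timer and run a one-off scrub manually:
systemctl list-timers | grep zfs-scrub
sudo zpool scrub poolname
zpool status poolname   # shows scrub progress and any repaired or corrupted data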
========
Another page worth reading: https://wiki.archlinux.org/index.php/ZFS
Terminology:
ZIL - the ZFS intent log is allocated from blocks within the main pool. However, it might be possible to get better synchronous write performance using separate intent log devices (SLOG) such as NVRAM.
SLOG - just a really fast place/device to store the ZIL (ZFS Intent Log); see the example command after this list. Most systems do not write anything close to 4GB to the ZIL (cat /proc/spl/kstat/zfs/zil). ZFS will not benefit from more SLOG storage than the maximum ARC size, which is half of system memory on Linux by default. A SLOG device can only increase throughput and decrease latency in a workload with many sync writes.
ARC - Adaptive Replacement Cache is the ZFS read cache in the main memory (DRAM).
L2ARC - Second Level Adaptive Replacement Cache stores read cache data outside of main memory; use read-optimized SSDs (no mirroring/fault tolerance needed, it only holds copies of pool data).
Cache - these devices (typically SSDs) are managed by the L2ARC to provide an additional layer of caching between main memory and disk. For read-heavy workloads where the working set is much larger than what fits in main memory, cache devices allow much more of that working set to be served from low-latency media. They provide the greatest improvement for random read workloads of mostly static content. (zpool add POOLNAME cache DEVICENAME)
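For completeness, this is how a mirrored SLOG and a single cache device would be added to the example pool (the device IDs are placeholders; as noted above, a SLOG only helps workloads with many sync writes):
sudo zpool add poolname log mirror /dev/disk/by-id/SSD1-ID /dev/disk/by-id/SSD2-ID
sudo zpool add poolname cache /dev/disk/by-id/SSD3-ID
zpool iostat -v poolname   # the log and cache vdevs show up as separate lines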
Interesting utilities:
ZREP is a ZFS based replication and failover script https://github.com/bolthole/zrep
Syncoid facilitates the asynchronous incremental replication of ZFS filesystems https://github.com/jimsalterjrs/sanoid#syncoid
Syncoid sync command looked like this in my case:
syncoid -r --skip-parent --no-stream poolsource pooldestination
# -r --recursive This will also transfer child datasets.
# --skip-parent Skips syncing of the parent dataset. Does nothing without '--recursive' option. Syncs only datasets contents.
# --dumpsnaps Dumps a list of snapshots during the run. Shows an overview of the snapshots.
# --no-stream Replicates using the newest snapshot only instead of the full stream of intermediate snapshots, and retains only the most recent syncoid snapshot.
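For comparison, a one-off transfer without extra tools can be done with the built-in zfs send/receive (snapshot and target names here are only examples):
sudo zfs snapshot -r poolsource@manual1
sudo zfs send -R poolsource@manual1 | sudo zfs receive -Fu pooldestination/backup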
========
ZFS zpool file statistics (file size, number of files):
/decko/
a) zpool iostat -r;zpool iostat -w
b)
find /decko/ -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'
1k: 8102
2k: 2938
4k: 2169
8k: 2102
16k: 2311
32k: 2986
64k: 2533
128k: 2164
256k: 2146
512k: 1692
1M: 2284
2M: 4512
4M: 7483
8M: 7890
16M: 4184
32M: 1911
64M: 484
128M: 1461
256M: 4911
512M: 2344
1G: 578
2G: 113
4G: 13
8G: 11
16G: 2
/ecko/ZN:
find /ecko/ZN/ -type f -print0 2>/dev/null| xargs -0 ls -l 2>/dev/null| awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'
1k: 403007
2k: 33644
4k: 48356
8k: 155711
16k: 62305
32k: 52709
64k: 47308
128k: 44223
256k: 35698
512k: 32049
1M: 34376
2M: 22291
4M: 38327
8M: 8134
16M: 2448
32M: 1346
64M: 1948
128M: 1438
256M: 379
512M: 276
1G: 124
2G: 3