I've started poking at getting Kirkwood hardware crypto playing ball again with OpenSSH. Digging around I found two major showstoppers with the current versions:
- Privilege Separation
- Digests
On privilege separation, the answer in the past was to disable it in OpenSSH, which seems wrong given how many safeties it provides. Testing with af_alg (it's in the kernel, and OpenSSL supports it out of the box), I got the usual failure when enabling it: ssh'ing in would die before getting to a login prompt, and sshd would log:
sshd-session[18711]: fatal: ssh_sandbox_violation: unexpected system call (arch:0x40000028,syscall:281 @ 0x769fd18c) [preauth]
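For reference, the two numbers in that log line can be decoded by hand. Here's a minimal sketch, assuming 0x40000028 is AUDIT_ARCH_ARM and the 32-bit ARM EABI syscall numbering. The function name and the tiny lookup tables are made up for illustration and are not part of OpenSSH; only a few socket-related numbers are listed.

```python
import re

# Hand-entered lookup tables (an assumption of this sketch): values taken from
# the Linux audit headers and the 32-bit ARM EABI syscall table.
AUDIT_ARCH = {0x40000028: "AUDIT_ARCH_ARM"}  # EM_ARM (40) | little-endian flag
ARM_EABI_SYSCALLS = {281: "socket", 282: "bind", 283: "connect"}

LOG_RE = re.compile(r"arch:(0x[0-9a-fA-F]+),syscall:(\d+)")


def decode_violation(line):
    """Return (arch_name, syscall_name) for an ssh_sandbox_violation log line."""
    match = LOG_RE.search(line)
    if match is None:
        raise ValueError("not a sandbox violation line")
    arch = int(match.group(1), 16)
    number = int(match.group(2))
    # Fall back to the raw value when the table has no entry.
    return (AUDIT_ARCH.get(arch, hex(arch)),
            ARM_EABI_SYSCALLS.get(number, str(number)))


line = ("sshd-session[18711]: fatal: ssh_sandbox_violation: unexpected system call "
        "(arch:0x40000028,syscall:281 @ 0x769fd18c) [preauth]")
print(decode_violation(line))
```

So the call the pre-auth sandbox is rejecting is socket().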
OK, so that's a start. Syscall 281 on 32-bit ARM is socket(), which the pre-auth sandbox doesn't allow. After some digging, I ended up adding the following to OpenSSH's sandbox-seccomp-filter.c:
#endif
#ifdef __NR_socketcall
SC_ALLOW_ARG(__NR_socketcall, 0, SYS_SHUTDOWN),
SC_DENY(__NR_socketcall, EACCES),
#endif
/* Kurlon testing af_alg */
#ifdef __NR_socket
SC_ALLOW(__NR_socket),
#endif
#if defined(__NR_ioctl) && defined(__s390__)
This satisfies the syscall firewall. I still need to narrow the rule, though: the domain and type arguments are fixed for OpenSSL in this case, so the next step is to test restricting the allowed arguments, similar to what is done for __NR_socketcall.
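To make the narrowing idea concrete, here's a rough model of the decision a tighter rule would make, written in Python rather than as a seccomp filter. It assumes the engine opens its socket as socket(AF_ALG, SOCK_SEQPACKET, 0), which is my understanding of OpenSSL's afalg engine and should be checked against the source. The constants are the usual Linux values and the function name is made up. This is not the actual sandbox change, just the logic it would need to encode.

```python
# Linux constant values, hard-coded so the sketch runs anywhere
# (assumption: generic Linux numbering, as used on ARM).
AF_INET = 2
AF_ALG = 38
SOCK_STREAM = 1
SOCK_SEQPACKET = 5


def socket_call_allowed(domain, sock_type):
    """Model of a narrowed rule: permit socket() only for AF_ALG/SOCK_SEQPACKET."""
    return domain == AF_ALG and sock_type == SOCK_SEQPACKET


# The kernel crypto socket passes; an ordinary TCP socket does not.
print(socket_call_allowed(AF_ALG, SOCK_SEQPACKET))
print(socket_call_allowed(AF_INET, SOCK_STREAM))
```

The point of matching on both arguments is that the pre-auth process still can't open network sockets, only the kernel crypto one.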
On the digests issue, in my openssl.cnf, instead of enabling all algorithms, I enabled only the ciphers, no hashes:
# Turn on AF_ALG hardware crypto
openssl_conf = openssl_def
[openssl_def]
engines = openssl_engines
[openssl_engines]
afalg = af_alg_engine
[af_alg_engine]
#default_algorithms = ALL
default_algorithms = =aes-128-cbc aes-192-cbc aes-256-cbc des-cbc des-ede3-cbc
To prefer hardware crypto, I've set this in both sshd_config and ssh_config:
Ciphers ^aes128-cbc
So far, this has me able to ssh in. I still need to set up a box that advertises aes so I can verify ssh'ing out. It's possible to use a custom openssl.cnf just for OpenSSH, so digests could stay in hardware elsewhere, though based on the benchmark data I've seen I don't know that they really offer much improvement over software on this platform. There is also discussion that hardware hashes are broken in other ways in OpenSSL, so it may be best to ignore them. Once I get the sandbox rule refined, I'll take it to OpenSSH upstream to see if that portion can be merged into mainline.
Performance-wise, af_alg is known to be slower than cryptodev, burning sys time on context switches, etc. Doing an scp transfer of a 2GB file of pure random data, I got 6.5MB/s with hardware aes128-cbc, CPU walled on my GoFlex. In software I got 6.4MB/s, also CPU walled? Cryptodev appears to be an evolutionary dead end as far as the OpenSSL team is concerned, so I'm not sure whether that's worth chasing again.
Additional numbers:
aes256-cbc - 5.7MB/s software vs 5.7MB/s hardware
root@gfn:/etc/ssh# time openssl speed -evp aes-256-cbc -engine afalg -elapsed
Engine "afalg" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CBC ops for 3s on 16 size blocks: 19687 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 64 size blocks: 19478 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 256 size blocks: 19197 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 1024 size blocks: 16497 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 8192 size blocks: 7417 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 16384 size blocks: 4633 AES-256-CBC ops in 3.00s
version: 3.3.2
built on: Sun Oct 27 14:19:50 2024 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -fzero-call-used-regs=used-gpr -Wa,--noexecstack -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/build/reproducible-path/openssl-3.3.2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZSTD -DNDEBUG -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CBC       105.00k      415.53k     1638.14k     5630.98k    20253.35k    25302.36k
real 0m18.466s
user 0m0.817s
sys 0m10.254s
root@gfn:/etc/ssh# time openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-256-CBC ops for 3s on 16 size blocks: 1592995 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 64 size blocks: 549828 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 256 size blocks: 128594 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 1024 size blocks: 35895 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 8192 size blocks: 4758 AES-256-CBC ops in 3.00s
Doing AES-256-CBC ops for 3s on 16384 size blocks: 2378 AES-256-CBC ops in 3.00s
version: 3.3.2
built on: Sun Oct 27 14:19:50 2024 UTC
options: bn(64,32)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -fzero-call-used-regs=used-gpr -Wa,--noexecstack -g -O2 -Werror=implicit-function-declaration -ffile-prefix-map=/build/reproducible-path/openssl-3.3.2=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DZLIB -DZSTD -DNDEBUG -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_armcap=0x0
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-256-CBC      8495.97k    11729.66k    10973.35k    12252.16k    12992.51k    12987.05k
real 0m18.098s
user 0m17.217s
sys 0m0.079s
The gains may not be worth the chase using af_alg?
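To put the two openssl speed runs side by side, here's a small sketch that recomputes the throughput column from the raw op counts (ops times block size, over the 3.00s window, in 1000s of bytes per second) and finds the block size where afalg first beats software. The op counts are copied from the output above; the variable and function names are just for illustration.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
AFALG_OPS = [19687, 19478, 19197, 16497, 7417, 4633]
SOFTWARE_OPS = [1592995, 549828, 128594, 35895, 4758, 2378]
SECONDS = 3.00  # each measurement ran for 3.00s


def throughput_k(ops, block_size, seconds=SECONDS):
    """Throughput in 1000s of bytes per second, as openssl speed reports it."""
    return ops * block_size / seconds / 1000.0


def first_block_where_afalg_wins():
    """Smallest block size at which the afalg run outpaces the software run."""
    for size, hw, sw in zip(BLOCK_SIZES, AFALG_OPS, SOFTWARE_OPS):
        if throughput_k(hw, size) > throughput_k(sw, size):
            return size
    return None


for size, hw, sw in zip(BLOCK_SIZES, AFALG_OPS, SOFTWARE_OPS):
    hw_k = throughput_k(hw, size)
    sw_k = throughput_k(sw, size)
    print(f"{size:>6} bytes: afalg {hw_k:10.2f}k  software {sw_k:10.2f}k  "
          f"afalg/software {hw_k / sw_k:.2f}")

print("afalg first ahead at", first_block_where_afalg_wins(), "byte blocks")
```

On these numbers the engine only overtakes software at 8192-byte blocks and up, and is about 80x slower at 16-byte blocks.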