NVIDIA has been developing a set of GPU-accelerated software packages under the RAPIDS brand. Most of them target graphics accelerators that I don't have in my home menagerie of single-board arm64 systems. The interesting one that is in scope is cuSignal, a signal-processing tool that supports the Maxwell GPU in my Jetson Nano (128 CUDA cores).
cuSignal is derived from SciPy Signal, and seeks to improve on its performance by offloading some of the necessary matrix handling onto the GPU.
The most exciting work to date uses cuSignal to handle incoming radio signals from an RTL-SDR or even a wideband SDR. Luigi PU2SPY has demonstrated demodulating 18 radios simultaneously using a LimeSDR and a GTX 1070 Ti graphics card. Luigi's brief demo was posted to Twitter in late December 2019, and he has since announced plans to upstream and release the related code at a later date.
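To give a feel for what "demodulating a radio" means per channel, here is a minimal CPU-only sketch of a quadrature FM demodulator in plain Python. It is not Luigi's code and does not use cuSignal; the function names, sample rate, deviation, and test tone are all values I picked for illustration. It builds a synthetic FM signal and recovers the message from the phase step between consecutive IQ samples.

```python
import cmath
import math


def fm_modulate(message, sample_rate, deviation_hz):
    """Return complex baseband IQ samples for a frequency-modulated message."""
    phase = 0.0
    iq = []
    for m in message:
        # The instantaneous frequency offset is deviation_hz * m, so the
        # phase advances by 2*pi*deviation_hz*m/sample_rate per sample.
        phase += 2.0 * math.pi * deviation_hz * m / sample_rate
        iq.append(cmath.exp(1j * phase))
    return iq


def fm_demodulate(iq, sample_rate, deviation_hz):
    """Quadrature demodulator: measure the phase step between samples."""
    gain = sample_rate / (2.0 * math.pi * deviation_hz)
    out = []
    for prev, cur in zip(iq, iq[1:]):
        # cur * conj(prev) has an angle equal to the phase difference.
        out.append(cmath.phase(cur * prev.conjugate()) * gain)
    return out


# Illustrative values only (assumed, not taken from any real capture).
sample_rate = 240_000
deviation_hz = 75_000
tone_hz = 1_000

message = [math.sin(2.0 * math.pi * tone_hz * n / sample_rate)
           for n in range(2400)]
iq = fm_modulate(message, sample_rate, deviation_hz)
recovered = fm_demodulate(iq, sample_rate, deviation_hz)

# The first output sample corresponds to message[1], since each output
# needs a previous IQ sample to difference against.
max_error = max(abs(r - m) for r, m in zip(recovered, message[1:]))
print("samples recovered:", len(recovered))
print("recovered tone matches original:", max_error < 1e-9)
```

Each channel in a wideband capture needs this kind of sample-by-sample arithmetic, plus filtering and resampling, which is the sort of work a GPU is well suited to running across many channels at once.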
In the meantime, if you have a Jetson Nano or any better NVIDIA GPU, you can follow the cuSignal install instructions to bring up the underlying routines and test them for correctness. The Jetson-specific documentation is much improved from the original release, and it steers you towards conda4aarch64, the arm64 version of conda, for package handling.
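One way to check a fresh install for correctness is to run the same routine on the CPU with SciPy Signal and on the GPU with cuSignal, then compare the results. The sketch below does that for polyphase resampling. It assumes that `cusignal.resample_poly` accepts the same leading arguments as `scipy.signal.resample_poly` and that CuPy is the array library underneath; the input data, the resampling ratio, and the tolerances are arbitrary choices of mine. If cuSignal or a usable GPU is not found, it falls back to printing the CPU reference only.

```python
import numpy as np
from scipy import signal as cpu_signal

try:
    import cupy as cp
    import cusignal
    # Raises if no CUDA device or driver is usable.
    cp.cuda.runtime.getDeviceCount()
    HAVE_GPU = True
except Exception:
    HAVE_GPU = False

# Reproducible noise as test input (arbitrary length and seed).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
up, down = 3, 2

# CPU reference from SciPy Signal.
reference = cpu_signal.resample_poly(x, up, down)
print("CPU output length:", reference.shape[0])

if HAVE_GPU:
    # Copy the input to the GPU, resample there, and copy the result back.
    gpu_result = cp.asnumpy(cusignal.resample_poly(cp.asarray(x), up, down))
    # Tolerances are a guess; loosen them if the GPU path uses float32.
    matches = np.allclose(gpu_result, reference, rtol=1e-5, atol=1e-8)
    print("GPU result matches CPU reference:", matches)
else:
    print("cuSignal or GPU not available; CPU reference only")
```

The same pattern extends to any other routine that both libraries provide: compute once with each, bring the GPU result back to host memory, and compare within a tolerance.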
More to come here. If you are comfortable building signal-processing code from scratch in Python, you are all set. The rest of us are waiting for radio-focused tools to be released, especially to see just how capable the Jetson Nano might be for advanced amateur radio activities.