Benchmark results for several implementations of the moving sum of squares (MSS):
- sequential, adapted from SleepECG
- ,
- ,
- ,
- parallel, but work-inefficient relative to the sequential algorithm
- ,
- ,
- ,
where n is signal length, k is window length and .
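The difference between the two families can be illustrated with a minimal pure-Python sketch. It is not the benchmarked code: the function names, the trailing-window convention (the window ends at the current sample and is truncated at the start of the signal), and the reading of "work-inefficient" as "every output is recomputed from its own window" are all assumptions made for illustration.

```python
def mss_sequential(x, k):
    """Running-sum variant: each sample is added once and removed once."""
    out = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v * v                      # sample entering the window
        if i >= k:
            acc -= x[i - k] * x[i - k]    # sample leaving the window
        out.append(acc)
    return out


def mss_independent(x, k):
    """Each output is computed from scratch, so outputs do not depend on
    each other (parallelizable), at the cost of about k operations per output."""
    return [
        sum(v * v for v in x[max(0, i - k + 1): i + 1])
        for i in range(len(x))
    ]
```

For the signal `[1.0, 2.0, 3.0, 4.0]` and `k = 2`, both functions return `[1.0, 5.0, 13.0, 25.0]`. The first has a loop-carried dependency on `acc`; the second has none, which is what makes it easy to distribute across cores or GPU threads even though it does more total work as `k` grows.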
The source code:
The benchmarks were performed on an Intel® Core™ i7-12700 with 16 GiB of RAM and an AMD Radeon™ RX 6600 XT.
Data type: float64.
- Futhark: 0.25.22. Backends: C, Multicore, OpenCL (GPU).
- Julia: 1.10.5.
- Mojo: 24.4.
- Elixir: 1.16.2. Erlang: 26.2.5. Backend: EXLA (CPU, JIT).
- C++: GCC 14.2.1, OpenMP: 4.5 (CPU), ArrayFire: 3.9.0 (OpenCL, GPU).
Mean runtime vs. signal length for window length = 32 samples
Sequential
Signal length [samples] | Mojo | C++ | Futhark C | Julia | Elixir EXLA |
---|---|---|---|---|---|
2,048 | 1.7 μs | 1.7 μs | 2.0 μs | 2.2 μs | 22 μs |
4,096 | 3.4 μs | 3.5 μs | 3.8 μs | 4.4 μs | 50 μs |
8,192 | 6.9 μs | 6.9 μs | 7.4 μs | 8.7 μs | 69 μs |
16,384 | 14 μs | 14 μs | 15 μs | 17 μs | 0.12 ms |
32,768 | 27 μs | 27 μs | 29 μs | 34 μs | 0.29 ms |
65,536 | 55 μs | 55 μs | 58 μs | 69 μs | 0.53 ms |
131,072 | 0.11 ms | 0.11 ms | 0.12 ms | 0.14 ms | 1.0 ms |
262,144 | 0.22 ms | 0.22 ms | 0.23 ms | 0.28 ms | 1.9 ms |
524,288 | 0.44 ms | 0.44 ms | 0.46 ms | 0.59 ms | 3.9 ms |
1,048,576 | 0.88 ms | 0.89 ms | 0.97 ms | 1.2 ms | 7.2 ms |
2,097,152 | 1.8 ms | 1.9 ms | 2.0 ms | 2.4 ms | 15 ms |
4,194,304 | 3.8 ms | 3.9 ms | 4.0 ms | 4.8 ms | 28 ms |
8,388,608 | 7.8 ms | 7.8 ms | 7.9 ms | 9.5 ms | 56 ms |
16,777,216 | 16 ms | 16 ms | 16 ms | 19 ms | 0.11 s |
33,554,432 | 31 ms | 31 ms | 32 ms | 38 ms | 0.22 s |
67,108,864 | 62 ms | 62 ms | 62 ms | 76 ms | - |
Parallel (CPU)
Signal length [samples] | Julia | C++ OpenMP | Mojo | Futhark Multicore | Elixir EXLA |
---|---|---|---|---|---|
2,048 | 14 μs | 15 μs | 6.0 μs | 9.0 μs | 22 μs |
4,096 | 16 μs | 16 μs | 7.3 μs | 13 μs | 27 μs |
8,192 | 19 μs | 18 μs | 8.5 μs | 21 μs | 42 μs |
16,384 | 18 μs | 22 μs | 13 μs | 35 μs | 68 μs |
32,768 | 30 μs | 29 μs | 20 μs | 80 μs | 0.15 ms |
65,536 | 39 μs | 43 μs | 38 μs | 0.12 ms | 0.28 ms |
131,072 | 81 μs | 73 μs | 71 μs | 0.11 ms | 0.46 ms |
262,144 | 0.13 ms | 0.13 ms | 0.14 ms | 0.21 ms | 0.79 ms |
524,288 | 0.27 ms | 0.25 ms | 0.27 ms | 0.39 ms | 1.5 ms |
1,048,576 | 0.66 ms | 0.51 ms | 0.54 ms | 0.78 ms | 3.1 ms |
2,097,152 | 1.3 ms | 1.2 ms | 1.1 ms | 2.2 ms | 6.1 ms |
4,194,304 | 3.0 ms | 3.4 ms | 4.1 ms | 5.5 ms | 12 ms |
8,388,608 | 5.7 ms | 6.8 ms | 8.2 ms | 11 ms | 23 ms |
16,777,216 | 11 ms | 14 ms | 16 ms | 22 ms | 43 ms |
33,554,432 | 25 ms | 26 ms | 32 ms | 43 ms | 75 ms |
67,108,864 | 45 ms | 51 ms | 65 ms | 84 ms | - |
Parallel (GPU)
Signal length [samples] | Futhark (OpenCL) | C++ ArrayFire (OpenCL) |
---|---|---|
2,048 | 30 μs | 28 μs |
4,096 | 30 μs | 28 μs |
8,192 | 30 μs | 29 μs |
16,384 | 32 μs | 31 μs |
32,768 | 36 μs | 37 μs |
65,536 | 44 μs | 44 μs |
131,072 | 50 μs | 59 μs |
262,144 | 73 μs | 88 μs |
524,288 | 0.11 ms | 0.15 ms |
1,048,576 | 0.19 ms | 0.27 ms |
2,097,152 | 0.47 ms | 0.66 ms |
4,194,304 | 1.0 ms | 1.3 ms |
8,388,608 | 1.9 ms | 2.6 ms |
16,777,216 | 3.9 ms | 4.9 ms |
33,554,432 | 7.7 ms | 9.9 ms |
67,108,864 | 15 ms | 19 ms |
Mean runtime vs. window length for signal length = 50 × 10⁶ samples
Sequential
Window length [samples] | C++ | Mojo | Futhark C | Julia | Elixir EXLA |
---|---|---|---|---|---|
1 | 46 ms | 47 ms | 47 ms | 56 ms | 0.17 s |
2 | 46 ms | 47 ms | 47 ms | 57 ms | 0.24 s |
4 | 47 ms | 46 ms | 46 ms | 57 ms | 0.34 s |
8 | 46 ms | 46 ms | 47 ms | 57 ms | 0.32 s |
16 | 46 ms | 46 ms | 47 ms | 57 ms | 0.32 s |
32 | 46 ms | 47 ms | 47 ms | 56 ms | 0.35 s |
64 | 46 ms | 46 ms | 46 ms | 57 ms | 0.35 s |
128 | 46 ms | 46 ms | 46 ms | 57 ms | 0.40 s |
256 | 46 ms | 46 ms | 46 ms | 56 ms | 0.64 s |
512 | 46 ms | 46 ms | 47 ms | 57 ms | 0.97 s |
1,024 | 46 ms | 46 ms | 47 ms | 57 ms | 1.7 s |
2,048 | 46 ms | 47 ms | 47 ms | 57 ms | 2.9 s |
Parallel (CPU)
Window length [samples] | Julia | C++ OpenMP | Mojo (CPU) | Elixir EXLA | Futhark Multicore |
---|---|---|---|---|---|
1 | 29 ms | 32 ms | 34 ms | 35 ms | 55 ms |
2 | 29 ms | 33 ms | 36 ms | 77 ms | 57 ms |
4 | 30 ms | 33 ms | 39 ms | 79 ms | 59 ms |
8 | 33 ms | 34 ms | 37 ms | 88 ms | 60 ms |
16 | 31 ms | 36 ms | 42 ms | 0.11 s | 60 ms |
32 | 34 ms | 39 ms | 48 ms | 0.11 s | 64 ms |
64 | 41 ms | 50 ms | 62 ms | 0.16 s | 0.10 s |
128 | 53 ms | 76 ms | 98 ms | 0.20 s | 0.22 s |
256 | 78 ms | 0.12 s | 0.17 s | 0.34 s | 0.44 s |
512 | 0.17 s | 0.24 s | 0.32 s | 0.62 s | 0.88 s |
1,024 | 0.26 s | 0.45 s | 0.61 s | 1.2 s | 1.7 s |
2,048 | 0.51 s | 0.85 s | 1.2 s | 2.3 s | 3.5 s |
Parallel (GPU)
Window length [samples] | C++ ArrayFire (OpenCL) | Futhark (OpenCL) |
---|---|---|
1 | 7.4 ms | 7.3 ms |
2 | 7.6 ms | 7.4 ms |
4 | 7.7 ms | 7.3 ms |
8 | 8.1 ms | 7.3 ms |
16 | 10.0 ms | 8.2 ms |
32 | 14 ms | 11 ms |
64 | 24 ms | 18 ms |
128 | 43 ms | 30 ms |
256 | 0.17 s | 55 ms |
512 | 0.17 s | 0.10 s |
1,024 | 0.17 s | 0.20 s |
2,048 | 0.17 s | 0.40 s |
Conciseness of the source code
Implementation | Number of non-whitespace tokens | Compressed size (LZMA) [B] |
---|---|---|
Futhark sequential | 278 | 434 |
Futhark parallel | 109 | 266 |
Julia sequential | 208 | 472 |
Julia parallel (CPU) | 132 | 409 |
C++ sequential | 347 | 593 |
C++ parallel (CPU) | 254 | 629 |
C++ parallel (GPU) | 138 | 422 |
Elixir+Nx sequential | 431 | 713 |
Elixir+Nx parallel | 145 | 424 |
Mojo sequential | 353 | 565 |
Mojo parallel (CPU) | 249 | 540 |
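Two metrics of this kind could be computed roughly as in the sketch below. The tokenizer actually used for the table is not stated, so the regular expression here (identifiers, simple numeric literals, and any other single non-whitespace character) is purely an assumption. The same applies to the LZMA container format and preset: Python's `lzma.compress` defaults to the `.xz` container, whose fixed header and footer overhead may differ from whatever produced the sizes above. Results from this sketch are therefore not expected to reproduce the table.

```python
import lzma
import re

# Hypothetical tokenizer: an identifier, a simple number, or any other
# single non-whitespace character counts as one token.
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def count_tokens(source: str) -> int:
    """Number of non-whitespace tokens under the assumed tokenizer."""
    return len(_TOKEN.findall(source))


def lzma_size(source: str) -> int:
    """Size in bytes of the UTF-8 source after LZMA compression
    (default .xz container and preset)."""
    return len(lzma.compress(source.encode("utf-8")))
```

For example, `count_tokens("let y = x*x + 1.5")` gives 8 under this tokenizer, independent of how the line is spaced. A token count is insensitive to identifier length and formatting, while compressed size also discounts repeated boilerplate, which is presumably why the table reports both.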