Kamailio benchmark results on the Raspberry Pi platform
I always wanted to experiment a bit more with the ARM platform Raspberry Pi. The ongoing pre-release tests for the upcoming Kamailio 5.3.0 were a good opportunity to finally do it.
I used a somewhat older Raspberry Pi 3 Model B. This model has 4 × Cortex-A53 1.2 GHz cores and 1 GB of RAM. As Linux distribution I chose the standard Debian-based Raspbian stable, which (still) provides a 32-bit environment.
The hardware was connected to the test machine through a standard switch using an Ethernet cable. The tested Kamailio version was 5.3.0-pre1 from the git master branch, git version ccc0eb6d. The compiler was the standard gcc v8.3 on Raspbian. The operating system ran without any desktop environment, but no other optimizations (like disabling logging services) were done.
The test runs were repeated several times to see if the results could be reproduced. That said, the performance of a small embedded system like the Raspberry Pi depends a lot on the temperature of the environment. I monitored the CPU temperature during the tests, and several times it reached over 80 °C. This means that by providing e.g. active cooling for the Raspberry Pi CPU you could achieve better results, especially for longer tests. To detect the thermal effects I did shorter (120 s) and longer (300 s) tests. The shorter tests were much less affected by the thermal effect than the longer tests.
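The temperature monitoring described above can be scripted in a few lines. This is only a minimal sketch: it assumes the kernel exposes the SoC temperature in millidegrees Celsius at /sys/class/thermal/thermal_zone0/temp, which is common on Raspberry Pi systems but is not necessarily how the readings in this post were taken. The function names and the default limit are illustrative.

```python
from pathlib import Path

# Assumed sysfs location of the SoC temperature sensor (millidegrees Celsius).
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '81234\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path: Path = THERMAL_ZONE) -> float:
    """Read the current temperature from sysfs (only works on the target board)."""
    return parse_millidegrees(path.read_text())


def is_hot(temp_c: float, limit_c: float = 80.0) -> bool:
    """True if the reading is above the limit; 80 °C was exceeded during the tests."""
    return temp_c > limit_c
```

Calling read_cpu_temp() once per second while sipp is running and logging the values next to the call rate would make it easier to tell when thermal throttling starts.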
For the different call tests the standard sipp tool was used. I used an existing sipp configuration from Stefan Mititelu, available on GitHub. The test rate was increased until retransmissions were observed, and then slightly decreased again to reach a stable setup with only a few retransmissions. In the Kamailio configuration only the number of children was reduced to 4, the number of cores.
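The rate search described above (raise the rate until retransmissions appear, then back off a little) was done by hand, but it could be automated roughly as follows. This is a sketch under assumptions: run_test is a hypothetical callback that would run one sipp test at the given rate and return the number of retransmissions, and the 10 % back-off is an arbitrary choice, not a value from the tests.

```python
from typing import Callable


def find_stable_rate(
    run_test: Callable[[int], int],
    start: int,
    step: int,
    max_rate: int,
    backoff_percent: int = 10,
) -> int:
    """Raise the rate until retransmissions appear, then back off slightly.

    run_test(rate) is a hypothetical callback: it runs one load test at
    `rate` requests per second and returns the number of retransmissions.
    """
    last_ok = 0
    rate = start
    while rate <= max_rate:
        if run_test(rate) > 0:
            # First overloaded rate found: reduce it by backoff_percent.
            return rate * (100 - backoff_percent) // 100
        last_ok = rate
        rate += step
    # No retransmissions were seen; return the highest rate that was tested.
    return last_ok
```

A real run_test would have to start sipp, wait for the test to finish and parse the Retrans column from its summary, which is left out here.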
During the first compilation of Kamailio on the test platform I noticed many swp{b} deprecation warnings from the compiler. These were caused by deprecated assembler instructions in the Kamailio core, used e.g. for low-level locking primitives. These primitives also did not provide support for multi-core compilation and were lacking ARM v7 support. Furthermore I noticed that the existing Kamailio build files were not prepared for newer (> v6) ARM architectures.
After several fixes and extensions all these topics should now be much improved. The make files were extended to properly detect ARM v6 to ARM v8 architectures. The current build system will compile for the ARM v5, ARM v6 and ARM v7 architectures. For ARM v8 it falls back to ARM v7 for now.
By default the current Raspbian gcc will compile for ARM v6. If you want to override this you can add -march=native in the following section of the Makefile.defs file. Then you will get ARM v7 (have a look at the output of make cfg).
# to build with native architecture, e.g. for rasberry pi: add -march=native
# armv8 not supported yet, fallback to armv7
predef_macros:=$(shell $(CC) -dM -E -x c $(CC_EXTRA_OPTS) $(extra_defs) \
$(CFLAGS) /dev/null)
You can now also get the architecture for which Kamailio was compiled with kamailio -I.
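To see which architecture the compiler targets before building, the macro dump used in the Makefile fragment above can also be inspected directly. The sketch below runs the same kind of command and looks for __ARM_ARCH, the macro GCC defines on ARM targets. That the Kamailio build files check this particular macro is an assumption, and the helper names are made up for illustration.

```python
import re
import subprocess


def arm_arch_from_macros(macro_dump: str):
    """Return the integer value of __ARM_ARCH from a `cc -dM -E` dump, or None."""
    match = re.search(r"^#define __ARM_ARCH (\d+)\s*$", macro_dump, re.MULTILINE)
    return int(match.group(1)) if match else None


def dump_predefined_macros(cc: str = "gcc", extra_opts=()) -> str:
    """Run the compiler as in the Makefile fragment: cc -dM -E -x c <opts> /dev/null."""
    cmd = [cc, "-dM", "-E", "-x", "c", *extra_opts, "/dev/null"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On the board, comparing arm_arch_from_macros(dump_predefined_macros()) with arm_arch_from_macros(dump_predefined_macros(extra_opts=["-march=native"])) should show the difference between the default and the native target.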
I tested several different build optimizations to see if they had a noticeable effect on my benchmarks.
Higher optimization levels or more iterations were not tested, as compilation on the Raspberry Pi is much slower at higher optimization levels with gcc.
I observed slightly better performance with the new locking code for ARM v6, and also somewhat better performance in some test cases with the ARM v7 code. But as the differences were only minor, I chose the default for my tests: ARM v6 architecture, new locking, standard optimization.
In this test case the standard Kamailio configuration was used to provide a registration server. Kamailio was configured to use only in-memory storage for the registrations, and no authentication was done. This test was used to get an indication of a best-case scenario for Kamailio network throughput.
The results were pretty impressive. I was able to get about 5000 REGISTER requests per second for a short test, and 3000 REGISTER requests per second for the longer test. The output from the longer sipp test run is shown below.
Call-rate(length) Port Total-time Total-calls Remote-host
6.0(3000 ms)/0.002s 2222 300.78 s 902352 192.168.188.26:5060(UDP)
978 new calls during 0.326 s period 0 ms scheduler resolution
9 calls (limit 27000) Peak was 45 calls, after 266 s
0 Running, 99004 Paused, 1141 Woken up
0 dead call msg (discarded) 0 out-of-call msg (discarded)
3 open sockets
Messages Retrans Timeout Unexpected-Msg
[ NOP ]
REGISTER ----------> 902347 0 0
200 <---------- 902343 0 0 0
[ NOP ]
------------------------------ Test Terminated --------------------------------
----------------------------- Statistics Screen ------- [1-9]: Change Screen --
Start Time | 2019-09-25 17:19:49:838 1569424789.838010
Last Reset Time | 2019-09-25 17:24:50:298 1569425090.298358
Current Time | 2019-09-25 17:24:50:624 1569425090.624483
-------------------------+---------------------------+--------------------------
Counter Name | Periodic value | Cumulative value
-------------------------+---------------------------+--------------------------
Elapsed Time | 00:00:00:326 | 00:05:00:786
Call Rate | 3000.000 cps | 2999.980 cps
-------------------------+---------------------------+--------------------------
Incoming call created | 0 | 0
OutGoing call created | 978 | 902352
Total Call created | | 902352
Current Call | 9 |
-------------------------+---------------------------+--------------------------
Successful call | 977 | 902343
Failed call | 0 | 0
-------------------------+---------------------------+--------------------------
Call Length | 00:00:00:001 | 00:00:00:001
------------------------------ Test Terminated --------------------------------
At the peak the Raspberry Pi was handling approx. 18 MBit/s incoming and 16 MBit/s outgoing SIP traffic, which is pretty amazing.
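As a quick sanity check, the cumulative call rate and the success ratio can be recomputed from the counters in the sipp output above. The sketch only uses numbers printed there; the helper name is illustrative.

```python
def parse_elapsed(hms_ms: str) -> float:
    """Convert sipp's 'HH:MM:SS:mmm' elapsed time to seconds."""
    hours, minutes, seconds, millis = (int(part) for part in hms_ms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


elapsed = parse_elapsed("00:05:00:786")  # cumulative elapsed time
total_calls = 902352                     # "Total Call created"
successful = 902343                      # cumulative "Successful call"

avg_rate = total_calls / elapsed         # should match the reported cumulative rate
success_ratio = successful / total_calls

print(f"{avg_rate:.2f} cps, {success_ratio:.4%} successful")
```

The recomputed rate agrees with the 2999.980 cps that sipp reports, and only 9 of the 902352 REGISTER transactions are not counted as successful; these correspond to the 9 calls still open when the test was stopped.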
In this test case the standard Kamailio configuration was used to provide a proxy server. The UAS and UAC registered once, and then the calls were executed.
This test shows a lower calls-per-second result, as the logic executed in the Kamailio configuration is more complicated. Furthermore each call needs to be tracked by the tm module, and the transactions are stored in memory. But 300 calls per second is still an impressive number, and something that many commercial Kamailio setups rarely see during a day.
I tested two different scenarios, short (3 s call duration) and longer (30 s call duration). The results for the short call scenario were about 7% lower than for the longer call scenario. The output of the longer sipp test run is shown below.
Call-rate(length) Port Total-time Total-calls Remote-host
300.0(30000 ms)/1.000s 1111 301.05 s 90316 192.168.188.26:5060(UDP)
137 new calls during 0.456 s period 0 ms scheduler resolution
9003 calls (limit 27000) Peak was 9012 calls, after 131 s
1 Running, 18904 Paused, 274 Woken up
0 dead call msg (discarded) 0 out-of-call msg (discarded)
3 open sockets
Messages Retrans Timeout Unexpected-Msg
[ NOP ]
INVITE ----------> 90316 0 0
100 <---------- 90315 0 0 0
180 <---------- 64135 0 0 2
200 <---------- 90315 3 0 0
ACK ----------> 90315 3
Pause [ 30.0s] 90315 409
BYE ----------> 81315 0 0
200 <---------- 81313 0 0 0
ACK ----------> 0 0
----------------------------- Statistics Screen ------- [1-9]: Change Screen --
Start Time | 2019-09-25 18:20:02:666 1569428402.666952
Last Reset Time | 2019-09-25 18:25:03:271 1569428703.271712
Current Time | 2019-09-25 18:25:03:727 1569428703.727778
-------------------------+---------------------------+--------------------------
Counter Name | Periodic value | Cumulative value
-------------------------+---------------------------+--------------------------
Elapsed Time | 00:00:00:456 | 00:05:01:060
Call Rate | 300.439 cps | 299.993 cps
-------------------------+---------------------------+--------------------------
Incoming call created | 0 | 0
OutGoing call created | 137 | 90316
Total Call created | | 90316
Current Call | 9003 |
-------------------------+---------------------------+--------------------------
Successful call | 136 | 81313
Failed call | 0 | 0
-------------------------+---------------------------+--------------------------
Call Length | 00:00:30:007 | 00:00:30:008
------------------------------ Test Terminated --------------------------------
At the peak the Raspberry Pi was handling approx. 9 MBit/s incoming and 10 MBit/s outgoing SIP traffic, which is still impressive.
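The number of concurrent calls in the sipp output above follows directly from the call rate and the call duration (Little's law: calls in progress = arrival rate × duration). A small sketch using only the figures from that output; the function name is illustrative.

```python
def expected_concurrent_calls(call_rate_cps: float, call_duration_s: float) -> float:
    """Little's law: average calls in progress = arrival rate x call duration."""
    return call_rate_cps * call_duration_s


# 300 calls per second, each held for 30 seconds.
expected = expected_concurrent_calls(300.0, 30.0)

# Calls created minus calls completed when the test was stopped.
in_progress_at_end = 90316 - 81313

print(expected, in_progress_at_end)
```

Both values are close to the 9003 current calls and the peak of 9012 calls that sipp reports, so during the long-call scenario Kamailio had to keep state for roughly 9000 calls at the same time.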
Overall I was impressed with the performance of Kamailio on this small and cheap hardware platform. Kamailio ran stably and processed millions of calls without any issues. I only noticed one crash (probably related to CPU overheating) and about 10 error log messages (probably caused by the overload situation).
If you use Kamailio on another embedded ARM (>= v6) system I am interested in your feedback. Once I get a new Raspberry Pi 4 I will probably run these benchmarks again.
If you are interested in Kamailio performance optimization for your platform, please contact me here.