Support IP handover in rtpproxy for VoIP applications

Issue #304

If you build VoIP applications, especially with open source libraries like pjsip, you may run into Kamailio and rtpproxy for serving SIP requests. Because of the limitations of NAT traversal, rtpproxy is needed to work around NAT. All SIP handshake requests go through a proxy server, and rtpproxy can also relay voice, video or any other RTP stream. When I played with rtpproxy, it was before version 2.0, and I needed to handle IP handover. This is the scenario where a user switches between networks, for example from Wifi to 4G, and gets a new IP. Normally this ends the SIP call, but users expect that we retry and continue the call if possible.

That’s why I forked rtpproxy and added IP handover support. You can check out the GitHub repo at rtpproxy.

I use src_count to track the number of consecutive packets from a different address. When this number reaches the THRESHOLD (10 for RTP and 2 for RTCP), I switch to the new address.

This way:

  • The client can ALWAYS change IP when it switches from 3G to Wifi, or from one Wifi hotspot to another

  • An attack is only possible if the attacker sends at least 10 (the RTP THRESHOLD) packets within 20ms, because every packet from the client's current address resets the counter (assuming my client sends a packet every 20ms)
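To make the rule concrete, here is a minimal standalone sketch of the consecutive-packet counter in C. It is not the code from the fork: struct peer_state, on_packet() and the "ip:port" strings are hypothetical stand-ins for rtpproxy's session structure and sockaddr handling. Like the fork, it picks the threshold from the parity of the local port (even for RTP, odd for RTCP).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Thresholds from the text: 10 consecutive packets for RTP, 2 for RTCP. */
#define RTP_THRESHOLD  10
#define RTCP_THRESHOLD 2

/* Hypothetical per-direction state; rtpproxy keeps this in its session. */
struct peer_state {
    char addr[64];           /* currently latched remote address, "ip:port" */
    unsigned int src_count;  /* consecutive packets from a different address */
};

/*
 * Called for every received packet. Returns true when the remote address
 * was switched to `from`. The parity of local_port selects the threshold:
 * even ports carry RTP, odd ports carry RTCP (RTCP port = RTP port + 1).
 */
bool on_packet(struct peer_state *st, const char *from, int local_port)
{
    unsigned int threshold =
        (local_port % 2 == 0) ? RTP_THRESHOLD : RTCP_THRESHOLD;

    if (strcmp(st->addr, from) == 0) {
        st->src_count = 0;   /* packet from the known address breaks the streak */
        return false;
    }

    st->src_count++;
    if (st->src_count < threshold)
        return false;        /* still on probation: keep the old address */

    /* Enough consecutive packets: latch the new address, reset the counter. */
    snprintf(st->addr, sizeof(st->addr), "%s", from);
    st->src_count = 0;
    return true;
}
```

A single stray packet followed by one from the old address leaves the stream where it is, while ten packets in a row from a new address (for example after a Wifi to 4G switch) move it. This sketch switches as soon as the count reaches the threshold; switching one packet later would be an equally valid design choice.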

This idea is borrowed from pjsip.

There is a macro PJMEDIA_RTP_NAT_PROBATION_CNT. Basically, it does the following:

“See if source address of RTP packet is different than the configured address, and switch RTP remote address to source packet address after several consecutive packets have been received.”

Mobile clients change IP frequently as they move from one hotspot to another, so it would be nice if rtpproxy supported this feature.

Take a look at the new field in struct rtpp_session:

// IP Handover Count how many consecutive different packets are received, 0 is for callee, 1 is for caller
unsigned int src_count[2];

And how it is used in rxmit_packets:

static void
rxmit_packets(struct cfg *cf, struct rtpp_session *sp, int ridx,
  double dtime)
{
    int ndrain, i, port;
    struct rtp_packet *packet = NULL;

    /* Repeat since we may have several packets queued on the same socket */
    for (ndrain = 0; ndrain < 5; ndrain++) {
        if (packet != NULL)
            rtp_packet_free(packet);

        packet = rtp_recv(sp->fds[ridx]);
        if (packet == NULL)
            break;
        packet->laddr = sp->laddr[ridx];
        packet->rport = sp->ports[ridx];
        packet->rtime = dtime;

        i = 0;
        // IP Handover do not need canupdate
        // Use src_count
        if (sp->addr[ridx] != NULL) {
            /* Check that the packet is authentic, drop if it isn't */
            if (sp->asymmetric[ridx] == 0) {
                if (memcmp(sp->addr[ridx], &packet->raddr, packet->rlen) != 0) {
                    if (sp->canupdate[ridx] == 0) {
                        // Continue, since there could be good packets in
                        // queue.
                    }
                    // Signal that an address has to be updated
                    rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover Set i 1st ridx %d", ridx);
                    i = 1;
                } else if (sp->canupdate[ridx] != 0 &&
                  sp->last_update[ridx] != 0 &&
                  dtime - sp->last_update[ridx] > UPDATE_WINDOW) {
                    sp->canupdate[ridx] = 0;
                    rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover Set canupdate to 0 1st ridx %d", ridx);
                }

                if (memcmp(sp->addr[ridx], &packet->raddr, packet->rlen) == 0) {
                    /* Packet from the known address: reset the counter */
                    sp->src_count[ridx] = 0;
                } else {
                    /* One more consecutive packet from a different address */
                    sp->src_count[ridx]++;
                    // IP Handover RTCP packets are sent at a larger interval, so use a smaller THRESHOLD
                    // Check to see if port is odd or even
                    if (sp->ports[ridx] % 2 == 0) {
                        /* Even port: RTP */
                        if (sp->src_count[ridx] >= 10)
                            i = 1;
                    } else {
                        /* Odd port: RTCP */
                        if (sp->src_count[ridx] >= 2)
                            i = 1;
                    }
                }
            } else {
                /*
                 * For asymmetric clients don't check
                 * source port since it may be different.
                 */
                rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover We are in asymmetric ridx %d", ridx);
                if (!ishostseq(sp->addr[ridx], sstosa(&packet->raddr)))
                    /*
                     * Continue, since there could be good packets in
                     * queue.
                     */
                    continue;
            }
        } else {
            sp->addr[ridx] = malloc(packet->rlen);
            if (sp->addr[ridx] == NULL) {
                rtpp_log_write(RTPP_LOG_ERR, sp->log,
                  "can't allocate memory for remote address - "
                  "removing session");
                remove_session(cf, GET_RTP(sp));
                /* Break, sp is invalid now */
                break;
            }
            /* Signal that an address have to be updated. */
            rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover Set i 2nd ridx %d", ridx);
            i = 1;
        }

        /*
         * Update recorded address if it's necessary. Set "untrusted address"
         * flag in the session state, so that possible future address updates
         * from that client won't get address changed immediately to some
         * bogus one.
         */
        if (i != 0) {
            sp->untrusted_addr[ridx] = 1;
            memcpy(sp->addr[ridx], &packet->raddr, packet->rlen);

            // IP Handover Do not use canupdate
            // After update, reset src_count
            if (sp->prev_addr[ridx] == NULL || memcmp(sp->prev_addr[ridx],
              &packet->raddr, packet->rlen) != 0) {
                sp->canupdate[ridx] = 0;
                if (sp->prev_addr[ridx] == NULL)
                    rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover prev_addr NULL ridx %d", ridx);
                rtpp_log_write(RTPP_LOG_ERR, cf->glog, "IP Handover Set canupdate to 0 2nd ridx %d", ridx);
            }

            sp->src_count[ridx] = 0;

            port = ntohs(satosin(&packet->raddr)->sin_port);

            rtpp_log_write(RTPP_LOG_INFO, sp->log,
              "%s's address filled in: %s:%d (%s)",
              (ridx == 0) ? "callee" : "caller",
              addr2char(sstosa(&packet->raddr)), port,
              (sp->rtp == NULL) ? "RTP" : "RTCP");

            /*
             * Check if we have updated RTP while RTCP is still
             * empty or contains address that differs from one we
             * used when updating RTP. Try to guess RTCP if so,
             * should be handy for non-NAT'ed clients, and some
             * NATed as well.
             */
            if (sp->rtcp != NULL && (sp->rtcp->addr[ridx] == NULL ||
              !ishostseq(sp->rtcp->addr[ridx], sstosa(&packet->raddr)))) {
                if (sp->rtcp->addr[ridx] == NULL) {
                    sp->rtcp->addr[ridx] = malloc(packet->rlen);
                    if (sp->rtcp->addr[ridx] == NULL) {
                        rtpp_log_write(RTPP_LOG_ERR, sp->log,
                          "can't allocate memory for remote address - "
                          "removing session");
                        remove_session(cf, sp);
                        /* Break, sp is invalid now */
                        break;
                    }
                }
                memcpy(sp->rtcp->addr[ridx], &packet->raddr, packet->rlen);
                satosin(sp->rtcp->addr[ridx])->sin_port = htons(port + 1);
                /* Use guessed value as the only true one for asymmetric clients */
                sp->rtcp->canupdate[ridx] = NOT(sp->rtcp->asymmetric[ridx]);
                rtpp_log_write(RTPP_LOG_INFO, sp->log, "guessing RTCP port "
                  "for %s to be %d",
                  (ridx == 0) ? "callee" : "caller", port + 1);
            }
        }

        if (sp->resizers[ridx].output_nsamples > 0)
            rtp_resizer_enqueue(&sp->resizers[ridx], &packet);
        if (packet != NULL)
            send_packet(cf, sp, ridx, packet);
    }

    if (packet != NULL)
        rtp_packet_free(packet);
}

Here are some useful resources that I read

Learning VoIP, RTP and SIP (aka awesome pjsip)

Issue #284

Before working with Windows Phone and iOS, my life involved researching VoIP. That was to build a C library for voice over IP functionality for a very popular app, and that was how I got started in open source.

The libraries I worked with were Linphone and pjsip. I learned a lot about UDP and the SIP protocol, how to build a C library for consumption in iOS, Android and Windows Phone, how challenging it is to support C++ components and a thread pool in Windows Phone 8, how to tweak the entropy functionality in OpenSSL to make it compile in Windows Phone 8, and how hard it was to debug C code with the Android NDK. It was a time when I had Visual Studio, Xcode and Eclipse open at the same time, joined mailing lists and followed gmane. Lots of good memories.

Today I found that the bookmarks I made are still available in Safari, so I thought I should share them here. I had to remove many articles because they are outdated or no longer available. These are resources that I actually read and used, not random links. Hopefully you can find something useful.

This post focuses on resources for pjsip on the client, and on how clients talk to each other, directly or through a proxy server.

First of all

Here are some of the articles and open source projects I made about VoIP; I hope you find them useful

VoIP overview

Voice over Internet Protocol (also voice over IP, VoIP or IP telephony) is a methodology and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet.

  • Voice over IP Overview: an introduction to VoIP concepts and the H.323 and SIP protocols

  • Voice over Internet Protocol: the Wikipedia article covers the foundational knowledge

  • Open Source VOIP Software: this is a must-read. Lots of foundational articles about client and server functionality, SIP, TURN, RTP, and many open source frameworks

  • VOIP call bandwidth: bandwidth consumption is a key factor in a VoIP application, and it's good not to go far beyond the accepted limit

  • Routers SIP ALG: this is the most annoying part, because there is NAT, there are many types of NAT, and there are also routers with SIP ALG

  • SIP SIMPLE Client SDK: an introduction to a SIP core library, but it gives an overview of how


The Session Initiation Protocol (SIP) is a communications protocol for signaling and controlling multimedia communication sessions in applications of Internet telephony for voice and video calls, in private IP telephone systems, as well as in instant messaging over Internet Protocol (IP) networks.

SIP server

  • Kamailio: this is the server that I used, and it plays well with lots of standard SIP clients, including pjsip. Debugging this server was also a fun story


RTP and SIP clients and servers need to conform to predefined protocols to meet the standards and to be able to talk with each other. You need to read a lot of RFCs, and some drafts as well.


NAT solves the shortage of IP addresses, but it causes lots of problems for SIP applications, and for me as well 😂


Learn how TCP helps SIP initiate sessions, and how to switch to TCP mode for sending packets


Learn about Transport Layer Security and SSL, especially OpenSSL, and how to secure SIP connections. The interesting part is reading the pjsip code to see how it uses OpenSSL to encrypt messages


Learn about Interactive Connectivity Establishment, another way to work around NAT


Learn about Session Traversal Utilities for NAT and Traversal Using Relays around NAT, other ways to work around NAT


Learn about Application Layer Gateway and how it affects your SIP application. This component inspects and modifies your SIP messages, so it might introduce unexpected behaviours.

Voice quality

Learn about voice quality, bandwidth and fixing delay in audio


Echo is a very common problem in VoIP: sometimes we hear not only the other party but also our own voice. Learn how echo is produced, and how to cancel it effectively
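Most echo cancellers are built around an adaptive filter that learns the echo path from the loudspeaker signal to the microphone and subtracts its estimate. Below is a minimal sketch of the classic NLMS (normalized least mean squares) filter in C. It is a textbook illustration, not the canceller pjsip uses; the names, the 64-tap length and the step size are assumptions, and a real canceller also needs double-talk detection and nonlinear processing, which are omitted here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter length: 64 taps = 8 ms of echo tail at 8 kHz. */
#define AEC_TAPS 64

struct aec {
    double w[AEC_TAPS];  /* estimated echo path (filter coefficients) */
    double x[AEC_TAPS];  /* recent far-end (loudspeaker) samples, newest first */
    double mu;           /* step size, 0 < mu < 2 */
};

void aec_init(struct aec *a, double mu)
{
    memset(a, 0, sizeof(*a));
    a->mu = mu;
}

/*
 * far_sample: sample just sent to the loudspeaker.
 * mic_sample: microphone sample, which contains the echo.
 * Returns the echo-cancelled sample to send to the other party.
 */
double aec_process(struct aec *a, double far_sample, double mic_sample)
{
    double echo_est = 0.0;
    double energy = 1e-9;  /* avoids division by zero during silence */
    double err;
    size_t i;

    /* Shift the far-end history and insert the newest sample. */
    memmove(&a->x[1], &a->x[0], (AEC_TAPS - 1) * sizeof(a->x[0]));
    a->x[0] = far_sample;

    /* Estimate the echo and the energy of the far-end history. */
    for (i = 0; i < AEC_TAPS; i++) {
        echo_est += a->w[i] * a->x[i];
        energy += a->x[i] * a->x[i];
    }

    /* Residual after subtracting the estimated echo. */
    err = mic_sample - echo_est;

    /* NLMS update: step normalized by the input energy. */
    for (i = 0; i < AEC_TAPS; i++)
        a->w[i] += a->mu * err * a->x[i] / energy;
    return err;
}
```

Call aec_process() once per sample; as the coefficients converge towards the real echo path, the returned residual shrinks. Normalizing by the input energy is what keeps the adaptation speed independent of how loud the far end is.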

Dual Tone

Learn how to generate dual tones (DTMF) for signalling in telecommunication


PJSIP is a free and open source multimedia communication library written in C language implementing standard based protocols such as SIP, SDP, RTP, STUN, TURN, and ICE. It combines signaling protocol (SIP) with rich multimedia framework and NAT traversal functionality into high level API that is portable and suitable for almost any type of systems ranging from desktops, embedded systems, to mobile handsets.


pjsip uses Thread Local Storage, which introduces some very cool behaviors


How to work with the sample rate of the media stream

Memory and Performance



I learned a lot about video capture, ffmpeg and color spaces, especially YUV


There are many SIP clients for mobile and desktop: microSIP, Jitsi, Linphone, Doubango, … They all strictly follow the SIP standard and may have their own SIP core; for example, microSIP uses pjsip, Linphone uses liblinphone, …

Among them, I learned a lot from the Android client CSipSimple, which offers a very nice interface and good functionality. Unfortunately Google Code was shut down, so I don’t know whether the author plans to continue development on GitHub.

I also participated a lot in the Google forums for users and developers. Thanks to Regis, I learned a lot about open source, and that made me interested in it.

You can read What is a branded version

I don’t make any money from csipsimple at all. It’s a pure opensource and free as in speech project.
I develop it on my free time and just so that it benefit users.
That’s the reason why the project is released under GPL license terms. I advise you to read carefully the license (you’ll learn a lot of things on the spirit of the license and the project) :
To sum up, the spirit of the GPL is that users should be always allowed to see the source code of the software they use, to use it the way they want and to redistribute it.

RTP Proxy

Because of NAT, or when users want to talk via a proxy, an RTP proxy is needed. RTPProxy follows the standards and works well with Kamailio

IP change

An IP change during a call can cause problems, such as when the user goes from Wifi to 4G


Learn about the Real-time Transport Control Protocol (RTCP) and how it works with RTP


To reduce payload size, we need to encode and decode the audio and video packets. We usually use Speex and Opus. Also, it’s good to understand the .wav format
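When debugging codecs it helps to dump decoded PCM to a .wav file and listen to it. A minimal sketch in C of the canonical 44-byte header for uncompressed PCM: a little-endian RIFF chunk, a 16-byte "fmt " chunk, then the "data" chunk header. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* WAV fields are little-endian regardless of the host byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)((v >> (8 * i)) & 0xff);
}

/* Writes a 44-byte PCM WAV header; data_bytes is the size of the samples. */
void wav_pcm_header(uint8_t hdr[44], uint32_t sample_rate, uint16_t channels,
                    uint16_t bits_per_sample, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * bits_per_sample / 8);

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);       /* size of everything after this field */
    memcpy(hdr + 8, "WAVE", 4);

    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                   /* fmt chunk size for PCM */
    put_le16(hdr + 20, 1);                    /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align);  /* byte rate */
    put_le16(hdr + 32, block_align);          /* bytes per sample frame */
    put_le16(hdr + 34, bits_per_sample);

    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

For 8 kHz mono 16-bit audio (narrowband telephony), call wav_pcm_header(hdr, 8000, 1, 16, nbytes), write the 44 bytes, then append the raw samples.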

Building pjsip for Windows Phone 8

Windows Phone 8 introduced C++ components, changes in threading, and VoIP and audio background modes. To build pjsip for it, I needed to find another thread pool component and tweak OpenSSL a bit to make it compile on Windows Phone 8. I lost the source code, so I can’t upload it to GitHub 😢. Also, many links are broken because Nokia is no longer around

Porting OpenSSL to Windows Phone 8

First, learn how to compile and use OpenSSL, how to call it from pjsip, and how to make it compile in Visual Studio for Windows Phone 8. I also learned the importance of Winsock and how to port a library. I struggled a lot with porting OpenSSL to Windows RT, then to Windows Phone 8

A lot of links were broken 😢 so I can’t paste them all here.

C and C++

Since pjsip, rtpproxy and Kamailio are all written in C and C++, I needed a good understanding of those languages, especially pointers and memory handling. We also needed to learn about compile flags for debug and release builds, how to use Make, and how to build static and dynamic libraries.

Jitter buffer in VoIP

Issue #157

This post is from a long time ago, when I worked on pjsip

A jitter buffer temporarily stores arriving packets in order to minimize delay variations. If packets arrive too late then they are discarded. A jitter buffer may be mis-configured and be either too large or too small.


If a jitter buffer is too small then an excessive number of packets may be discarded, which can lead to call quality degradation.

Lower settings cause less delay in the meeting, but meetings with lower settings are more susceptible to jitter effects caused by network congestion. Less data is buffered, increasing the likelihood that delayed or lost packets will produce a jitter effect in the media stream.

If a jitter buffer is too large then the additional delay can lead to conversational difficulty.

Higher settings are more effective at reducing jitter effects. With higher settings, more data is buffered, which allows more time for delayed packets to arrive at the client. However, higher settings also result in more delay (or latency) in the meeting. A user who is speaking will not be heard immediately by the other meeting participants. The delay in the meeting increases with the amount of time that data is held in the buffer.


A typical jitter buffer configuration is 30ms to 50ms in size. For an adaptive jitter buffer, the maximum size may be set to 100-200ms. Note that if the jitter buffer size exceeds 100ms, the additional delay introduced can lead to conversational difficulty.
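The trade-off can be sketched in a few lines of C. The first function models a fixed jitter buffer under a simplifying assumption (the minimum one-way delay is known in advance, which a real buffer does not know): a packet is discarded when its delay exceeds the minimum by more than the buffer size, and the buffer itself adds buffer_ms of delay. The second computes the interarrival jitter estimate defined in RFC 3550, a standard measure of how much the delay varies. Function names and the sample delays are hypothetical; this is not pjmedia's jitter buffer.

```c
#include <stddef.h>

/*
 * Fixed jitter buffer model: each packet is played out at
 * send_time + min_delay + buffer_ms, so a packet whose delay exceeds
 * min_delay by more than buffer_ms arrives too late and is discarded.
 */
size_t jb_count_discarded(const double *delay_ms, size_t n, double buffer_ms)
{
    size_t i, late = 0;
    double min_delay;

    if (n == 0)
        return 0;
    min_delay = delay_ms[0];
    for (i = 1; i < n; i++)
        if (delay_ms[i] < min_delay)
            min_delay = delay_ms[i];
    for (i = 0; i < n; i++)
        if (delay_ms[i] - min_delay > buffer_ms)
            late++;
    return late;
}

/*
 * RFC 3550 interarrival jitter: D = (R_i - R_{i-1}) - (S_i - S_{i-1}),
 * J += (|D| - J) / 16. Since R_i - S_i is the one-way delay of packet i,
 * D is simply the change in delay between consecutive packets.
 */
double rtp_jitter_ms(const double *delay_ms, size_t n)
{
    double j = 0.0;

    for (size_t i = 1; i < n; i++) {
        double d = delay_ms[i] - delay_ms[i - 1];
        if (d < 0)
            d = -d;
        j += (d - j) / 16.0;
    }
    return j;
}
```

With the example delays {40, 45, 42, 90, 41, 70, 43, 40} ms, a 20ms buffer discards two packets, a 30ms buffer discards one, and a 50ms buffer discards none at the cost of 50ms of extra delay.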




How to calculate packet size in VoIP

Issue #155

As you have probably observed in your studies, there is a determined method for calculating VoIP packet sizes. The packet size depends on many different variables, so there is no great answer for an “average” packet size – average depends on the environment. Just as an example, if you currently have VoIP running within a LAN and want to provision a new WAN so you can use VoIP to another site, knowing how big your VoIP packets are on the LAN won’t help. See below for a VoIP packet size calculation for a typical LAN, which will get you started.

Packet size

The general formula for VoIP packet size is:

Frame overhead + Encapsulation overhead + IP overhead + Voice payload.

Let’s say the packet is going across our LAN, so right now the frame overhead is 18 Bytes, for Ethernet II. (This size would change later if the packet crosses a trunk with 802.1Q tagging or ISL encapsulation, or is destined for the WAN, where a different link layer framing will probably be in use.)


Encapsulation overhead would include things like IPSec tunnels for security. Suppose we are not encapsulating this voice packet, so there is no overhead here.

“IP overhead” has overhead occurring at layer 3 and above, so for SIP phones this means IP (20 Bytes), UDP (8 Bytes), and RTP (12 Bytes). This is a total of 40 Bytes of IP overhead.

Lastly, you must calculate the size of the actual voice payload. Suppose we use the G.711 codec, which gives us a codec bandwidth of 64kbps. Also suppose our phones have a packetization period of 20ms (meaning 20ms worth of voice goes into every packet). With these two numbers, we can figure out the size of the voice payload. Since one second of voice contains 64 kilobits of data (“64 kbps”), it is easy to calculate how many bits go into each 20ms payload.

Find the amount of Bytes per payload:

64000 bits * .02 seconds = 1280 bits of voice per payload  
1280 bits / 8 bits per byte = 160 Bytes of voice per payload

The total overhead is 58 Bytes (18 + 40)
The total VoIP packet size is 218 Bytes (160 + 58)

In the interest of full disclosure, it is easy to get a bit rate per second from here; just convert 218 Bytes into bits and multiply by the packetization rate (which is the inverse of your packetization period, in this case 50 packets per second). The bit rate for ONE stream of this voice is 87.2kbps… we hope the user isn’t just talking to himself, so double that for an actual phone conversation.
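The same calculation as a small C sketch, so the inputs can be changed for other links or codecs. The struct and function names are hypothetical; the example values are the Ethernet II, uncompressed IP/UDP/RTP and G.711 at 20ms numbers used above.

```c
/* Inputs of the formula: frame + encapsulation + IP overhead + voice payload. */
struct voip_packet_params {
    unsigned frame_overhead;  /* link-layer framing, e.g. 18 bytes for Ethernet II */
    unsigned encap_overhead;  /* tunnels such as IPsec; 0 when not encapsulated */
    unsigned ip_overhead;     /* IP (20) + UDP (8) + RTP (12) = 40 bytes */
    unsigned codec_bps;       /* codec bit rate, e.g. 64000 for G.711 */
    unsigned ptime_ms;        /* packetization period, e.g. 20 ms */
};

/* Voice payload: codec_bps * (ptime_ms / 1000) bits, divided by 8 for bytes. */
unsigned voip_payload_bytes(const struct voip_packet_params *p)
{
    return p->codec_bps * p->ptime_ms / 8000u;
}

/* Total packet size on the wire for this link. */
unsigned voip_packet_bytes(const struct voip_packet_params *p)
{
    return p->frame_overhead + p->encap_overhead + p->ip_overhead +
           voip_payload_bytes(p);
}

/* Bit rate of ONE direction: packet bits times packets per second. */
double voip_stream_kbps(const struct voip_packet_params *p)
{
    double packets_per_second = 1000.0 / p->ptime_ms;
    return voip_packet_bytes(p) * 8.0 * packets_per_second / 1000.0;
}
```

With { 18, 0, 40, 64000, 20 } it yields a 160-byte payload, a 218-byte packet and about 87.2 kbps per direction; double the last figure for a two-way call.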

There are lots of other little things, like VAD and various header compressions, that you may need to factor into these calculations as well. As you can see, any one of these many things being off will give you a different answer, so knowing how to go about the entire process is important.