
This time, I’ll be running the MegaComet tests as per test 3, with kernel logging enabled to see where I’m pushing the TCP stack too far, so that hopefully I can fix it with some configuration.


As per test 3: start 5 EC2 servers using ‘ami-221fec4b’. Out of curiosity, I priced it this time. Since my tests will take less than an hour, it’ll cost $0.34 (EC2 large instance hourly cost) * 5 instances = $1.70 to run this test. I can handle that. It’s also probably worth mentioning that in EC2, I configure the firewall to leave all the MegaComet ports open. In the real world, you’d have the ‘application’ port restricted.
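The pricing arithmetic above can be sketched as a small helper. It assumes, as the text does, that a run of under an hour is billed as one full hour per instance; `estimate_cost` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Estimate the bill for a test run: hourly rate * instances * billed hours.
# Assumption: a run shorter than an hour is billed as one full hour.
estimate_cost() {
    local hourly_rate="$1" instances="$2" hours="$3"
    awk -v r="$hourly_rate" -v n="$instances" -v h="$hours" \
        'BEGIN { printf "%.2f\n", r * n * h }'
}

# 5 large instances at $0.34/hour for one billed hour
estimate_cost 0.34 5 1
```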

Setup script

echo Configuring TCP stack
sudo bash
echo "# Settings from" >> /etc/sysctl.conf
echo "# Config needed to have enough tcp stack memory:" >> /etc/sysctl.conf
echo "net.core.rmem_max = 33554432" >> /etc/sysctl.conf
echo "net.core.wmem_max = 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_rmem = 4096 16384 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_wmem = 4096 16384 33554432" >> /etc/sysctl.conf
echo "net.ipv4.tcp_mem = 786432 1048576 26777216" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_tw_buckets = 360000" >> /etc/sysctl.conf
echo "net.core.netdev_max_backlog = 2500" >> /etc/sysctl.conf
echo "vm.min_free_kbytes = 65536" >> /etc/sysctl.conf
echo "vm.swappiness = 0" >> /etc/sysctl.conf
echo "# This is for the outgoing connections max:" >> /etc/sysctl.conf
echo "net.ipv4.ip_local_port_range = 1024 65535" >> /etc/sysctl.conf
echo "# I added this to set the system wide file max:" >> /etc/sysctl.conf
echo "fs.file-max = 1100000" >> /etc/sysctl.conf
echo "# Reduce the time orphaned sockets stay in FIN-WAIT-2:" >> /etc/sysctl.conf
echo "net.ipv4.tcp_fin_timeout = 12" >> /etc/sysctl.conf
sudo sysctl -p

echo Enlarging user-limits on files
sudo bash
echo "* soft nofile 1048576" >> /etc/security/limits.conf 
echo "* hard nofile 1048576" >> /etc/security/limits.conf

echo Enabling kernel logging
sudo bash
echo "kern.*          /var/log/kern.log" >> /etc/rsyslog.conf
sudo service rsyslog restart

echo Installing build essentials
sudo yum -y install gcc* git* make

echo Installing libev
cd ~
tar -zxvf libev-4.04.tar.gz
cd libev-4.04
./configure && make && sudo make install

echo Adding libev to the library list 
sudo sh -c "echo /usr/local/lib > /etc/"
sudo ldconfig

echo Installing MC
cd ~
git clone git://
cd MegaComet
cd testing

echo Now you have to log out and in again, because the per-user limit is still low, as you can see:
ulimit -n
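After logging back in, it may be worth confirming that the settings from the setup script actually took effect before starting a test. The following is a minimal sketch, not part of MegaComet; `normalize` and `check_setting` are hypothetical helper names, and only a few of the keys are checked here.

```shell
#!/usr/bin/env bash
# Collapse tabs and repeated spaces so values can be compared as text
# (sysctl prints multi-value keys separated by tabs).
normalize() {
    printf '%s\n' "$1" | awk '{ $1 = $1; print }'
}

# Compare the live value of a sysctl key with the value written to sysctl.conf.
check_setting() {
    local key="$1" want="$2" got
    got=$(sysctl -n "$key" 2>/dev/null) || { echo "SKIP $key (not readable)"; return 0; }
    if [ "$(normalize "$got")" = "$(normalize "$want")" ]; then
        echo "OK   $key = $want"
    else
        echo "DIFF $key: want '$want', got '$(normalize "$got")'"
    fi
}

check_setting net.ipv4.tcp_mem "786432 1048576 26777216"
check_setting net.ipv4.ip_local_port_range "1024 65535"
check_setting fs.file-max 1100000

# Should print 1048576 once limits.conf has been picked up by a fresh login.
echo "open-file limit for this shell: $(ulimit -n)"
```

A `DIFF` line would suggest that `sysctl -p` did not apply that key, or that the kernel adjusted the value.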

Viewing kernel logs

Once MC has started on the first instance, I run this to view the kernel logs:

sudo tail -f /var/log/kern.log
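Watching the tail works while a test is running. Afterwards, a quick filter over the same log can pick out the messages that usually indicate TCP stack pressure. This is a sketch under assumptions: the patterns below are messages I would expect from a Linux kernel under socket pressure, but the exact wording varies between kernel versions, and `filter_tcp_pressure` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumed patterns; wording differs between kernel versions.
TCP_PRESSURE_PATTERNS='Out of socket memory|orphaned sockets|nf_conntrack: table full|SYN flooding'

# Read log lines on stdin and keep only the ones matching a pressure pattern.
filter_tcp_pressure() {
    grep -E -i "$TCP_PRESSURE_PATTERNS"
}

LOG=/var/log/kern.log
if [ -r "$LOG" ]; then
    # Show the 20 most recent matches, if any.
    filter_tcp_pressure < "$LOG" | tail -n 20
fi
```

If the log is only readable by root, the same filter can be fed from `sudo cat /var/log/kern.log` or `dmesg` instead.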

Starting tests:

On the test instances (2-5):

cd ~/MegaComet/testing
./megatest A


I can only get up to 494k connections. On the server, here is the top output at the maximum. As you can see, there’s plenty of RAM free:

top - 11:56:56 up 36 min,  4 users,  load average: 0.00, 0.14, 0.20
Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7652552k total,  2566036k used,  5086516k free,    20492k buffers
Swap:        0k total,        0k used,        0k free,   798960k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3373 ec2-user  20   0  8664  336  264 S  0.0  0.0   0:00.00 megamanager                                                                
 3374 ec2-user  20   0 27592  18m  460 S  0.0  0.2   0:15.90 megacomet                                                                  
 3375 ec2-user  20   0 26640  17m  460 S  0.0  0.2   0:15.40 megacomet                                                                  
 3376 ec2-user  20   0 26660  17m  460 S  0.0  0.2   0:15.29 megacomet                                                                  
 3377 ec2-user  20   0 26680  18m  460 S  0.0  0.2   0:15.27 megacomet                                                                  
 3378 ec2-user  20   0 27420  18m  460 S  0.0  0.2   0:16.59 megacomet                                                                  
 3379 ec2-user  20   0 27304  18m  460 S  0.0  0.2   0:15.81 megacomet                                                                  
 3380 ec2-user  20   0 26828  17m  460 S  0.0  0.2   0:15.52 megacomet                                                                  
 3381 ec2-user  20   0 27188  18m  460 S  0.0  0.2   0:15.60 megacomet
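Alongside top, the kernel’s own socket counters in /proc/net/sockstat can confirm the connection count and show how much memory TCP is using, in pages, which is the same unit as net.ipv4.tcp_mem. The sketch below assumes the usual `TCP: inuse N orphan N tw N alloc N mem N` layout of that file; `tcp_field` and `pages_to_mib` are hypothetical helper names. With 4 KiB pages, the tcp_mem thresholds of 786432 and 1048576 pages from the setup script work out to 3072 MiB and 4096 MiB.

```shell
#!/usr/bin/env bash
# Print one named counter from the "TCP:" line of sockstat-formatted input.
tcp_field() {
    awk -v f="$1" '$1 == "TCP:" { for (i = 2; i < NF; i += 2) if ($i == f) print $(i + 1) }'
}

# Convert a page count to MiB; page size defaults to 4096 bytes.
pages_to_mib() {
    local pages="$1" page_size="${2:-4096}"
    echo $(( pages * page_size / 1048576 ))
}

if [ -r /proc/net/sockstat ]; then
    inuse=$(tcp_field inuse < /proc/net/sockstat)
    mem=$(tcp_field mem < /proc/net/sockstat)
    page_size=$(getconf PAGESIZE 2>/dev/null || echo 4096)
    echo "TCP sockets in use: ${inuse:-unknown}"
    echo "TCP buffer memory:  $(pages_to_mib "${mem:-0}" "$page_size") MiB"
fi
```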

And the slabtop output:

Active / Total Objects (% used)    : 3713138 / 3713562 (100.0%)
 Active / Total Slabs (% used)      : 154792 / 154792 (100.0%)
 Active / Total Caches (% used)     : 58 / 74 (78.4%)
 Active / Total Size (% used)       : 1479546.63K / 1479691.42K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.40K / 8.00K

524160 524158  99%    0.19K  24960   21     99840K dentry
496797 496642  99%    0.19K  23657   21     94628K kmalloc-192
496000 496000 100%    0.06K   7750   64     31000K kmalloc-64
495328 495328 100%    0.12K  15479   32     61916K kmalloc-128
495040 495040 100%    0.07K   8840   56     35360K blkdev_ioc
495000 495000 100%    0.62K  41250   12    330000K sock_inode_cache
494950 494950 100%    1.62K  26050   19    833600K TCP
163410 163385  99%    0.10K   4190   39     16760K buffer_head
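The slabtop rows above allow a rough estimate of kernel memory per connection. The sketch below assumes that every cache whose object count is within 10% of the connection count holds one object per connection, which is a guess rather than something the kernel reports; `per_connection_kib` is a hypothetical helper name. Field 1 is the object count and field 7 is the cache size.

```shell
#!/usr/bin/env bash
# Sum the cache size (field 7) of every slab cache whose object count
# (field 1) is within 10% of the connection count, then divide by the
# number of connections. Assumption: those caches are per-connection.
per_connection_kib() {
    awk -v n="$1" '
        $1 + 0 >= 0.9 * n && $1 + 0 <= 1.1 * n { total += $7 + 0 }
        END { printf "%.2f\n", total / n }'
}

# Rows copied from the slabtop output above; 494950 is the TCP object count.
per_connection_kib 494950 <<'EOF'
524160 524158  99%    0.19K  24960   21     99840K dentry
496797 496642  99%    0.19K  23657   21     94628K kmalloc-192
496000 496000 100%    0.06K   7750   64     31000K kmalloc-64
495328 495328 100%    0.12K  15479   32     61916K kmalloc-128
495040 495040 100%    0.07K   8840   56     35360K blkdev_ioc
495000 495000 100%    0.62K  41250   12    330000K sock_inode_cache
494950 494950 100%    1.62K  26050   19    833600K TCP
163410 163385  99%    0.10K   4190   39     16760K buffer_head
EOF
```

Under that assumption this prints 3.00, that is, about 3 KiB of slab memory per idle connection. At that rate a million connections would need roughly 3 GB, which is still well under the 7652552k total that top reports, so it is consistent with memory not being the ceiling here.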

Nothing appeared in the kernel log. So, for now, I’m not sure what the holdup is: no kernel errors, and I didn’t hit a memory ceiling. I’m puzzled.

