Hey folks, just an FYI on what happened on the server this morning.
Shortly before 10 AM Eastern, the server started running out of memory. I'm still not sure what caused the abnormal memory consumption, but I suspect one of the message queues on the server got backlogged. Either way, these messages appeared in the kernel log:
[7311113.986736] Out of memory: Kill process 11395 (beam.smp) score 227 or sacrifice child
[7311113.992553] Killed process 11395 (beam.smp) total-vm:5712704kB, anon-rss:5262008kB, file-rss:100kB
beam.smp is the Erlang VM process that runs RabbitMQ, our message queue between the various web processes and actors that make OpenStudy tick. When Rabbit failed, everything else took a nosedive and the site pretty much went down.
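If you want to pull the numbers out of that "Killed process" line yourself, here's a rough Python sketch. The regex and the function names (`parse_oom_kill`, `kb_to_gib`) are made up for this post and only match the exact line format shown above; other kernel versions print different fields, so treat it as a starting point rather than a general parser.

```python
import re

# The "Killed process" line copied verbatim from the kernel log above.
LINE = (
    "[7311113.992553] Killed process 11395 (beam.smp) "
    "total-vm:5712704kB, anon-rss:5262008kB, file-rss:100kB"
)

# Hypothetical pattern written for this one line format only.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return the fields of an OOM 'Killed process' line, or None if it doesn't match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_kb": int(match.group("total_vm")),
        "anon_rss_kb": int(match.group("anon_rss")),
        "file_rss_kb": int(match.group("file_rss")),
    }


def kb_to_gib(kb):
    """Convert kB (as printed by the kernel, 1 kB = 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


info = parse_oom_kill(LINE)
if info is not None:
    print(info["name"], info["pid"], round(kb_to_gib(info["anon_rss_kb"]), 2), "GiB resident")
```

That works out to roughly 5 GiB resident for beam.smp at the moment it was killed.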
Additionally, when we realized what was going on and tried to bring Rabbit back, we had difficulty getting it to start up correctly. In the end, we decided to blow away the data directory where it holds its queues and do some reconfiguration before bringing it back up for good.
TL;DR: The server, which has 23GB of memory, ran out of memory and things got sticky. But it's all good now, and we're going to try to figure out what caused the memory failure.