LDAP sanity check

From V.S.V., Inc.

I ran into an interesting issue recently. Our monitoring environment started reporting multi-second response times for our LDAP servers. I knew this couldn’t possibly be the case because if it were, our whole environment would be falling over in convulsions with code red alarms sounding throughout the building. So, off I went to question each group involved in a given transaction. In an enterprise environment this is an amazingly large number of people: everyone from the monitoring group to networking, load balancers, security, and even the VM team. And, as expected, they all denied responsibility. So, I set out to find a way to independently prove where the problem was, or at least to narrow down the suspects.

To do this, I wrote a small script that ran on the same Linux host that the monitoring system ran its check from. (Note: this is not the monitoring host but rather the monitoring proxy.)

while :; do date >> test-results; /usr/bin/time -f 'Elapsed time: %E' /usr/bin/ldapsearch -x "uid=bshaw" dn >> test-results 2>&1; sleep 60; done

This script does a user lookup and records the date, the output, and the execution time to a log file every minute. Then I used a second script to analyze the data and determine the average execution time. If monitoring was correct, then I should see very large average times or, at the very least, I could view the file and find results matching the time of the alert, which should show similarly long times. Here is the analysis script:

grep 'Elapsed' test-results | perl -e '$count = 0; while(<>){$line = $_; chomp($line); $count++; $time += substr($line, rindex($line, ":")+1);} print "total time: $time\nrecords: $count\navg: ". $time/$count. "\n";'
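For the second way of looking at the data, checking individual results against the time of an alert, something like the following could save scrolling through the log by hand. This is only a sketch, not what I actually ran: the function name slow_samples and the threshold argument are made up for illustration, and it assumes the log layout produced by the collection loop above, where each sample starts with a line from date and ends with an "Elapsed time:" line in GNU time's %E format ([hours:]minutes:seconds).

```shell
#!/bin/sh
# slow_samples FILE THRESHOLD_SECONDS   (hypothetical helper, not part of the original scripts)
# Prints "timestamp<TAB>seconds" for every sample slower than the threshold.
slow_samples() {
    awk -v limit="$2" '
        BEGIN { want_date = 1 }
        # Assumption: the first line of each sample is the output of date(1).
        want_date { stamp = $0; want_date = 0; next }
        /^Elapsed time: / {
            # %E prints [hours:]minutes:seconds; fold the parts into seconds.
            n = split($3, part, ":")
            secs = 0
            for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
            if (secs > limit) printf "%s\t%.2f\n", stamp, secs
            # The next line in the log starts a new sample.
            want_date = 1
        }
    ' "$1"
}
```

Running slow_samples test-results 0.5 would list only the lookups that took longer than half a second, each with the timestamp recorded just before it. Unlike the Perl one-liner, this converts the minutes field as well, so a lookup that took over a minute would not be undercounted.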

Either way of looking at the data showed no anomalies. My average time was 0.02 seconds, well within the industry-accepted standard of 0.5 seconds. This meant the problem was with the monitoring environment itself.

While they still can’t give me my 0.5 seconds, they were able to get the execution time below 5 seconds so it quits alarming. In the meantime, I have set up an independent monitor for LDAP that uses a completely different system.