Yak Hacking - a Story

I have a couple of half-written blog posts I started this weekend.  Real wordy things about scrummifying your infrastructure and my experience with it.  I'll probably get to those in a few days, but I needed to vent about my weekend of yak shaving first.

Moral of my story, up front: what you're about to read is not the worst yak shave ever, and the problem is not a hard one.  It's all in a day's work for any halfway decent sysadmin.  What we're seeing here is a small problem exacerbated by a combination of technical debt and inadequate tooling.

Technical debt is a choice and can happen to anyone.  Here the client allowed their configuration management to run away from them.  They haven’t been maintaining their Puppet nodes and so don’t have a good list of what servers they are managing.  They also let some config files slip through unmanaged.  I generally don’t point fingers about it as there’s usually a sane tradeoff involved, but the first issue makes fixing the second one harder.

However, inadequate tooling does frustrate me.  They've gone to the trouble of automating with Puppet and managing application configs with Subversion and scripting, but do not seem to have considered holistic server management. The only ways to perform administrative tasks are by hand on each server or with Puppet. This seems a rather gaping hole in long-term planning.  100 servers is long past what I consider manageable by hand.  But read the story and decide for yourself.

Situation:
An org uses Puppet. The org has files unmanaged by Puppet that need to be gathered, analyzed and brought under Puppet control. I expected this to take a couple of hours.

What made it challenging:
Actual list of servers, uncertain. Generated from Puppet but unverified.

The files are secured from being read by anyone except root. Mode 600, as it were.

There are no other management tools - no Rundeck, no Func, no Ansible, no Salt, no Knife, not even a home grown, lovingly maintained perl management script, no passwordless ssh.

The client apparently expects admins to log into boxes one by one for administrative activities.

What I have:
My user ID. 
Sudo on any server I can log into.
Passwordless SSH works if I set it up myself, even though it's considered insecure here.  (really?)
sshpass
sshsudo
Ruby 1.8.5 installed from RPM

With the thought of using sshsudo, I wrote a ruby script that runs on the client node, checks for file existence, copies the files to a temp folder, makes them available for scp, and even tries to scp them back to the admin server.  The script should be runnable via sshsudo/sshpass.
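
Here's a minimal sketch of what getfiles.rb did - the file names and paths are stand-ins, since the real ones belong to the client:

#!/usr/bin/ruby
# getfiles.rb - meant to run as root (via sshsudo) on each node.
# Copies the root-only configs into /tmp/files and loosens the perms
# so my own (non-root) user can scp them off afterward.
require 'fileutils'

# hypothetical paths; the real list came from the client
WANTED = ['/etc/someapp/someapp.conf', '/etc/someapp/secrets.conf']
DEST   = '/tmp/files'

FileUtils.mkdir_p(DEST)
WANTED.each do |f|
  FileUtils.cp(f, DEST) if File.exist?(f)
end

# world-readable copies so a plain scp can grab them later
Dir.glob("#{DEST}/*").each { |f| File.chmod(0644, f) }

BUT…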

The servers also do not allow sudo without a tty - almost certainly the stock Defaults requiretty in /etc/sudoers that RHEL ships with.  ssh -t doesn't work.  ssh -t -t kind of works, but hangs without occasional keyboard intervention. And trying to scp from within the script triggers yet another password prompt, which can't be handled non-interactively on the client node.

sshpass works, but can't execute anything with sudo on the far end because of the requiretty issue.  So on its own it does me no good, because all the files are root-readable only.

sshsudo works but suffers from similar issues.

The servers are RHEL 5 running Puppet installed from RPMs, so no rubygems exist on them - even if I were rude enough to install things on servers belonging to someone else. I had been looking at net-ssh/net-scp for some of my scripting, but here it wasn't really an option.
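
For the record, had gems been an option, the net-ssh/net-scp route would have looked something like this - the host, user, and password handling are hand-waved placeholders:

require 'rubygems'
require 'net/ssh'
require 'net/scp'

# hypothetical values; in real life the hosts came from servers.good
host, user, pass = 'someserver', 'sascha', 'notmyrealpassword'

Net::SSH.start(host, user, :password => pass) do |ssh|
  # pull everything getfiles.rb staged in /tmp/files on the node
  ssh.scp.download!('/tmp/files', "/tmp/sascha/files/#{host}", :recursive => true)
end

No gems, no dice.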

Did any of that make sense?

None of this addresses the pre-work I did, either.  I was given a list of 200 or so servers (generated from Puppet, I believe) with the caveat that “some may have been retired.”  So I wrote a tcp script to check for listening sockets on port 22, plus a few other ports if desired.  I then sorted my servers into ones that responded, ones that timed out, and ones that produced a ‘name or service not known’ error, and consulted with the client’s full-time sysadmin. It turns out that the ones in DNS but not responding were retired, and the ones with connection timeouts needed to be reached from another server. omg.
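
The check script was nothing fancy; something along these lines (the input file name is a stand-in):

require 'socket'
require 'timeout'

# crude reachability check: try port 22 on each host and bucket the results
File.readlines('servers.all').each do |line|
  host = line.strip
  next if host.empty?
  begin
    Timeout.timeout(5) { TCPSocket.new(host, 22).close }
    puts "#{host} good"
  rescue SocketError
    puts "#{host} name or service not known"
  rescue Timeout::Error, Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH
    puts "#{host} no answer"
  end
end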

What I finally did:
Several unscripted actions on the command line because I was doing them as troubleshooting/discovery steps while figuring out wtf to do to get the files I needed.

Assembled a servers.good list based on the tcp testing.

Updated the sshsudo script to use -t -t wherever it ssh'd (oi!)

Make a local dir for the files, separated by host name:
for i in `cat servers.good`; do
  echo $i
  mkdir -p /tmp/sascha/files/$i
done

Put the file-manipulation script on all the servers and run it; originally it was going to scp the files back to the admin server too, but that wasn't working out for me:
./sshsudo -r -v -u sascha servers.good getfiles.rb

Get the files I snagged out of their root-read-only existence:
for i in `cat servers.good`; do
  echo $i
  sshpass -f ~/mypassfile scp -q sascha@$i:/tmp/files/* /tmp/sascha/files/$i
done

Delete the files from remote tmp:
for i in `cat servers.good`; do
  echo $i
  sshpass -f ~/mypassfile ssh -q sascha@$i rm -rf /tmp/files
done

I also spent some time making a temporary keypair for passwordless ssh, but that turned out to be no real improvement over using sshpass. Messing around with pushing the key out to the servers did highlight another issue, though - I could only log into about 10 of 100.  Lovely.

OMG.  I need an orchestration tool or something, STAT.  I should script all of this, but I will probably never do anything similar for this client again unless they ask me to implement an orchestration tool for them.

All of this work, just to get the files to me so I can work with them. How do people live like this???

Lesson Learned: ask more questions when accepting work, even when it's ad hoc, tiny project work for someone I know.  I assumed they would have management tooling.  After all, they were smart enough to use configuration management.