Puppet/Facter Question: How to determine if you are running puppet in a chroot environment

Update: solved


Solution:

Thanks to Daniel Pittman from Puppetlabs, the solution is really easy.

export FACTER_chroot=whatever
chroot <path-to-chroot> puppetd -vdt --waitforcert 60 -l /var/log/puppetrun.log

Et voilà, facter now gives you a $chroot fact, and you can use it in your manifests.
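If you would rather set the variable automatically from inside the chroot, one commonly used heuristic is to compare the root directory of the current process with the root directory of PID 1. This is not part of Daniel's solution, only a sketch: the function name in_chroot is made up, it assumes a Linux /proc and a stat that understands -c, and reading /proc/1/root usually needs root privileges.

```shell
#!/bin/sh
# Heuristic sketch, not from the original post: compare device:inode of our
# own root directory with the root directory of PID 1.
# Returns 0 = looks like a chroot, 1 = same root as PID 1, 2 = cannot tell
# (no /proc mounted, or not enough privileges to read /proc/1/root).
in_chroot() {
    own_root=$(stat -c '%d:%i' / 2>/dev/null) || return 2
    init_root=$(stat -c '%d:%i' /proc/1/root/. 2>/dev/null) || return 2
    [ "$own_root" != "$init_root" ]
}

# Only export the fact when a chroot was detected.
if in_chroot; then
    FACTER_chroot=true
    export FACTER_chroot
fi
```

Exporting the variable by hand, as shown above, stays the simpler and more predictable option; the heuristic can only say "cannot tell" when /proc is missing inside the chroot.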

Thanks again, Daniel, you made a happy puppet user even happier :)



Dear Lazyweb,

Think about running puppet in a chroot environment.

Your recipes were written to trigger some resource deployment when a service resource (using sysV, upstart, systemd) says "Yes, I'm running".

But starting a service inside a chroot via upstart will normally fail, and therefore your dependency won't be triggered.

Now, we have facter, and facter is a nice tool to determine whether you are deploying on real hardware or on a virtual machine (like on ESX or other solutions).

But, honestly, I didn't find any facter variable which tells me: "Yes, this is a chroot".

And right now it's already late, and I can't find a good way to determine that I'm working inside a chroot and not on the live system.

Dear Lazyweb, if you know a good solution (facter plugin, whatever) please leave a comment.

Thank you in advance.

New Year, New Company ;)

A new era starts for me.

Since yesterday (2011-02-17) I'm not working for my old company anymore.

As some of you have heard in the news, a global SaaS company bought my old employer, Netviewer AG.

Therefore, many of my colleagues have changed or will change companies, and I had to decide whether to do the same.

It took me quite some days and hours to think about this step, but finally I decided to follow my colleagues.

Therefore I signed a contract with the new company yesterday, as well as the cancellation agreement with my old company.

From today I'm a happy employee of Citrix Online, Germany.

And as I'm now working for a, well, US-controlled company, I have to state here that everything I write on my private blog is my own opinion and doesn't, in any way, represent the opinion of the company I'm working for.

Let's see what'll happen. The future awaits me.

sudo over ssh magic

Imagine,

you have a datacenter full of Ubuntu Servers. 

Imagine,

you are the guy with sudo rights.

Imagine,

you need to run a command on all those servers, 
but this command needs to run with superuser privileges.

Imagine,

you didn't tweak your /etc/sudoers to allow 
this command to run without a password.

Imagine,

you try this: ssh $host sudo command_to_run

Realise,

this will always ask you for your sudo password
and it echoes your password to your output device

But,

there is hope!

Find,

ssh -t -t -t $host sudo -S command <<EOF
<enter your password here>
EOF


Prerequisites for this to work:
  1. ssh authentication via public key without a passphrase (you have an account for such purposes with a holy secret ssh key without a passphrase)
  2. you are sitting alone in front of your workstation, so you can enter your sudo password without anyone seeing it.

Explanation:

  1. ssh $host sudo command
    will echo the sudo password back to your terminal, which is nothing you want
  2. ssh -t forces the allocation of a pseudo-tty (read ssh(1) )
  3. ssh -t -t -t (multiple -t options) forces tty allocation, even if ssh has no local tty (read ssh(1) )
  4. sudo -S causes sudo to read the password from stdin instead of the terminal device
  5. ssh -t -t -t $host in combination with sudo -S <command> <<EOF\nyour password\nEOF\n
    is what you really need to execute a sudo command on a remote host over ssh.


Conclusion:

You have a file with a list of IPs or hostnames for remote hosts you need to do something on with sudo.
A little script like the following will help you here:


#!/bin/bash

for i in $(cat ip.lst) ; do
     ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -t -t -t ${i} "sudo -S command <<EOF
<your password>
EOF
"
done
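
If you would rather not store the password in the script itself, a variation along the following lines may help. It is only a sketch: list_hosts and run_on_hosts are made-up helper names, the host file format (one host per line, # for comments) is an assumption, and the ssh and sudo options are simply the ones used above.

```shell
#!/bin/sh
# list_hosts FILE: print one host per line, skipping blank lines and # comments.
list_hosts() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}

# run_on_hosts FILE COMMAND...: ask once for the sudo password, then pipe it
# to "sudo -S" on every host listed in FILE.
run_on_hosts() {
    hostfile=$1
    shift
    printf 'sudo password: ' >&2
    stty -echo            # do not show the password while typing
    IFS= read -r password
    stty echo
    printf '\n' >&2
    for host in $(list_hosts "$hostfile"); do
        printf '%s\n' "$password" |
            ssh -o ConnectTimeout=5 -t -t -t "$host" "sudo -S $*"
    done
}
```

You would call it like: run_on_hosts ip.lst command_to_run. The password then only lives in a shell variable for the duration of the run instead of sitting in a file on disk.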

Ubuntu 10.04 LTS + Portchannel Bonds + Active-Passive Bonds

Update 3: It has nothing to do with Upstart, I'm sure about it now, after spending 4 hours of debugging.

Oh hell, I wonder why I'm always running into strange situations regarding Ubuntu Server, Network and Upstart (I hope it's upstart ;))

Ok, here we go with the setup:

Imagine you have a server with several ethernet interfaces.

eth0, eth1, eth2, eth3

Now imagine further, that eth0 and eth1 will be bonded as portchannel with LACP (bond-mode 4). Forget the xmit_hash_policy right now (this will be layer3+4, but this is not important right now).

Having Lucid and Upstart in place, the config looks like this:


auto bond0
iface bond0 inet static
   address 192.168.1.10
   netmask 255.255.255.0
   bond-slaves none
   bond-mode 4
   bond-miimon 100

auto bond1
iface bond1 inet static
   address 192.168.1.11
   netmask 255.255.255.0
   bond-slaves none
   bond-mode 4
   bond-miimon 100

auto eth0
iface eth0 inet manual
    bond-master bond0
    bond-primary eth0 eth1

auto eth1
iface eth1 inet manual
    bond-master bond0
    bond-primary eth0 eth1

auto eth2
iface eth2 inet manual
    bond-master bond1
    bond-primary eth2 eth3

auto eth3
iface eth3 inet manual
   bond-master bond1
   bond-primary eth2 eth3

The machine comes up, and I can ping the default interfaces just fine.
So, this setup is correct, the access vlans on the Cisco switch are set correctly, and the etherchannel config on the Cisco switch is also correct. There we go.

Now I want an active-passive bond on top of the two portchannel bonds. So, I'm going to change the config like this:


auto bond0
iface bond0 inet static
   address 0.0.0.1
   netmask 255.255.255.255
   bond-slaves none
   bond-mode 4
   bond-miimon 100

auto bond1
iface bond1 inet static
   address 0.0.0.2
   netmask 255.255.255.255
   bond-slaves none
   bond-mode 4
   bond-miimon 100

auto bond2
iface bond2 inet static
   address 192.168.1.10
   netmask 255.255.255.0
   bond-slaves bond0 bond1
   bond-mode 1
   bond-miimon 100

auto eth0
iface eth0 inet manual
    bond-master bond0
    bond-primary eth0 eth1

auto eth1
iface eth1 inet manual
    bond-master bond0
    bond-primary eth0 eth1

auto eth2
iface eth2 inet manual
    bond-master bond1
    bond-primary eth2 eth3

auto eth3
iface eth3 inet manual
   bond-master bond1
   bond-primary eth2 eth3


On Ubuntu Jaunty, this setup worked out of the box (minus, that the manual eth* interfaces were not necessary, I had the bond-slaves directly configured on bond0 and bond1, but for Lucid it needs to be this way).

Ok, I rebooted the machine. It comes up, but no ping is possible, although all interfaces are up and running.
Even bond2 is correctly enslaved with bond0 and bond1.
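
To double-check what the kernel thinks about the bonds, the bonding driver exposes its state under /sys/class/net/<bond>/bonding/. A small sketch for reading it; bond_info is a made-up helper name, and SYSFS_ROOT is only there so the function can be pointed at a different tree:

```shell
#!/bin/sh
# bond_info BOND: print mode and slaves of a bonding interface as the
# kernel reports them in sysfs.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

bond_info() {
    dir="$SYSFS_ROOT/class/net/$1/bonding"
    if [ ! -d "$dir" ]; then
        echo "$1: not a bonding interface" >&2
        return 1
    fi
    echo "$1 mode:   $(cat "$dir/mode")"
    echo "$1 slaves: $(cat "$dir/slaves")"
}

# Example: inspect all three bonds from the config above.
# for b in bond0 bond1 bond2; do bond_info "$b"; done
```

For the setup described here one would expect bond0 and bond1 to list the eth* interfaces as slaves, and bond2 to list bond0 and bond1.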

So, now I'm stuck. I think it has something to do with the order in which the NICs and bonds are set up.

The way it should be:

  1. Upstart will start /etc/init/networking.conf on local-filesystems and stopped udevtrigger.
    This will bring up the bond interfaces bond0, bond1 and bond2
  2. Upstart will then bring up the hardware interfaces eth0, eth1, eth2 and eth3 and put them correctly as slaves to bond0 and bond1.

But what about interface bond2?
bond2 will come up with bond0 and bond1 as slaves, but bond0 and bond1 don't have their bond-slaves ready yet. So bond2 doesn't know anything about the needed hardware interfaces.

How can I tell upstart to wait for the hardware interfaces, before the virtual interfaces are started?
In other words, I need to defer /etc/init/networking.conf to be executed after the hardware interfaces are up and running.

If this worked somehow, I could even get rid of the unneeded eth0/eth1/eth2/eth3 manual configurations for the hardware NICs, and I would be able to go back to a saner /etc/network/interfaces configuration.

Help is appreciated.

UPDATE: I uploaded an image of the setup which worked out of the box on Ubuntu Jaunty. So you can imagine what I'm trying to achieve.


UPDATE 2: Found another guy on the Novell forum who had the same problem (http://forums.novell.com/novell-product-support-forums/suse-linux-enterprise-server-sles/sles-networking/398736-bond-bonds-bonding-2-aggregate-bonds-active-backup.html), but in 2009 this setup worked for me (Ubuntu Jaunty).

Serious Joke Of The Day: Drying your Smoked Sausages in your Datacenter

No Comments:

Fun with HP Support Europe

Disclaimer: We are happy customers of HP Hardware.

The HP hardware is great, no complaints here, but what about the quality of the HP Support Department?

It's a disaster.

Here's the story:

In November 2010 we had a strange outage of one HP DL385 G5P server: it suddenly froze during operations. As we were running a DRBD device on it, this was a real bugger. The active machine tried to sync the data over the network to the passive machine (the one which froze). As the data wasn't acked by the passive node (we run DRBD protocol C), the performance of the active DRBD node went from 100 to 0.

So we halted the faulty machine and transported it from our datacenter  to our office, for testing.

As written earlier, we hadn't had the time to test this machine until yesterday.

So we installed Ubuntu 10.04 on the machine and ran "apt-get install stress".
Running the "stress" tool was fun, because the machine reacted as we expected, but after one hour of doing nothing, the machine suddenly halted, triggered by the HP iLO management sensors.

So, we tried to power on the machine again, but no. No power. The machine started and stopped immediately.
As we are well trained in guessing hardware faults, we were sure that it could only be the CPUs. So we called the HP Support Desk (what else do we have a 24/7/4 care pack for?).
We described the problem and what we thought the cause was, and the guys from the HP Europe Hardware Support desk sent us a technician with a new systemboard, because they were thinking: "It's the systemboard, not the CPUs".

Here we go, as we are happy customers, we believed that.

This morning, the technician came with a new systemboard and replaced it. What happened?
Nothing, the machine didn't start; it got power and dropped it again.
The technician was surprised, and checked all parts and cabling again. He also guessed that the power supplies could be the problem, so we got some spare power supplies from our hardware pool and replaced them. No change. The machine started and stopped immediately.

Mr. HP Technician then "oracled" that the CPUs were at fault, because the first thing the machine does after powerup is to check the CPUs. When the CPUs are failing, the machine powers down.

Now, the HP technician called the HP Hardware Support desk again, and tried to order a new systemboard (again), new CPUs and new power supplies, because he didn't know for sure that the CPUs were broken.

During the phone call with his colleague, the guy on the other end of the line told him: It's not the CPU, it's the SPS Backplane (that's the board which sits between the power supplies and the systemboard).

He was saying something about "I don't understand them, why don't they send what I want??" and left the office for another customer.

So, this afternoon the SPS Backplane arrived, and the very same technician from this morning came back one hour later than the backplane.
He replaced the backplane, and what? Right, the machine didn't power up properly.

So, back to the beginning. He called again, and now we are waiting for the CPUs (which are coming from Munich and Frankfurt, as it seems).

Honestly, we are really experienced with HP hardware and we know, most of the time, what the cause of a failure is. Why can't the HP Europe Support Desk just listen to their customers, especially when they work with their hardware every day? I mean, it's not the first time this has happened, but during the last months the HP Europe Support has become much worse than ever.

I already talked about that matter with some people from HP Germany and with our distributor, but it seems nobody is interested.

I wonder how other customers of HP Europe / Germany are handled by the HP Europe Support. It can't be that with a 24/7/4 support contract and all, it takes more than 24 hours until your machine is up and running again.

I'm happy to receive some comments if you have experience with HP Europe Support (good or bad), too.

Puppet Recipe: How to determine the role of a drbd device?

It's not perfect, but this little facter script helps to determine which role a drbd device has.

This is a puppet facter plugin; you should put it somewhere under
/etc/puppet/modules/drbd/plugins/facter/drbd_role.rb

It checks which version of drbd you are running: the older DRBD setups had their config in /etc/drbd.conf, the newer versions, especially on Ubuntu, have their resource config in /etc/drbd.d/*.res


require 'facter'

# Only define the fact where the DRBD tools are installed.
if File.exist?('/sbin/drbdadm')
  # Newer DRBD setups keep per-resource files in /etc/drbd.d/*.res,
  # older ones have everything in /etc/drbd.conf.
  old_drbd = !File.exist?('/etc/drbd.d')

  if old_drbd
    role = %x{drbdadm role `grep "resource" /etc/drbd.conf | awk '{print $2}'`}.chomp.downcase
  else
    # If several .res files exist, the last one found wins.
    filename = ''
    Dir.glob('/etc/drbd.d/*.res') do |fileitem|
      filename = File.basename(fileitem, '.res')
    end
    resource_name = %x{grep "resource" /etc/drbd.d/#{filename}.res | awk '{print $2}'}.chomp
    role = %x{drbdadm role #{resource_name}}.chomp.downcase
  end

  Facter.add("drbd_role") do
    setcode do
      role
    end
  end
end
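
The plugin above only reports the role of a single resource. For checking several resources by hand, a rough shell sketch could look like the following; drbd_resource_names is a made-up helper name, and the sed expression assumes that each resource is declared on a line of the form "resource <name> {".

```shell
#!/bin/sh
# drbd_resource_names DIR: print the resource names declared in DIR/*.res.
drbd_resource_names() {
    for resfile in "$1"/*.res; do
        [ -f "$resfile" ] || continue
        sed -n 's/^[[:space:]]*resource[[:space:]]\{1,\}\([^[:space:]{]*\).*/\1/p' "$resfile" |
            tr -d '"'
    done
}

# Example (needs drbdadm and root privileges):
# for res in $(drbd_resource_names /etc/drbd.d); do
#     echo "$res: $(drbdadm role "$res")"
# done
```

Turning this into one fact per resource would be the next step, but that is beyond this little recipe.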


Shell Goodies: Fetching NIC Interfaces with carrier without SED/AWK

As I'm rewriting some parts of the dhcp boot mechanism of live-boot, I needed a way to fetch network interfaces without the use of sed/awk or whatever else could help to parse the "ip -oneline link show" output.
As we somehow don't have sed or awk in our initramfs tools, I scribbled this:

#!/bin/sh
for device in /sys/class/net/* ; do
    if [ -f "$device/carrier" ]; then
        # reading carrier fails on interfaces that are down, hence 2> /dev/null
        carrier=$(cat "$device/carrier" 2> /dev/null)
        if [ "$carrier" = "1" ]; then
            devicename=$(basename "$device")
            if [ "$devicename" != "lo" ]; then
                # "address" holds the hardware (MAC) address of the interface
                interface=$(cat "$device/address")
                echo "$carrier of $devicename ($interface) is up"
            fi
        fi
    fi
done

8 Months of Hard Work -> Success

First of all great news:

we are now running about 350 hosts on Ubuntu Lucid (10.04 LTS) Server on bare metal (HP rackmounts DL360/DL365/DL380/DL385 from G5 via G5P, G6 and G7, HP BladeServers BL465c G5 and G7 with the Flex10 fabric) and VMware machines.

This was not the case until the last weekend.

In the past, we were running Ubuntu Jaunty (9.04) and that had to change, because 9.04 reached EOL when Ubuntu Maverick was released.

Well, normally it would be easy to follow the non-LTS releases with do-release-upgrade or apt-get update / apt-get dist-upgrade, but during our tests we found some really strange things.
We are running many different services on Ubuntu, and some of them involve DRBD setups. Especially these DRBD setups gave us problems.

First, Heartbeat 1/2 no longer existed in 10.04 LTS, so we had to replace all our puppet recipes dealing with HA1/2 with pacemaker ones. This was one of the serious buggers.
Second, while we were test-upgrading from 9.04 to 9.10 to 10.04 we found out that during this update all DRBD devices were horribly broken (we don't know why, but they were, and we had no time to investigate).

Therefore, we decided that we had to completely redeploy our servers from scratch during operational times.

What does this mean:


  1. Setup the whole infrastructure, or update the existing infrastructure to deploy Ubuntu 10.04
  2. Test Deploy VMWare Machines and Bare Metal Test Machines
  3. Test new hardware, especially the BL465C G7 blade servers from HP, because of the new Flex10 Fabric NIC
  4. Test Database Setups with Replications for our Production Services. From 5.0 to 5.1 many things changed. This was crucial for us, because some of our databases are running under high load (IO, CPU and Memory wise)
  5. Test many pacemaker setups, and write puppet recipes for them (pacemaker + ipvs + ldirectord, pacemaker+drbd+mysql, pacemaker+apache2, pacemaker + bind, pacemaker + postfix etc.)
  6. Test FAI Deployment of Bare Metal


Well, the problem with all that: we only had 8 months, and we could not interrupt the daily operations.

Result: Many days with too many hours and a lot of brainfck involved.

At the time we started this adventure, we were 4 team members, and everybody got a share of the work.

My special topic was: Rewrite the FAIManager I wrote in 2008/2009. The result was DC².

I want to spare you the technical details of this adventure, but it was hard work. Especially when you get new hardware which is really untested, and you find problems during network boot setups.
In the last 5 days before the big bang started, I had to replace klibc's ipconfig network setup in the live-initramfs overlay with udhcpc. This was a success, but it cost work time.

Anyhow, last weekend it was finally time. We started on Saturday, around 10am (UTC+1), and after 36 hours we were finished.

All of our services are redundant. So, we deployed the second line of our machines from scratch. We tested the product on this second line, and when we were sure that everything worked, we switched from the old Ubuntu 9.04 first line machines to the newly deployed Ubuntu 10.04 LTS line.
After the switch we re-checked the product services, so we were really sure that everything worked as before.
After the final test, we started to deploy the first line. On Sunday evening we were then ready to bring up the newly deployed machines as redundancy.

The last action on this Sunday was to drink some beer and smoke a cigar to celebrate our success.

All in all, it was a success, everything worked as expected and the downtime was not more than 30 minutes.

Coming to an end, this project wouldn't have worked out without many people involved.

  1. All OPS team members involved. Without their energy to work day and night this wouldn't have worked out nicely.
  2. All people working for Ubuntu, Debian and especially my dear friends from the FAI project.
  3. A special thanks to Stéphane Graber and the people from the LTSP project, who had already UDHCPd in their initramfs setup, from where I got the idea and parts of the implementation.
  4. The people from the Puppetlabs for their great software, FAI + Puppet are great!
  5. The people from the Qooxdoo Project, this is really a nifty JavaScript framework
  6. The people from the Django Project, the backend application runs with it
  7. David Fischer for his great rpc4django project, really a cool implementation for xmlrpc and json-rpc
  8. The developers of Google's Chromium Browser, Mozilla Firefox and Firebug
  9. Hewlett-Packard for the great hardware

5 Years to retirement

Oh well, we all know IT Business is not for old people.

As it happens, I turned 40 today, and I'm already thinking about my future at >= 45. What to do?

Doing the Google Recruitment Cycle?

I don't think this is really what I want. In one of my last replies to their HR crew I wrote: "You need young people, and not old people" so no Google for me, really. I mean, it would be interesting and fun to work for Google, but not at my age anymore.

Applying for a job at Canonical?

Honestly, I really like the people working for Canonical, but I can't imagine sitting at home most of the time and doing my work from there.
I need a handful of good and trustworthy people around me when doing my job. Eye2Eye Communication is a must in my work life, discussing problems, finding solutions etc. this is what I like most working for a company. So no Canonical for me.

Having my own business?

Oh well, this would be fun, but without a product?
Hey wait, there is DC² and I think this project has potential. I could imagine that this would be fun, integrating Linux automation systems, to help other companies to maintain their data centers at large.
But there is the problem with the money. I don't have it to start up a company, and I don't think that I'll get it in the next 5 years. And I'm not a fan of business angels or venture capitalists.

But wait...I don't need that much money. I could leave Germany and do my job in another country, and no, not the US or UK or any other developed country.
I'm thinking more of the African continent. Let's see how it is in Cameroon when I'm visiting my in-laws. Maybe that's a way to spend the last 20 years of my life ( ;-) ): helping to build up good IT environments there, and educating young and smart Cameroonian (and / or other African) IT youngsters.

Let's see what the next 5 and more years have for me.

Anyways, I turned 40 and there is still a lot to do for me.

It's not that bad...Celebration starts now :)