Don't let roamers steal your trash
Published on: 19 Feb 2013
This one is for the IT boys in the dark basements. Love y’all :)
The problem
The disk usage reported by df and the total you get from du don’t match. Example:
'Hey! I'm getting a 100% disk usage from df but I know I have free space, I've just rm-ed 10GB worth of files.'
The (probable) cause
df asks the filesystem how many blocks are allocated, while du only adds up the files it can still find by name. When you rm a file that some process still has open, the name disappears right away (so du stops counting it), but the space is not freed until the last process holding it closes it (so df keeps counting it). Most probably those processes are hung, forgotten, undead, infected or whatever you might want to call them. I’ll call them ‘roamers’.
When the computer is rebooted, the roamers are killed along with every other process in the system, and the deleted files are finally purged. But let’s say you can’t afford to reboot the machine, e.g. you’re working on a server.
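If you want to see the effect for yourself, here is a minimal sketch in which the shell itself plays the roamer. The function name roamer_demo, the temp directory and the file name are made up for illustration, and the file is tiny, so don't expect df to move noticeably.

```shell
# Sketch: hold a file open, delete it, and show the data is still there.
roamer_demo() {
    tmpdir=$(mktemp -d) || return 1
    f="$tmpdir/meh"
    printf 'still here\n' > "$f"

    exec 3< "$f"                        # open the file on descriptor 3
    rm "$f"                             # unlink it: the name is gone...
    [ -e "$f" ] || echo "name is gone"
    cat <&3                             # ...but the data is still readable

    exec 3<&-                           # closing the descriptor lets the space be freed
    rmdir "$tmpdir"
}

roamer_demo
```

Only the name goes away with rm. The blocks stay allocated for as long as descriptor 3 is open, which is exactly what a roamer does with your 10GB.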
What would Grimes do?
Use this command:
lsof +L1
It lists open files, and the +L1 option keeps only those whose link count is below 1, i.e. files that have been unlinked (deleted) but are still held open by some process. You might also want to add some grep magic to be sure that you are working only with files that are marked as deleted.
$ lsof +L1 | grep "deleted"
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
roamer 38 rgrimes txt REG 14,2 51288 0 58754819 /blah/meh (deleted)
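On a busy box that listing can get long. Here is a rough sketch for totalling the held space per process, assuming the ten-column layout shown above (PID in column 2, SIZE/OFF in column 7). The function name sum_deleted_by_pid is my own, and your lsof may print extra columns, so check the header first.

```shell
# Sketch: read `lsof +L1` output on stdin and print "PID COMMAND BYTES"
# for every process holding deleted files.
# Assumes column 2 is the PID and column 7 is SIZE/OFF, as in the sample output.
sum_deleted_by_pid() {
    awk '/\(deleted\)/ { bytes[$2] += $7; name[$2] = $1 }
         END { for (pid in bytes) printf "%s %s %.0f\n", pid, name[pid], bytes[pid] }'
}

# Usage (biggest hoarders first):
#   lsof +L1 | sum_deleted_by_pid | sort -k3 -n -r
```

Entries where SIZE/OFF shows an offset (values starting with 0t) rather than a size will not add up to anything meaningful, so treat the totals as a hint about where to look, not as an exact figure.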
The first two columns of the lsof output give you the process name and PID, so the only thing left to do is to go all kill -9 on its arse. Be careful though: kill processes only when you’re sure you won’t break anything else, or when you don’t have any other exits available.
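If you'd rather not open with a headshot, here is one possible sketch: send a plain SIGTERM first, give the process a few seconds to clean up, and only then fall back to SIGKILL. The helper name stop_roamer and the five-second grace period are arbitrary choices of mine, not anything lsof or kill prescribes.

```shell
# Hypothetical helper: ask politely with SIGTERM, escalate to SIGKILL only if needed.
stop_roamer() {
    pid=$1
    # If the signal cannot be delivered, the process is gone or not ours to kill.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Wait up to 5 seconds for the process to exit on its own.
    i=0
    while [ "$i" -lt 5 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        i=$((i + 1))
    done

    # Still around? Now it gets the kill -9 treatment.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
}

# Usage, with the PID from the lsof output:
#   stop_roamer 38
```

Once the process is gone, run df again and the space should be back.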