Wednesday 13 October 2010

UGotFile Stabilizer HOW TO

This is for the future staff at UGotFile. I'm going to talk about the stabilizer: what it does, and how to improve it or fix errors.

The UGotFile stabilizer is a tool that automatically distributes files evenly between the hard drives within a server and across the servers on the network. The stabilizer does its job over HTTP. That is not a very secure channel, but in our case it is ideal: HTTP doesn't cost CPU power or extra RAM, and unlike SSH it isn't capped at 100 Mbps per connection.

The stabilizer is split into three parts:

  1. Gather accurate information about the server and automatically index the server into the database.
    This part is pretty well done; it is unlikely to cause any problems.
  2. Volumes that are 5% bigger than the average volume are selected, and some of their files are marked for export.
  3. Volumes that are smaller than the average volume are selected to download the files marked for export. Volumes that are next to the exporting volume have higher priority.
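Parts 2 and 3 can be sketched as a single planning pass. This is a minimal Python sketch, not the actual stabilizer code: it assumes "bigger" means used space, that "next to" can be measured with a hypothetical `position` number per volume, and that an importing volume should never be pushed above the average. Moves are only simulated in memory; the real stabilizer transfers the files over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    position: int  # hypothetical: slot order used to decide which volumes are "next to" each other
    files: dict = field(default_factory=dict)  # file name -> size in bytes

    @property
    def used(self) -> int:
        return sum(self.files.values())


def plan_moves(volumes, threshold=0.05):
    """Simulate one stabilizer pass and return (file, source, destination) moves.

    Volumes are modified in place so later decisions see earlier moves.
    """
    average = sum(v.used for v in volumes) / len(volumes)
    moves = []
    # Part 2: volumes more than `threshold` above the average export files.
    exporters = [v for v in volumes if v.used > average * (1 + threshold)]
    for src in exporters:
        # Part 3: volumes below the average import, closest to the exporter first.
        importers = sorted(
            (v for v in volumes if v.used < average),
            key=lambda v: abs(v.position - src.position),
        )
        # Try the largest files first so fewer transfers are needed.
        for name, size in sorted(src.files.items(), key=lambda kv: -kv[1]):
            if src.used <= average:
                break
            for dst in importers:
                # Never push an importer above the average.
                if dst.used + size <= average:
                    dst.files[name] = src.files.pop(name)
                    moves.append((name, src.name, dst.name))
                    break
    return moves
```

Sorting importers by distance is one simple way to express "next to the exporting volume has higher priority"; the real priority rule may weigh other factors.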

Problems in paradise.
After the takeover of UGotFile, all the hardware changed, and the special equipment we required cost a lot of money. We came up with some new ideas that increased the throughput of each server from 200 Mbps to beyond 1000 Mbps. One step of that plan is to use each volume separately. This might shock some people, but removing RAID increases performance with concurrent connections. Some will ask about data integrity; well, we can always check for a hard drive that is about to die and move its data away.

The problem is that before the takeover of UGotFile we were in the middle of an upgrade, and it is still only half done. Then we made another upgrade. The volumes now have three different file structures, and the database also has three tables. All three sets still work nicely.

  1. All files are stored on one volume.
  2. Files are split across the volumes using modulo the number of volumes.
  3. Each volume has its own file structure, so a volume can be transferred between servers and work automatically.
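To make the three layouts concrete, here is a hypothetical Python sketch of how a file path could be resolved under each one. The directory names, the use of a numeric file ID as the modulo input, and the `volume_id` lookup for the third layout are all assumptions; the real schemes may differ. The last helper shows why a modulo layout is awkward: changing the number of volumes changes where most existing files belong.

```python
import os


def path_layout1(file_id, root="/volume0"):
    # Layout 1: every file lives on one volume (hypothetical root path).
    return os.path.join(root, str(file_id))


def path_layout2(file_id, volume_roots):
    # Layout 2: the volume is picked by file_id modulo the number of volumes.
    return os.path.join(volume_roots[file_id % len(volume_roots)], str(file_id))


def path_layout3(file_id, volume_id, mounts):
    # Layout 3 (assumed): the database records which volume holds the file and
    # paths are relative to that volume, so the volume can be mounted anywhere.
    return os.path.join(mounts[volume_id], str(file_id))


def remapped(file_ids, old_count, new_count):
    # Count files whose layout-2 volume changes when the volume count changes.
    return sum(1 for f in file_ids if f % old_count != f % new_count)
```

For example, `remapped(range(12), 3, 4)` returns 9: going from three volumes to four would move 9 of those 12 files under the modulo layout, while the third layout only needs the mount table updated.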
TODO: Remove the initial and second upgrades. Be warned: some systems are still using them.
If someone were to delete the old tables, one of the problems that would occur is that the server would run out of space within a few months and crash.
TODO: Make a tool to clean up files that aren't registered in the database for the third upgrade.
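The second TODO could start from something like the following Python sketch. It assumes the set of relative paths registered for the third layout has already been loaded from the database (the query is not shown, since the table layout isn't described here), and it defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
import os


def find_orphans(volume_root, registered):
    """Yield files under volume_root whose relative path is not in `registered`."""
    for dirpath, _dirnames, filenames in os.walk(volume_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.relpath(full, volume_root) not in registered:
                yield full


def clean_orphans(volume_root, registered, dry_run=True):
    """Return the orphaned files; delete them only when dry_run is False."""
    orphans = list(find_orphans(volume_root, registered))
    if not dry_run:
        for path in orphans:
            os.remove(path)
    return orphans
```

Keeping the dry run as the default matters here because, as noted above, the older layouts are still in use; a file that looks unregistered in the third layout's table may still belong to one of them.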

How to run the stabilizer:
/etc/init.d/admin.uGotFile.com.Stabilizers

Please do not run it from anywhere else: that script prevents some conflicts and automatically relaunches the program if it crashes. The stabilizer also has a built-in timer that stops it every so often, which is another reason to always start it with /etc/init.d/admin.uGotFile.com.Stabilizers.

Problems that you may encounter
The stabilizers are running and nothing is happening. There are multiple systems running ugotfile.com, and one of them is the automatic deletion of old files. The problem is that this system doesn't notify the stabilizers.
Quick Fix: Delete everything from the stabilizer database.
Solution 1: Fix the stored procedure so it also cleans up the stabilizer database.
Solution 2: Make a foreign key with cascade delete from table "file__volume" to table "stabilizer"
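Solution 2 can be sketched with SQLite so it runs on its own. Only the table names come from this post; the column names (`id`, `volume`, `file_volume_id`, `target_volume`) are hypothetical. The production database engine isn't stated here; if it is MySQL, note that foreign keys are only enforced on InnoDB tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE file__volume (
    id     INTEGER PRIMARY KEY,
    volume TEXT NOT NULL
);
CREATE TABLE stabilizer (
    id             INTEGER PRIMARY KEY,
    -- Deleting a file__volume row also deletes its pending stabilizer job.
    file_volume_id INTEGER NOT NULL
        REFERENCES file__volume(id) ON DELETE CASCADE,
    target_volume  TEXT
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO file__volume (id, volume) VALUES (1, 'vol-a')")
    conn.execute(
        "INSERT INTO stabilizer (file_volume_id, target_volume) VALUES (1, 'vol-b')"
    )
    # Simulate the automatic deletion of an old file.
    conn.execute("DELETE FROM file__volume WHERE id = 1")
    print(conn.execute("SELECT COUNT(*) FROM stabilizer").fetchone()[0])  # prints 0
```

With the cascade in place, the automatic deletion system no longer has to notify the stabilizers: the database removes the stale job for it.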

You have done the quick fix or applied a solution, and it still does nothing.
Quick Fix: Run "find * -size 0c -exec rm -f '{}' \;" in each volume to delete zero-length files, and copy "/home/web/admin.ugotfile.com/AdminUGotFile/models/Stabilizers.php" from server 13 over to the other servers. The server 13 version just adds some extra checking to the code.
Solution 1: Check the HTTP upload and the FTP upload. On error, both should delete zero-size files. The FTP upload is the most likely culprit.
Solution 2: Modify the stabilizer to ignore zero-size files.
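Solution 2 amounts to a filter applied when the stabilizer picks files to export. A minimal Python sketch, assuming the stabilizer walks the volume's directory tree (the real code may instead read its candidates from the database):

```python
import os


def export_candidates(volume_root):
    """Yield (path, size) for files on a volume, skipping zero-length files."""
    for dirpath, _dirnames, filenames in os.walk(volume_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size == 0:
                # Zero-length files are likely leftovers from failed uploads;
                # skip them instead of trying to transfer them.
                continue
            yield path, size
```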


If, after all your hard work, it still looks like it's doing nothing, then you have been fooled by your own negligence. You have let the volume fill up without running the stabilizer, and there are now corrupted files that are zero bytes long on disk but registered as XXX bytes in the database. The problem will go away after 2 months.
Quick Fix: NONE
Solution 1: First, replicate /home/web/admin.ugotfile.com/AdminUGotFile/models/Stabilizers.php from server 13. Then run "find * -size 0c -exec rm -f '{}' \;" on the volume, mark the volume as "dying", and run the stabilizer. Finally, mark the volume as "enable" and truncate the stabilizer table.
Solution 2: Change the admin control panel display to use the stabilizer statistics; they are way more accurate.
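To check whether a volume is in this state, a sketch like the one below compares the size each file is registered with in the database against its size on disk. It assumes the registered sizes have already been loaded into a dict mapping a hypothetical path (relative to the volume root) to a byte count; a file missing from disk is reported with an actual size of -1.

```python
import os


def size_mismatches(volume_root, registered_sizes):
    """Compare registered sizes with on-disk sizes.

    registered_sizes maps a path relative to volume_root to the byte count
    stored in the database. Returns {path: (registered, actual)} for every
    file that disagrees; files missing from disk get an actual size of -1.
    """
    mismatches = {}
    for rel, registered in registered_sizes.items():
        path = os.path.join(volume_root, rel)
        actual = os.path.getsize(path) if os.path.isfile(path) else -1
        if actual != registered:
            mismatches[rel] = (registered, actual)
    return mismatches
```

Entries of the form `(XXX, 0)` are exactly the corrupted files described above.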


Once the solutions are applied, none of the above will happen again.

Good Luck Bob,
Cheers.
