Rackspace Cloud Files download script

Two new(er) additions to the services I use and recommend are Rackspace Cloud Servers and Rackspace Cloud Files.

We were evaluating cloud services to host client websites, and I ended up choosing Rackspace’s cloud offerings. I really like the services they provide.

With Cloud Files, I can upload files that can be accessed from anywhere. I decided to put our common scripts there, so that when we provision a new server, behind a firewall or in the cloud, we can pull from the same place. All I have to do is keep them up to date in one place.

Before I knew about Chef (a future project I can’t wait to have time for), I created simple scripts to install a common set of packages on every server: our SOE (Standard Operating Environment). Once a server is provisioned, we can, from any other server, update the new server to have the same core set of packages and configurations. The most important part of this is that we install Git and pull down python-cloudfiles:

yum install git -y
git clone git://github.com/rackspace/python-cloudfiles.git

Once python-cloudfiles is installed, we use the following script to pull down the common set of scripts:

import os
import cloudfiles

# Placeholder values; set these for your own container and paths.
container = 'mycontainer'
sourcepath = 'scripts'
destpath = '/usr/local/scripts'

conn = cloudfiles.get_connection('username','keynumberthatisreallylong')
cont = conn.get_container(container)
obj = cont.get_objects(path=sourcepath)
for filename in obj:
	destfile = os.path.join(destpath, os.path.basename(filename.name))
	print "Downloading " + os.path.join("/",container,sourcepath,os.path.basename(filename.name)) + " to " + destpath
	filename.save_to_filename(destfile)
	# Reduce the ISO-style last_modified string to the CCYYMMDDhhmm form that touch -t expects
	timestamp = filename.last_modified[:filename.last_modified.find(".")-3].replace('-','').replace(':','').replace('T','')
	cmd = "touch -m -t " + timestamp + " " + destfile
	os.system(cmd)

What this does is pull down each file in a directory in Cloud Files and save it locally. I then added the extra step of setting each file’s modified date to its Cloud Files last_modified date, so that we can tell which downloaded files have changed recently (that is, were recently uploaded to Rackspace Cloud Files).
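The string slicing that builds the timestamp is the least obvious part of the script, so here is a rough sketch of the same conversion done with datetime. It assumes last_modified looks like 2012-06-27T08:58:12.345678, which is what the slicing in the script implies. The function name touch_timestamp and the sample value are made up for illustration.

```python
from datetime import datetime


def touch_timestamp(last_modified):
    """Convert an ISO-style timestamp to the CCYYMMDDhhmm form used by touch -t."""
    # Drop the fractional seconds, if any, before parsing.
    base = last_modified.split(".")[0]
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Seconds are dropped here, matching what the slicing in the script does.
    return parsed.strftime("%Y%m%d%H%M")


# Hypothetical sample value in the assumed format.
print(touch_timestamp("2012-06-27T08:58:12.345678"))  # prints 201206270858
```

Unlike the slicing approach, this version also handles a value with no fractional seconds, and it fails loudly if the format is something else entirely.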

I hope to replace this with Chef one day, but right now it works really well for us.


2 Responses to Rackspace Cloud Files download script

  1. Delian Krustev June 27, 2012 at 8:58 am #

    Thanks for the code. It appears that the RS cloud API has a limit of 10000 objects returned per call. I’ve made some modifications to overcome this limit and just want to share them back:

    #!/usr/bin/python -u

    import cloudfiles
    import os
    import sys

    container = 'mediaContainer'
    sourcepath = ''
    last_file = None

    print "Opening connection"
    conn = cloudfiles.get_connection('X', 'Y')

    print "Getting container: " + container
    cont = conn.get_container(container)

    objects = None

    get_objects_call = 0
    while 1 :
        get_objects_call += 1
        print "Calling get objects. Callnum: " + str( get_objects_call ) + ", marker: " + str(last_file)
        objects = cont.get_objects( path=sourcepath, marker=last_file )

        found_objects = len(objects)
        print "Found objects: " + str( found_objects )

        for filename in objects:
            last_file = filename.name
            if os.path.exists( filename.name ) :
                # print "Skipping: " + filename.name
                continue

            print "Downloading " + filename.name
            try:
                filename.save_to_filename( filename.name )
                timestamp = filename.last_modified[:filename.last_modified.find(".")-3].replace('-','').replace(':','').replace('T','')
                cmd = "touch -m -t " + timestamp + " " + filename.name
                os.system(cmd)
            except:
                print "Removing " + filename.name
                os.remove( filename.name )
                sys.exit(1)

        if found_objects < 10000 :
            break

  2. jbmurphy June 27, 2012 at 9:02 am #

    Thanks! I did not know that. I haven’t had to deal with this code in a while or with that many objects. Thanks for taking the time to comment.