One of my most recent open-source Ruby projects is a little project I called 'RubyRetriever'. RubyRetriever is supposed to be an auto-downloader: it goes out like a spider to every URL you provide, scans each one for executables, and downloads any it finds. I wrote this little package because I wanted an auto-downloader to try to hunt down malware. You see, I work for a security software company and do malware research for my job, so some of that interest spills over into my side projects from time to time. In this case, I wanted to write an auto-downloader in my favorite language, Ruby.
Well, it works, to an extent. Malware is usually delivered via exploits, however, and RubyRetriever cannot locate those, so it can't download the malware behind them. Things to work on moving forward, I suppose!
The entire open-source project can be found here: https://github.com/joenorton/rubyretriever
There is lots of cool stuff going on in the package (IMO anyway), which you are free to check out. I had to implement a patch for open-uri to get around a weird HTTPS redirect bug it has at present. I also do the tiniest bit of file reading (to check whether the downloadable file is actually an executable).
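To give a feel for that kind of file check, here is a minimal sketch of one common approach: reading the first two bytes and comparing them with the "MZ" signature that DOS/Windows executables start with. The helper name `looks_like_windows_executable?` is made up for this example, and this is not necessarily how RubyRetriever itself does the check.

```ruby
# Hypothetical helper (not taken from RubyRetriever): returns true when the
# file begins with "MZ", the signature found at the start of DOS/Windows
# executables.
MZ_MAGIC = "MZ".b

def looks_like_windows_executable?(file_path)
  File.open(file_path, "rb") do |f|
    # read(2) returns nil for an empty file, so the comparison is just false.
    f.read(2) == MZ_MAGIC
  end
end
```

A signature check like this only says the file claims to be an executable; it says nothing about whether the rest of the file is valid.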
The real juicy stuff has to do with crawling the actual websites. Right now the script only goes 2 levels deep into the URLs provided; eventually it would be nice to be able to set a variable for the number of levels to drill, as well as a cap on total pages. More to do, I suppose!
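A configurable depth and page cap could look something like the sketch below. It is a breadth-first crawl, not the code in RubyRetriever. The names `crawl`, `max_depth`, `max_pages` and `link_fetcher` are all assumptions for illustration; `link_fetcher` stands in for whatever fetches a page and returns the URLs found on it, and depth is counted as link hops from a starting URL.

```ruby
require 'set'

# Hypothetical sketch of a depth- and page-limited crawl.
# link_fetcher: any callable that takes a URL and returns an array of URLs
# found on that page (the real fetching/parsing is left out on purpose).
def crawl(start_urls, max_depth: 2, max_pages: 100, link_fetcher:)
  visited = Set.new
  queue = start_urls.map { |url| [url, 0] } # pairs of [url, depth]

  until queue.empty? || visited.size >= max_pages
    url, depth = queue.shift
    next if visited.include?(url)

    visited << url
    next if depth >= max_depth # visit this page, but do not follow its links

    link_fetcher.call(url).each do |link|
      queue << [link, depth + 1] unless visited.include?(link)
    end
  end

  visited.to_a
end

# Usage with a fake, in-memory "web" so nothing touches the network:
fake_web = { "a" => ["b", "c"], "b" => ["d"], "d" => ["e"] }
fetcher = ->(url) { fake_web.fetch(url, []) }
crawl(["a"], max_depth: 2, max_pages: 10, link_fetcher: fetcher)
# => ["a", "b", "c", "d"]  ("e" is three hops away, so it is skipped)
```

Passing the fetcher in as a callable keeps the traversal logic separate from the networking, which makes the limits easy to test.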
The most essential part of the script, which actually took me a while to put together, is how to literally download a file in Ruby. Here is how you do that:
    require 'open-uri'

    def download_file(path)
      # The last segment of the URL becomes the local filename.
      arr = path.split('/')
      shortname = arr.pop
      puts "Initiating Download to: #{'/samples/' + shortname}"
      # Note: the file is written relative to the current working directory.
      File.open(shortname, "wb") do |saved_file|
        # URI.open is provided by open-uri (on Ruby older than 2.5, use plain "open")
        URI.open(path) do |read_file|
          saved_file.write(read_file.read)
        end
      end
      puts "  SUCCESS: Download Complete"
    end
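One thing to watch with taking the filename from the last '/'-separated piece of the URL is that any query string comes along with it. A minimal sketch of an alternative, assuming the standard library's URI parser is acceptable here; the helper name `local_filename_for` and the `fallback` default are made up for this example:

```ruby
require 'uri'

# Hypothetical helper: derive a local filename from a URL, ignoring any
# query string or fragment. Falls back to a placeholder name when the URL
# has no usable path segment.
def local_filename_for(url, fallback: 'download.bin')
  # URI#path excludes the query string and fragment.
  name = File.basename(URI.parse(url).path.to_s)
  (name.empty? || name == '/') ? fallback : name
end

local_filename_for('http://example.com/files/setup.exe?session=123')
# => "setup.exe"
```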
Any questions, let me know! Feel free to work on RubyRetriever. It is licensed under the GNU GPL, so you can use it how you want, but if you add extensions to the program please let me know so I can put them back into the core!
Enjoy!
Posted: July 19th, 2012 under Download with Ruby, Ruby Executables, Ruby Webspider.

