HOW I AM DOING ALL THIS
Back in 1986, when I started running my textfile BBS, adding
new textfiles was a relatively painless process. I just added new
entries to the descriptions file, uploaded the textfile, and ran
a little utility that came with PC-BOARD. Everything was handled
for me; I didn't have to worry or care about the process involved.
All that I had to do was make sure the same textfile didn't show
up several times across the directories.
Of course, times have changed; when I took on the work of
textfiles.com, I decided it was time to write some shell
scripts (and later Perl scripts) to do the work, and do it right.
This project is ongoing, but the general framework is already
in place.
Much like the rings in J.R.R. Tolkien's books, I have a small
toolbox of scripts that do a lot of the work for me. Here's what
they're named and what they do:
- FERRET is the real workhorse of all the scripts. Originally in Bourne Shell but now in Perl, FERRET checks the current list of files (you can see this list in any given directory by looking for ".descs"), deletes the entries for files that have been moved or deleted, adds entries for files that have been added to the directory, and generates a new index.html file for the web-browsing public. This script is run many times a day at this point, as I make minor or major changes to what files are in what directory. (A sketch of FERRET's basic loop appears after this list.)
- MOVERS allows me to make a list of files (by copying the .descs file to a temporary name and deleting what doesn't match) and move all those files (and their descriptions) to the new directory. If there are already files with the same names in the destination directory, the files are not moved. This allows many hundreds of files to be moved quickly and cleanly.
- LIGHTNINGROUND is the program that forces me to look at all the files in the "in-box" directory and decide, right there, how to describe them or whether to delete them. By making me do all this at the command-line prompt and NOT on the web site, I can get through them at a rate of one a minute or more, so a 1,000-file directory can be whittled down to 400-600 described files in an evening. Very nice.
- TARBALL goes through all the directories and creates a .tar.gz file
containing that entire directory. A lot of people really like to read
all the files, or save them, and I'm a big fan of giving them what they
want. If they want the whole shebang, it might as well be in one big file
instead of them hitting my poor server hundreds or thousands of times. In
this way, everyone involved is happy. TARBALL edits the footer of the
directories to accurately describe the size of the archives people can
download. In many cases, the compressed archive is one third the size of
what the full directory was.
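
For the curious, here is a rough sketch of the idea behind FERRET's
loop. This is a simplified illustration, not the script itself; the
pipe-separated .descs layout and the placeholder text for new files
are just shorthand for this example:

    #!/usr/bin/perl
    # Sketch of the FERRET sync loop. The ".descs" format here
    # (filename|description, one entry per line) is illustrative.
    use strict;
    use warnings;

    my $dir = shift or die "usage: ferret.pl directory\n";

    # Read the existing descriptions into a hash: filename => description.
    my %desc;
    if (open my $in, '<', "$dir/.descs") {
        while (<$in>) {
            chomp;
            my ($file, $text) = split /\|/, $_, 2;
            $desc{$file} = $text if defined $file;
        }
        close $in;
    }

    # Delete entries for files that have been moved or deleted.
    for my $file (keys %desc) {
        delete $desc{$file} unless -f "$dir/$file";
    }

    # Add placeholder entries for files that have shown up since last run.
    opendir my $dh, $dir or die "can't read $dir: $!\n";
    for my $file (sort readdir $dh) {
        next if $file =~ /^\./ or not -f "$dir/$file";
        $desc{$file} = 'NO DESCRIPTION YET' unless exists $desc{$file};
    }
    closedir $dh;

    # Write the cleaned-up .descs back out.
    open my $out, '>', "$dir/.descs" or die "can't write .descs: $!\n";
    print $out "$_|$desc{$_}\n" for sort keys %desc;
    close $out;

    # Generate a fresh index.html for the web-browsing public.
    open my $html, '>', "$dir/index.html" or die "can't write index.html: $!\n";
    print $html "<html><body><table>\n";
    printf $html "<tr><td><a href=\"%s\">%s</a></td><td>%s</td></tr>\n",
        $_, $_, $desc{$_} for sort keys %desc;
    print $html "</table></body></html>\n";
    close $html;

MOVERS follows the same general pattern: treat the .descs file as
the database, and keep the disk and the database agreeing with each
other.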
These won't be the end of it, of course. There are at least two
scripts I haven't written yet that will become more important as
time goes on. The first one will go through a directory, change the
extensions on files, and carry each description over from the old
name to the new one. This will get rid of these foolish ".doc" and
".txt" filenames that don't describe anything.
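
Since this one doesn't exist yet, here is only a minimal sketch of
the idea, assuming the same pipe-separated .descs layout as the
FERRET example above:

    #!/usr/bin/perl
    # Sketch of the planned renaming script: strip a useless
    # extension, rename the file on disk, and carry the description
    # over to the new name. The .descs layout is illustrative.
    use strict;
    use warnings;
    use File::Copy qw(move);

    my $dir = shift or die "usage: rename.pl directory\n";

    open my $in, '<', "$dir/.descs" or die "no .descs in $dir\n";
    my @lines = <$in>;
    close $in;

    open my $out, '>', "$dir/.descs" or die "can't rewrite .descs: $!\n";
    for (@lines) {
        chomp;
        my ($old, $text) = split /\|/, $_, 2;
        (my $new = $old) =~ s/\.(doc|txt)$//i;   # drop the foolish extension
        if ($new ne $old and -f "$dir/$old" and not -e "$dir/$new") {
            move("$dir/$old", "$dir/$new");      # rename on disk...
            $old = $new;                         # ...and in the descriptions
        }
        print $out "$old|$text\n";
    }
    close $out;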
The second one will go through the ENTIRE site and pick out
suspicious doubled files for me, files that are a little too close
to each other not to be considered duplicates. This will be somewhat
heuristic, and I will spend some time making it work really well.
Then we really WILL have the thousands we claim to have, and not a
bunch of doubles like we do now.
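
A first cut at the heuristic might be as simple as collapsing
whitespace and comparing checksums, something like the sketch below;
the real version will have to be fuzzier than this to catch files
that are merely close, not identical:

    #!/usr/bin/perl
    # Sketch of the site-wide duplicate hunt. Flags files whose
    # contents hash identically after collapsing whitespace, so
    # reflowed copies of the same textfile still match.
    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5 qw(md5_hex);

    my $root = shift or die "usage: doubles.pl site-root\n";

    my %seen;    # normalized-content hash => first path seen with it
    find(sub {
        return if /^\./ or not -f $_;   # skip dotfiles like .descs
        open my $fh, '<', $_ or return;
        local $/;                       # slurp the whole file
        my $text = <$fh>;
        close $fh;
        $text =~ s/\s+/ /g;             # collapse whitespace
        my $hash = md5_hex($text);
        if (exists $seen{$hash}) {
            print "SUSPICIOUS PAIR: $seen{$hash} <=> $File::Find::name\n";
        } else {
            $seen{$hash} = $File::Find::name;
        }
    }, $root);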
And so the work continues....