Automating Authoring of my Web Page
Ideally, I would like to just write something, decide where it fits on my website, and have the computer take care of all the grungy details of uploading to my isp and putting in headers and footers. I also like knowing as much as I can about who is looking at what on my site.
Here's what I've done to achieve some of this.
Uploading changed pages
As is described in Calling up the ISP, I have my machine set up so that crontab runs a script every hour which connects to the isp and downloads mail and news. This script also runs the program which uploads changed pages to the isp, upload_mirror.sh.
Since I find the documentation for the "mirror" program somewhat obscure, and it didn't work for me until someone sent me a copy of their mirror defaults file, here's the mirror_defaults file I use.
I used to use a makefile-based ftp to get the web pages uploaded, but this required that the makefile have a list of every link I used, and when my directory structure stopped being flat, that got tedious and error-prone. So what I do now is use a program called wget. It knows how to parse the url tree. That is, it doesn't need a list of what I think is on my site -- if I tell it to recursively fetch my homepage, it downloads the page and everything it references. Unfortunately, unlike mirror and ncftpput, it only knows how to get files, not how to put them somewhere else.
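To give a rough idea of what "parsing the url tree" involves, here is a minimal Python sketch of the first step a recursive fetcher has to take: pull every href and src out of a page, resolve each against the page's own address, and keep only the ones on the same host. This is an illustration and not how wget itself is implemented; wget's real rules are more involved, and the names LinkCollector and local_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the URLs a page references through href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def local_links(html_text, base_url):
    """Return referenced URLs on the same host as base_url, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    host = urlparse(base_url).netloc
    return [u for u in collector.links if urlparse(u).netloc == host]
```

A recursive fetcher would call something like local_links on each page it retrieves and queue any address it has not seen yet, which is why no hand-maintained file list is needed.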
I thought about running wget on my isp. However, while I have succeeded in having crontab run a script on world (my isp's machine), I have so far been unsuccessful in having that script do anything that involves knowing the (dynamic) ip address of my home machine. So I use wget to mirror the tree to a 'mirror' area on my home machine, and then use mirror (which knows how to check date and time stamps and only copies files to the target if they are newer than the target version) to transfer that copy up to the isp.
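The "only copy if newer" decision can be sketched in a few lines of Python. This is not the mirror program's code, just a minimal illustration of the comparison it is described as making; the function name and the idea of passing two dictionaries of modification times are assumptions made for the example.

```python
def files_needing_upload(local_mtimes, remote_mtimes):
    """Return paths whose local copy is newer than the remote one, or missing remotely.

    Both arguments map a relative path to a modification time in seconds.
    """
    stale = []
    for path, local_time in sorted(local_mtimes.items()):
        remote_time = remote_mtimes.get(path)
        # Upload when the remote side has no copy or has an older one.
        if remote_time is None or local_time > remote_time:
            stale.append(path)
    return stale
```

With a local tree of {"index.html": 200, "new.html": 150, "old.html": 50} and a remote tree of {"index.html": 100, "old.html": 50}, only index.html and new.html would be transferred.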
This is one of the things about my setup that is in fairly urgent need of improvement, as it currently does not recover very gracefully from some error conditions like having the connection go down, or being over quota on the isp. Also, deletion of files isn't as automatic as it should be. But it's better than nothing.
Actual writing
I use xemacs as my editor, and I write web pages with the HTML mode that comes with it. I found the mode a little confusing at first, since some of the more common things (e.g., inserting an href) aren't on the menus (href is ^Z h), and the menus offer you only legal choices, so if you have something not closed, you suddenly can't insert a paragraph. That is good once you get used to it, but it makes the mode hard to figure out. Some of the nice people on the RedHat mailing list gave me some good advice.
One of the things that made me say "this needs some tools" when I was first learning html was the number of things they tell you have to be on each page:
- Sign it with your name.
- Put the date last modified.
- Have some navigation aids, at least a return to your home page.
- Put in a header that says what kind of html you're writing, so you can use a validity checking tool.
- Put the whole url at the bottom, so if someone prints it off, they can tell where it came from.
- I haven't even started dealing with copyright.
I initially wrote a tool to take a pretty raw page and add all the headers and footers. Then when I started using the xemacs mode, I found that it put some but not all of these things in automatically, so my tool now checks for a string that xemacs puts in, and doesn't add things that are already there.
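As a rough illustration of the "add only what is missing" idea, here is a minimal Python sketch. It is not the signpage.sh script; the function name, the marker strings it looks for, the particular DOCTYPE line, and the assumption that the raw page is a body fragment are all invented for the example.

```python
# Assumed doctype; the original page does not say which one is used.
DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">'


def sign_page(fragment, url, author, modified, home="index.html"):
    """Add any of the standard header and footer pieces that fragment lacks."""
    lines = []
    if "<!DOCTYPE" not in fragment.upper():
        lines.append(DOCTYPE)
    lines.append(fragment.rstrip("\n"))
    # Each footer piece is keyed by a string whose presence means the editor
    # (or an earlier run) already supplied it. The checks are case-sensitive.
    footer = [
        ("<address>", f"<address>{author}</address>"),
        ("Last modified:", f"Last modified: {modified}"),
        (f'href="{home}"', f'<a href="{home}">Home</a>'),
        (url, url),
    ]
    for marker, text in footer:
        if marker not in fragment:
            lines.append(text)
    return "\n".join(lines) + "\n"
```

Because every piece is guarded by its own check, running the function on a page that has already been signed returns it unchanged, and a page where the editor supplied only some of the pieces gets just the rest.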
My makefile needs to know how to run the script, and to have a list of the HTML files. (I write filename.htm, and my script adds the headers and footers to create filename.html.) Each HTML file is listed in the Makefile, and the .htm.html suffix rule knows how to put the headers in. So I haven't completely eliminated the need to maintain lists by hand yet.
Here's what the Makefile looks like, with some ancient history taken out.
And here's the signpage.sh script.
Statistics
One of the things I really like about web authoring versus the dead tree kind is that you can get the web server to tell you when people actually look at what you do.
I have crontab set up so that once a day I get the log from the previous day emailed from my isp's machine, and gnus puts mail with this subject into a directory named wwwacct.
So every time someone accesses a page, I get a line like this about it:
193.130.180.102 - - [14/Jul/1998:10:49:20 -0400] "GET /~lconrad//jewel-index.html HTTP/1.0" 200 1006
The script, countpages, which turns these lines into the Usage Table page on my website, is a little too baroque for my taste, but it works.
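For readers who want to see the general shape of such a tally, here is a minimal Python sketch that counts successful GET requests per page from log lines in the format shown above. It is not the countpages script; the regular expression, the function name, and the choices to collapse doubled slashes and to count only status 200 are assumptions made for the example.

```python
import re
from collections import Counter

# Fields: host, two ignored fields, [timestamp], "METHOD path protocol", status, size.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def count_pages(lines):
    """Count successful GET requests per path, collapsing repeated slashes."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # Skip anything that is not a well-formed log line.
        if match["method"] != "GET" or match["status"] != "200":
            continue
        path = re.sub(r"/{2,}", "/", match["path"])
        hits[path] += 1
    return hits
```

Fed the sample line above, this counts one hit for /~lconrad/jewel-index.html; calling most_common() on the result gives the pages ordered by popularity.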
Now that I have a year's worth of data, the script with its awks, greps, and sorts is taking a while to run, and in any case "How many people have ever looked at this page?" isn't the only question you want to ask of this data. So my plan is to put the log data into MySQL. This is on the to-do list.
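To show the kind of question a database makes easy, here is a small Python sketch. It uses SQLite from the Python standard library as a stand-in for MySQL so that it runs without a server; the table layout (host, day, path) and the function names are assumptions for the example, not a design taken from this page.

```python
import sqlite3


def load_hits(rows):
    """Store (host, day, path) tuples in an in-memory database and return it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (host TEXT, day TEXT, path TEXT)")
    db.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def distinct_visitors(db, path):
    """Count the distinct hosts that requested path."""
    (count,) = db.execute(
        "SELECT COUNT(DISTINCT host) FROM hits WHERE path = ?", (path,)
    ).fetchone()
    return count


def hits_per_day(db, path):
    """Return (day, hits) pairs for path, oldest day first."""
    return db.execute(
        "SELECT day, COUNT(*) FROM hits WHERE path = ? GROUP BY day ORDER BY day",
        (path,),
    ).fetchall()
```

Once the rows are loaded, "how many different machines looked at this page?" and "how did traffic to it change over time?" are each a single query rather than a new pipeline of text tools.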
Still working on -- any suggestions?
- Automatically generate the makefile?
- Generate a site map.
- Modify the html template and commands in xemacs so that the signpage.sh script isn't necessary, and it deals with the way I insert images with thumbnails.
- Automate what I do about scanning in a photo and creating a thumbnail.
webmaster@laymusic.org Last modified: Tue Dec 28 06:37:25 EST 1999
Last modified: 2002-06-29 11:12, 2007
www.laymusic.org/webauth.html
