While compiling the 2010 entries for the first time a few months ago, I wrote down all the steps I took so I could more quickly replicate the results with the full 2011 set a few days later. I figure this information might be of use to someone out there. Even if you're not looking to exactly replicate the process of taking Twitter Fiction Reader pages and compiling them into a Kindle end product, perhaps in the more general sense you could get some pointers for how to go about mass modification tasks.
I used EditPad for most of these text tasks, but the important thing is having an editor that can work with regular expressions.
1. Open all the files and let the search/replace operations run with the "All Files" option checked.
2. REMOVE the following (search for it and replace all matches with blank)
3. REMOVE
4. Using a regular expression (regex) search with "Dot Matches Newline" selected, replace the following
5. You could do the next few steps with the files still all open in separate tabs, but at some point you'll need to combine them. As long as the search/replaces are still running across all files, it doesn't make a difference whether you do it now or after the rest of these steps are complete. How to go about this? As a Windows guy, I first made sure all the files I wanted to combine were in a directory together, and that their names were in proper alphanumeric order. In the case of DadBoner 2010 I had 2010-04-15.HTM, 2010-04-18.HTM, on and on to 2010-12-31.HTM. I then made a .bat file in the directory (specifically I called it mergem.bat, though that's unimportant) which had a single command: copy/b *.htm 2010.htm (you could just type the same thing into the commandline, but it's often simpler to make something into a .bat file than to navigate to some crazy nested subdirectory through the command prompt). Anyway, that command takes all the existing *.htm files and combines them into the single 2010.htm in the order of their filenames.
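If you're not on Windows, or just prefer a script, the same merge can be sketched in Python. This is only a rough equivalent of the .bat file above, not what I actually ran: the function name merge_files and its arguments are made up for illustration. It sorts by filename, copies raw bytes like copy/b, and skips the output file so a second run doesn't fold the old 2010.htm into the new one.

```python
from pathlib import Path


def merge_files(directory, output_name, suffix=".htm"):
    """Concatenate every file with the given suffix, in filename order.

    Hypothetical helper: the name and arguments are illustrative only.
    Returns the list of source filenames in the order they were merged.
    """
    directory = Path(directory)
    output_path = directory / output_name

    # Match the suffix case-insensitively (.HTM and .htm both count) and
    # leave out the output file itself if it exists from an earlier run.
    sources = sorted(
        (
            p
            for p in directory.iterdir()
            if p.is_file()
            and p.suffix.lower() == suffix
            and p.name.lower() != output_name.lower()
        ),
        key=lambda p: p.name.lower(),
    )

    # Write raw bytes so nothing gets re-encoded along the way.
    with open(output_path, "wb") as out:
        for source in sources:
            out.write(source.read_bytes())

    return [p.name for p in sources]
```

Calling merge_files("some/directory", "2010.htm") would then play the role of mergem.bat. Date-style names like 2010-04-15.HTM sort correctly as plain text, which is why the naming matters.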
6. REMOVE regex
7. REMOVE regex
8. REPLACE
9. REPLACE
10. REPLACE regex
11. REPLACE
At this point we have things very close to what I put up as HTML versions, except that those also include a simple replacement header (since we chopped the original headers off in step 2), some links, and a table of contents that links to the months with newly inserted anchor tags. HOWEVER, there is one further modification I needed to make. Kindle, in its booky ways, always assumes you want to indent a new paragraph. A bunch of indents for tweets and timestamps isn't really the best look, though. So we do a simple bit of faking...
12. REPLACE
At this point, the basic text transformation is complete and I'm going to stop the numbering. Having a big decently combined HTML file is the biggest part of getting things ready for final Kindle transformation; at that point it's just a matter of how easy or hard you want to make it on yourself to make things fancier.
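For anyone who'd rather script the regex steps than click through an editor, here is a minimal sketch of a "Dot Matches Newline" style removal in Python. The markers in the usage note are hypothetical stand-ins, not the actual strings from the steps above, and remove_block is a made-up name.

```python
import re


def remove_block(text, start_marker, end_marker):
    """Delete everything from start_marker through end_marker.

    Hypothetical helper. re.DOTALL makes "." match newlines too, which
    is the scripted equivalent of ticking "Dot Matches Newline".
    """
    pattern = re.compile(
        re.escape(start_marker) + r".*?" + re.escape(end_marker),
        re.DOTALL,
    )
    # Replace every match with nothing, i.e. a REMOVE step.
    return pattern.sub("", text)
```

Something like remove_block(page, '<div class="nav">', '</div>') would strip a block that spans several lines. The `.*?` is non-greedy on purpose: it stops at the first end marker, whereas a plain `.*` with dot-matches-newline would happily swallow everything up to the last one in the file.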
COVER. I took the Captain Karl image used as @DadBoner's twitter avatar. At 377x500 it's quite a bit smaller than the e-ink Kindles' 600x800 screens. Playing with some resizing possibilities, what I ended up doing was tripling it in size and applying a median filter in Paint Shop Pro. Smooth. I then resized it down to 566x750 and added borders on the left, right, and bottom to fill out the space. Enough room at the bottom for a decent-sized title, not much space wasted. If you really want to make it e-ink friendly at the expense of the color versions (which is what I did), try to get a palette of the 16 grays Kindle actually uses. By using "error diffusion dithering" when forcing an image into that palette you'll probably get something more decent than the way Kindle automatically displays color images.
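To give an idea of what error diffusion dithering does, here's a small pure-Python sketch. It makes two assumptions that aren't from my actual process: it uses the Floyd-Steinberg weights (one common error diffusion scheme, not necessarily what Paint Shop Pro uses), and it treats the 16 grays as evenly spaced from 0 to 255 in steps of 17, which may not match Kindle's real palette. The image is just a list of rows of 0-255 gray values.

```python
def dither_to_16_grays(pixels):
    """Floyd-Steinberg dither a grayscale image down to 16 gray levels.

    pixels: list of rows, each a list of ints from 0 to 255.
    Assumes an evenly spaced palette (0, 17, 34, ..., 255).
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Work on a float copy so accumulated error isn't truncated.
    work = [[float(v) for v in row] for row in pixels]
    out = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            old = min(255.0, max(0.0, work[y][x]))
            new = round(old / 17) * 17  # nearest of the 16 levels
            out[y][x] = new
            err = old - new
            # Push the rounding error onto pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

The point of spreading the error around is that a flat area of in-between gray comes out as a fine mix of the two nearest palette grays rather than one solid band, which is why it tends to look better than a straight nearest-color conversion.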
CONVERSION. There are a few methods you can use. I've tried two and found both to have benefits; luckily I also found they could be combined.
Mobipocket Creator makes importing an HTML file, setting a cover file, and creating a Kindle-compatible file very easy. However, using this method there were some specific things I couldn't figure out how to do, like making it easy to skip between chapters.
KindleGen is Amazon's own commandline program for turning HTML into Kindle files. There are more manual steps involved, but I found this guide by Helen Hanson pretty helpful for setting up things like the toc.ncx and .opf files. With this method I got nicer results overall, EXCEPT that the file size was much larger. KindleGen has a few compression options, but none seemed nearly as good as what Mobipocket Creator could do.
Luckily, I found they can be combined. After Mobipocket Creator imports an HTML file, it sticks it in a new directory, where it will also create various other files and put the output. Now that I knew something about the more manual method, I saw that when making its final output it was creating the same sort of files I was learning to make manually. So what if I pasted in the ones I'd made to replace the automatically created ones, then told it to build the output again? The formatting improvements were still there, and at the smaller filesize. Nice.