DadBoner on Kindle: How To

While compiling the 2010 entries for the first time a few months ago, I wrote down all the steps I took so I could more quickly replicate the results with the full 2011 set a few days later. I figure this information might be of use to someone out there. Even if you're not looking to replicate the exact process of taking Twitter Fiction Reader pages and compiling them into a Kindle end product, you might get some general pointers on how to go about mass modification tasks.

I used EditPad for most of these text tasks, but any editor will do as long as it can work with regular expressions; that's the most important thing.

1. Open all the files, then run each of the search/replace operations with the "All Files" option checked.

2. REMOVE the following (search for it and replace all matches with blank):
<html> <head> <title>Twitter Fiction Reader</title> <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script> <link href='http://fonts.googleapis.com/css?family=EB+Garamond' rel='stylesheet' type='text/css'> <script src="/reader.js"></script> <link type="text/css" rel="stylesheet" href="/style.css"></link> <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-5663087-5']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> </head> <body> <div id="container"> <header> <h1><a href="/">Twitter Fiction Reader</a></h1> </header>
This gets rid of the HTML header information from each page, which isn't individually needed since they're being combined.

3. REMOVE
<a href="/story/DadBoner">DadBoner</a> -
Again, since they're being combined into one big file, we don't need a link to a higher-level section repeated for every single day.
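
Steps 2 and 3 are plain literal removals, so outside an editor they need no regex at all. A minimal Python sketch, assuming each page has already been read into a string; the `remove_literals` helper and the sample heading markup are made up for illustration, and only the breadcrumb string comes from step 3:

```python
# Literal string from step 3 (the link back to the story's index page).
BREADCRUMB = '<a href="/story/DadBoner">DadBoner</a> -'


def remove_literals(html, literals):
    """Delete every occurrence of each fixed string (hypothetical helper)."""
    for literal in literals:
        html = html.replace(literal, "")
    return html


# Hypothetical heading; the real page markup may differ.
page = '<h2><a href="/story/DadBoner">DadBoner</a> - April 15, 2010</h2>'
cleaned = remove_literals(page, [BREADCRUMB])
```

The long header block from step 2 could go into the same list, provided it is pasted in with exactly the whitespace the pages use.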

4. Using a regular expression (regex) search with "Dot Matches Newline" selected, replace the following
<div class="links".*
with
</div>
This area contained the links to the previous and next days, which are unnecessary here. The replacement ends the page right there instead, with the day's contents nicely wrapped in a div. Why in a div? Mostly because it was set up that way to begin with. The less you change, the more likely the result matches the original, and the less hassle it is. Also, note that the combination of .* and "Dot Matches Newline" makes this replace everything from that point to the end of the page. This is why all the files have been left separate thus far; if you did this after everything had been combined, it would wipe out everything past the bottom of the first day!
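
To see the "Dot Matches Newline" behavior outside an editor, here is a small Python sketch in which `re.DOTALL` plays the same role. The `trim_links` name and the sample page are assumptions for illustration; only the pattern and the replacement come from step 4:

```python
import re

# With DOTALL, ".*" also crosses line breaks, so the match runs from the
# links div to the very end of the text.
LINKS_TAIL = re.compile(r'<div class="links".*', re.DOTALL)


def trim_links(page_html):
    """Replace the links div and everything after it with a closing div."""
    return LINKS_TAIL.sub("</div>", page_html, count=1)


# Hypothetical single-day page, already stripped of its header.
day = (
    '<div id="content">\n'
    "<p>tweet</p>\n"
    '<div class="links"><a href="/prev">prev</a></div>\n'
    "</div>\n</body></html>"
)
trimmed = trim_links(day)

# The pitfall described above: on combined text, only the first day survives.
combined_result = trim_links(day + day)
```

Here `combined_result` equals `trimmed`, which is exactly why this step has to run before the files are merged.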

5. You could do the next few steps with the files still open in separate tabs, but at some point you'll need to combine them. As long as the search/replaces run across all files, it makes no difference whether you combine now or after the remaining steps are complete.

How to go about this? As a Windows guy, I first made sure all the files I wanted to combine were in a directory together and that their names sorted into the proper alphanumeric order. In the case of DadBoner 2010 I had 2010-04-15.HTM, 2010-04-18.HTM, and so on to 2010-12-31.HTM. I then made a .bat file in the directory (I called it mergem.bat, though the name is unimportant) which had a single command:
copy/b *.htm 2010.htm
You could just type the same thing into the command line, but it's often simpler to make a .bat file than to navigate to some crazy nested subdirectory through the command prompt. That command takes all the existing *.htm files and combines them into the single 2010.htm in the order of their filenames.
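
For anyone not on Windows, roughly the same merge can be sketched in Python. This is not the method I used, just an equivalent under the assumption that the filenames sort correctly; `merge_pages` is a made-up name, and the demo writes two tiny stand-in files to a temporary directory:

```python
import tempfile
from pathlib import Path


def merge_pages(directory, output_name):
    """Concatenate every .htm file, in filename order, into one output file."""
    folder = Path(directory)
    # Collect the sources before the output exists, and skip the output name
    # so a previous merge result is never fed back in.
    sources = sorted(
        (
            p
            for p in folder.iterdir()
            if p.is_file()
            and p.suffix.lower() == ".htm"
            and p.name.lower() != output_name.lower()
        ),
        key=lambda p: p.name.lower(),
    )
    with (folder / output_name).open("wb") as out:
        for source in sources:
            out.write(source.read_bytes())  # raw bytes, like copy's /b switch
    return [p.name for p in sources]


# Demo with stand-in files, deliberately created out of order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2010-04-18.HTM", "2010-04-15.HTM"):
        (Path(tmp) / name).write_bytes(name.encode("ascii") + b"\n")
    order = merge_pages(tmp, "2010.htm")
    merged = (Path(tmp) / "2010.htm").read_bytes()
```

Sorting explicitly by name means the result does not depend on the order in which the operating system happens to list the directory.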

6. REMOVE regex
<a name="[0-9]*" href="#[0-9]*" class="anchor">#</a>
The table of contents certainly isn't going to have links to every single tweet, and unlike a web page nobody is going to be linking to one externally. So having anchors defined and the links to them immediately presented for thousands of messages is just a waste of file size.

7. REMOVE regex
id="[0-9]*"
Continuation of the previous.
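
Steps 6 and 7 translate directly to Python's `re` module. In this sketch the two patterns are the ones given above; the function name and the sample markup (including the `tweet` class) are assumptions, since the real pages may be laid out differently:

```python
import re

ANCHOR_LINK = re.compile(r'<a name="[0-9]*" href="#[0-9]*" class="anchor">#</a>')
NUMERIC_ID = re.compile(r'id="[0-9]*"')


def strip_tweet_anchors(html):
    """Remove the per-tweet anchor links, then the numeric id attributes."""
    html = ANCHOR_LINK.sub("", html)
    return NUMERIC_ID.sub("", html)


# Hypothetical tweet markup followed by the start of the next day.
sample = (
    '<div class="tweet" id="12345"><p>'
    '<a name="12345" href="#12345" class="anchor">#</a> Tweet text</p></div>'
    '<div id="content">'
)
stripped = strip_tweet_anchors(sample)
```

Because the pattern only allows digits between the quotes, `id="content"` is left alone, which matters for the next step.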

8. REPLACE
<div id="content">
with
<mbp:pagebreak /><div id="content">
Basically this means every time a new day of content is seen, throw in the tag that tells Kindle to go to a new page.
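
As a Python sketch, step 8 is a one-line literal replacement (`add_page_breaks` is a made-up name, and the sample input is only a stand-in for the combined file):

```python
DAY_START = '<div id="content">'
PAGE_BREAK = "<mbp:pagebreak />"


def add_page_breaks(html):
    """Put a Kindle page-break tag in front of every day's content div."""
    return html.replace(DAY_START, PAGE_BREAK + DAY_START)


two_days = '<div id="content">day one</div><div id="content">day two</div>'
with_breaks = add_page_breaks(two_days)
```

Note that running it twice on the same text would double up the page breaks, so it should be applied only once.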

9. REPLACE
http://twitter.com/
with
httptwittercom
This is a temporary change. The next step has to do with finding and modifying links, but we don't want the links to the original tweets messed with. A good way to do that is to temporarily turn them into something else. It's kind of like that episode where the Doctor hid by transforming into a human, but in this case it's going to work without incident.

10. REPLACE regex
(http://[^"\r\n< ]*)
with
<a href="$1">$1</a>
When Karl gives a link, he's really just giving the URL as text--Twitter interprets it as a link. So by just grabbing the content of his messages, we'd get nonclickable URLs. Not convenient. This tries to match any URL in the document and replace it with a link that uses the URL as both the target and the displayed text. It's worth noting that @ links to fellow Twitter users like @KimKardashian should be made into links, too, but as those only came up a few times in 2010 and not at all in 2011 it's something that didn't really need a fancy command--just search for any and make them links if need be.

11. REPLACE
httptwittercom
with
http://twitter.com/
This reverses the switch made a couple steps ago, restoring the existing links to themselves.
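
Steps 9 through 11 belong together, so here they are as one Python sketch. The placeholder, the prefix, and the URL pattern are the ones used above; the `linkify` name and the sample paragraph (with an example.com URL) are made up for illustration:

```python
import re

TWITTER_PREFIX = "http://twitter.com/"
PLACEHOLDER = "httptwittercom"
BARE_URL = re.compile(r'(http://[^"\r\n< ]*)')


def linkify(html):
    """Hide existing tweet links, wrap remaining bare URLs, then restore."""
    html = html.replace(TWITTER_PREFIX, PLACEHOLDER)    # step 9
    html = BARE_URL.sub(r'<a href="\1">\1</a>', html)   # step 10
    return html.replace(PLACEHOLDER, TWITTER_PREFIX)    # step 11


sample = (
    "<p>Check this out http://example.com/wings "
    '<a href="http://twitter.com/DadBoner/status/1">link</a></p>'
)
linked = linkify(sample)
```

The bare URL ends up wrapped in a link while the existing tweet link comes through untouched. As written, the pattern only catches `http://` addresses, so any `https://` URLs would need the pattern widened.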

At this point we have something very close to the HTML versions I put up, except that those also include a simple replacement header (since we chopped the originals off in step 2), some links, and a table of contents that links to the months via newly inserted anchor tags. HOWEVER, there is one further modification I needed to make. Kindle, in its booky ways, always assumes you want to indent a new paragraph. A bunch of indents for tweets and timestamps isn't really the best look, though. So we do a simple bit of faking...

12. REPLACE
<p>
with
<p><br>
Now it's as if each paragraph starts on its second line, so the indent lands on an empty first line. A hack, but it works. When I view it in Firefox, though, the extra line breaks are apparent, which is why they're not present in the HTML versions available here.
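
Since the extra breaks are wanted for Kindle but not for the web version, a Python sketch of step 12 might take a switch. The function name and the `for_kindle` flag are assumptions for illustration:

```python
def fake_unindent(html, for_kindle=True):
    """Start each paragraph with a line break so the indent falls on an empty line."""
    if not for_kindle:
        return html  # the web version stays as it is
    return html.replace("<p>", "<p><br>")


kindle_html = fake_unindent("<p>a</p><p>b</p>")
web_html = fake_unindent("<p>a</p><p>b</p>", for_kindle=False)
```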

At this point, the basic text transformation is complete and I'm going to stop the numbering. Having a big decently combined HTML file is the biggest part of getting things ready for final Kindle transformation; at that point it's just a matter of how easy or hard you want to make it on yourself to make things fancier.

COVER. I took the Captain Karl image used as @DadBoner's Twitter avatar. At 377x500 it's quite a bit smaller than the e-ink Kindles' 600x800 screens. After playing with some resizing possibilities, what I ended up doing was tripling it in size and applying a median filter in Paint Shop Pro. Smooth. I then resized it down to 566x750 and added borders on the left, right, and bottom to fill out the space. That leaves enough room at the bottom for a decent-sized title, with not much space wasted. If you really want to make it e-ink friendly at the expense of the color versions (which is what I did), try to get a palette of the 16 grays Kindle actually uses. By using "error diffusion dithering" when forcing an image into that palette you'll probably get something more decent than the way Kindle automatically displays color images.
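
The border sizes follow from simple arithmetic. This Python sketch assumes the finished cover is exactly 600x800 and that the left and right borders are equal, neither of which is spelled out above; `cover_borders` is a made-up helper:

```python
def cover_borders(image_w, image_h, canvas_w=600, canvas_h=800):
    """Work out the padding needed to place an image on a larger canvas."""
    extra_w = canvas_w - image_w
    extra_h = canvas_h - image_h
    if extra_w < 0 or extra_h < 0:
        raise ValueError("image is larger than the canvas")
    left = extra_w // 2
    # Assumption: the image sits flush with the top, so all the spare
    # height goes to the bottom, where the title is placed.
    return {"left": left, "right": extra_w - left, "top": 0, "bottom": extra_h}


borders = cover_borders(566, 750)
```

For the 566x750 image that works out to 17 pixels on each side and 50 at the bottom.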

CONVERSION. There are a few methods you can use. I've tried two and found both to have benefits; luckily I also found they could be combined.

Mobipocket Creator makes importing an HTML file, setting a cover file, and creating a Kindle-compatible file very easy. However, with this method there were some specific things I couldn't figure out how to do, like making it easy to skip between chapters.

KindleGen is Amazon's own command-line program for turning HTML into Kindle files. There are more manual steps involved, but I found this guide by Helen Hanson pretty helpful for setting up things like the toc.ncx and .opf files. With this method I got nicer results overall, EXCEPT that the file size was much larger. KindleGen has a few compression options, but none seemed nearly as good as what Mobipocket Creator could do.

Luckily, I found they can be combined. After Mobipocket Creator imports an HTML file, it sticks it in a new directory, where it also creates various other files and puts the output. Now that I knew something about the more manual method, I saw that when making its final output it was creating the same sort of files I was learning to make manually. So what if I pasted in the ones I'd made to replace the automatically created ones, then told it to build the output again? The formatting improvements were still there, and at the smaller file size. Nice.
