Content-space.de

The WikiBlog of Michael Hamann about changing technologies and more

Creating ePub files from webpages

As my newest gadget (well, it's already more than one and a half months old) is an eBook reader device, my interest in ePub files has grown, as ePub is basically the format that works best for my device and probably for small eInk devices in general.

Basically, an ePub file is an XHTML file with some additional information (e.g. a table of contents) zipped together into a handy file. So saving a webpage as an ePub document seems to be quite easy. Nevertheless, there are some things the tool has to pay attention to, e.g. the XHTML in an ePub file has to be valid XHTML 1.1. For further details on the file format I suggest reading this guide on ePub file creation.
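To make the "XHTML plus some extra files in a zip" idea concrete, here is a minimal sketch in Python that packs a single XHTML page into an ePub 2 container. The function name build_epub, the file names under OEBPS/ and the placeholder identifier are my own assumptions for illustration, and the body is assumed to already be valid XHTML 1.1 markup. None of the services described below necessarily work this way.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Points the reader at the package file (OEBPS/content.opf).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package file: metadata, list of files (manifest) and reading order (spine).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="page" href="page.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="page"/>
  </spine>
</package>
"""

# Table of contents with a single entry pointing at the page.
TOC_NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="{book_id}"/>
  </head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="nav1" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="page.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

PAGE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>{title}</title></head>
  <body>{body}</body>
</html>
"""


def build_epub(title, body_xhtml, book_id="urn:uuid:00000000-0000-0000-0000-000000000000"):
    """Return the bytes of a one-page ePub; body_xhtml must already be valid XHTML."""
    safe_title = escape(title)  # escape &, < and > for use inside XML text
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF.format(title=safe_title, book_id=book_id))
        zf.writestr("OEBPS/toc.ncx", TOC_NCX.format(title=safe_title, book_id=book_id))
        zf.writestr("OEBPS/page.xhtml", PAGE_XHTML.format(title=safe_title, body=body_xhtml))
    return buf.getvalue()
```

Writing the returned bytes to a file, e.g. `open("page.epub", "wb").write(build_epub("My page", "<p>Some text.</p>"))`, should give a file that a reader can open. The hard part that the services below try to solve is not shown here: turning an arbitrary webpage into clean, valid XHTML and picking out the main text and images.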

Of course one could do that manually for every single page one wants to read, but I thought there had to be something handier. Calibre is great software for creating content for eBook reader devices and it can even process RSS feeds (though skimming through content, as I do when reading my feeds, is not exactly what an eBook reader device does best). Nevertheless I wanted something more integrated into my favorite browser, Firefox. Surprisingly, I couldn't find a single add-on for Firefox that fits this purpose.

Finally I managed to find three web services that can create ePub files from webpages:

Web2FB2

The simplest one is Web2FB2; although its name may suggest something different, that service also creates ePub files. I call it simple because it really just takes the webpage and creates an ePub file. It doesn't try to detect the main text, use RSS feeds or remove the navigation, but at least it includes all images. It seems to me that it ignores print stylesheets, too. So how usable the result is depends on the webpage and on which parts of it you are interested in.

zinepal.com

Probably the most sophisticated service I've found is zinepal.com - it's basically intended for creating newspapers from webpages, feeds, or Google News or blog search results. You can add up to 5 (25 in the pro version) webpages to a so-called “Zine”, for which you can then choose several options and get it as PDF or ePub. When you are a pro user (which you automatically are for the first month of usage) you can even add your own logo or an introductory section. You can also select which fonts are used, and as a pro user you can add advertisements (which of course is intended more for republishing the result than for personal use ;)). Besides that, it creates quite useful ePubs (and PDFs) from the specified pages and feeds and removes almost everything but the actual content. The only negative point is that it doesn't add a table of contents for the included pages.

Instapaper

Instapaper is “A simple tool to save web pages for reading later.” Nevertheless it looks really interesting to me as it is indeed quite simple: you don't even need a password for your account if you don't want one (but it still supports multiple folders, starring pages, iPhone apps…). With a simple bookmarklet you can save pages for later reading. And the best part is that you can create an ePub of all your unread pages. Like zinepal.com, Instapaper most of the time removes everything but the essential text (and unfortunately, like zinepal.com, also most or all of the images), but as a big plus it adds a table of contents so you can quickly jump between the included pages.

Thus Instapaper looks exactly like what I've been looking for - a tool that makes it easy to read all the stuff I find online but want to read in a quiet (or noisy) moment on my sofa (or on the train). I can collect the pages I want to include, and when I want something to read I simply click the ePub link and save the ePub file on my reader (switching between files is one of the slower operations, so having it all in one file is also a big plus). And for the Amazon Kindle there would even be an auto-delivery option. The only thing I'm missing a bit is the images from the webpages. I hope that the algorithm for extracting the text will be improved, but of course I understand that such an algorithm is hard to create and always involves a lot of guessing.

If you have discovered other ways of getting the web (or at least some parts of it ;)) into an ePub file, please let me know!

Comments

This service is also very good: http://dotepub.com/

1 | Adolfo Neto | 2010/11/04 11:13

@Adolfo Neto: Thanks for the hint, it seems to be really nice and fast but doesn't include images.

2 | Michael Hamann | 2010/11/09 12:00

Thank you for this interesting and useful blog post :) Last time I tried to convert an RFC into an EPUB and used http://pretty-rfc.herokuapp.com/ for this. Unfortunately it doesn't support conversion to EPUB yet. Instapaper couldn't handle the txt file, whereas DotEpub was able to convert it, but still with all the page headers and footers.

3 | onny | 2013/08/23 12:40