

A Bash script that encodes characters as HTML entities


The most annoying thing for someone who maintains a static website in Italian is using accented characters while writing the text of a paragraph, since they have to be converted to HTML entities to be handled correctly by browsers.

With this in mind, and after googling unsuccessfully for a ready-made bash command, I wrote a simple bash script to solve the problem. Thinking that other people might be interested in it, I decided to publish it as open source on my site.

Although some editors, such as Bluefish, can convert characters to HTML entities and back, I prefer to use a bash script that prepares all my web pages together, in a final step, after I have finished editing them.

I deliberately avoided clever constructs and shortcuts that would have made the code more efficient, since I was more interested in minimizing the risk of bugs and keeping the code readable.

My code focuses on Italian characters, but the script is easily extended to other languages, just by adding lines that convert further special characters to HTML entities.

The script is well documented. It uses the sed command to substitute accented characters, one special character at a time, writing back and forth between two temporary files that are deleted at the end of the substitution process.
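To give an idea of the technique, here is a minimal sketch of the back-and-forth substitution. It is not the full script: the function name encode_entities, the temporary files created with mktemp, and the short list of characters are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: one sed pass per accented character, alternating between
# two temporary files. Function name and character list are illustrative.

encode_entities() {
    local src="$1" dst="$2"
    local tmp_a tmp_b
    tmp_a="$(mktemp)"
    tmp_b="$(mktemp)"

    # Start from a copy, so the source file is never written to.
    cp "$src" "$tmp_a"

    # In a sed replacement a bare "&" means "the matched text",
    # so it is escaped as "\&" to get a literal ampersand.
    sed 's/à/\&agrave;/g' "$tmp_a" > "$tmp_b"
    sed 's/è/\&egrave;/g' "$tmp_b" > "$tmp_a"
    sed 's/é/\&eacute;/g' "$tmp_a" > "$tmp_b"
    sed 's/ì/\&igrave;/g' "$tmp_b" > "$tmp_a"
    sed 's/ò/\&ograve;/g' "$tmp_a" > "$tmp_b"
    sed 's/ù/\&ugrave;/g' "$tmp_b" > "$tmp_a"

    # The last pass wrote to tmp_a; save it and erase both temporary files.
    cp "$tmp_a" "$dst"
    rm -f "$tmp_a" "$tmp_b"
}
```

Supporting another character means adding one more sed line and keeping track of which of the two files holds the latest result. A single sed call with several -e expressions would be shorter, but one substitution per line is easier to read and to check.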

To use it, write the text in your HTML pages as you normally would, then launch the script: it will prepare all the HTML pages for upload to your server.

Please note that you need to make the script file executable by setting its execute permission before using it, or it will not run.
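Setting the flag is normally a single chmod +x on the script file. The small helper below is only a sketch of that step; the name ensure_executable is made up for this example.

```shell
# Sketch only: add the execute permission to a script if it is missing.
# "ensure_executable" is an illustrative helper name.
ensure_executable() {
    local script="$1"
    if [ ! -x "$script" ]; then
        chmod +x "$script"
    fi
}
```

For instance, if you saved the script as encode_entities.sh (a placeholder name), you would call ensure_executable ./encode_entities.sh once and then start it with ./encode_entities.sh.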

To avoid modifying the original HTML files, I copy them all into a target subfolder that I called a2html (but of course, you can choose another name for it!).
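The copy step could look roughly like this. It is a sketch under assumptions: the function name prepare_copies is illustrative, and it assumes the pages end in .html and sit directly in the given folder.

```shell
# Sketch only: copy every .html page of a folder into an "a2html"
# subfolder, so the conversion can work on the copies.
# "prepare_copies" is an illustrative name.
prepare_copies() {
    local src_dir="$1"
    local target_dir="$src_dir/a2html"
    local page

    mkdir -p "$target_dir"

    for page in "$src_dir"/*.html; do
        # With no matching files the pattern stays unexpanded: skip it.
        [ -e "$page" ] || continue
        cp "$page" "$target_dir/"
    done
}
```

Calling prepare_copies . in the folder that holds your pages fills ./a2html with copies, and the originals stay exactly as you wrote them.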

This script is for bash. If you are programming in PHP, there is a PHP function that converts all special characters in a single call:
   $string_out = htmlentities($string_in, ENT_COMPAT, 'UTF-8');
