Simple HTML Validator

Let's say you are dynamically pulling content into a place (such as an email or a widget on your webpage) where you only want part of that content to show up. And let's say the content contains HTML tags for formatting (or for whatever other purpose). One problem you will run into is that, once the content is cut short, some HTML tag in there won't be properly closed and the rest of your layout gets thrown out of whack. You don't want to have to fix up that content by hand every time, right?

Here is an example I've put together using ColdFusion (and a bit of RegEx) to validate and add closing HTML tags to your content:

<cffunction access="private" name="validateHtml" output="false" returntype="any">
<cfargument name="html" required="true" type="string">
<cfscript>
   var token = "[[:word:]]+|[^[:word:]<]|(?:<(\/)?([[:word:]]+)[^>]*(\/)?>)|<";
   var selfClosingTag = "^(?:[hb]r|img)$";
   var output = "";
   var tag = "";
   var openTags = "";
   var strPos = 0;
   var i = 1;
   var match = reFind( token, html, i, "true" );
   
   // find tags
   while( match.pos[1] ) {
    // if this is an HTML tag
    if( match.pos[3] ) {
     output &= mid( html, match.pos[1], match.len[1] );
     tag = mid( html, match.pos[3], match.len[3] );
     
     // if this is not a self-closing tag
     if ( !( match.pos[4] || reFindNoCase( selfClosingTag, tag ) ) ) {
      if( match.pos[2] ) {
       // closing tag: drop its matching open tag, if there is one
       if( listFindNoCase( openTags, tag ) ) {
        openTags = listDeleteAt( openTags, listFindNoCase( openTags, tag ) );
       }
      } else {
       // opening tag: remember it so it can be closed later
       openTags = listAppend( openTags, tag );
      }
     }
    } else {
     output &= mid( html, match.pos[1], match.len[1] );
    }
    
    i += match.len[1];
    match = reFind( token, html, i, "true" );
   }
   
   // close any tags which were left open
   while( listLen( openTags ) ) {
    output &= "</" & listLast( openTags ) & ">";
    openTags = listDeleteAt( openTags, listLen( openTags ) );
   }
   
   return output;
</cfscript>
</cffunction>
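
To show how this is meant to be used, here is a minimal sketch. The sample markup and the variable names content, teaser and safeTeaser are made up for illustration, and because validateHtml is declared with access="private", the sketch assumes it runs inside the same component that defines the function:

```cfml
<cfscript>
   // hypothetical sample content; in practice this would be pulled in dynamically
   content = "<p>Read our <strong>latest news</strong> and <em>special offers</em> today.</p>";

   // cut the content short; the first 30 characters are
   // <p>Read our <strong>latest new
   // which leaves both the p and the strong tag open
   teaser = left( content, 30 );

   // the intent is for the missing closing tags to be appended,
   // innermost first (strong, then p)
   safeTeaser = validateHtml( teaser );
</cfscript>
```

One thing to watch for: if the cut lands in the middle of a tag (for example right after "<stro"), the token pattern no longer sees a complete tag, so that fragment is passed through as plain text rather than repaired. Trimming back to the last complete tag before calling the function is one way around that.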
