Monday, April 19, 2010

Retrieve the Content of an HTML Page in an ASPX Page

If you are creating a new ASP.NET application but have a large collection of existing content in HTML files, one option is to move all of that content into a database and generate pages dynamically. However, migrating to a database can be time-consuming, depending on the volume of content. Wouldn't it be easier to import the relevant parts of the existing HTML pages directly into your .aspx page?

With System.IO and regular expressions, this is straightforward. Place a control that has a Text property, such as a Literal, with ID="htmlbody" on your page. Then use the following code to strip out everything up to and including the opening <body> tag (whether or not the tag has attributes), as well as the closing </body> tag and everything after it:

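// Requires: using System.IO; using System.Text.RegularExpressions;
string html;

// Read the whole HTML file. Supply the path to your existing page
// (Server.MapPath can turn a virtual path into a physical one).
using (StreamReader sr = File.OpenText(""))
{
    html = sr.ReadToEnd();
}

// Remove everything up to and including the opening <body ...> tag.
// [^>]* stops at the first '>' so body text is never swallowed.
Regex start = new Regex(@"[\s\S]*<body[^>]*>", RegexOptions.IgnoreCase);
html = start.Replace(html, "");

// Remove the closing </body> tag and everything after it.
Regex end = new Regex(@"</body[\s\S]*", RegexOptions.IgnoreCase);
html = end.Replace(html, "");

htmlbody.Text = html;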
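If you would rather not run two Replace calls, a single Regex.Match with a capture group can pull out the body content in one step. The sketch below is a variation on the approach above, not a replacement for it. The class HtmlBodyExtractor and the method ExtractBodyContent are hypothetical names. The method also works on a string instead of a file, so you can try it without a web project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlBodyExtractor
{
    // Hypothetical helper: returns the markup between <body ...> and </body>.
    // One capture group replaces the two Replace calls; if no <body> element
    // is found, the input is returned unchanged (as the two-Replace version does).
    public static string ExtractBodyContent(string html)
    {
        Match m = Regex.Match(
            html,
            @"<body[^>]*>([\s\S]*?)</body\s*>",   // lazy match stops at the first </body>
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : html;
    }

    public static void Main()
    {
        string page = "<html><head><title>Old</title></head>" +
                      "<BODY class=\"main\"><p>Hello</p></BODY></html>";
        Console.WriteLine(ExtractBodyContent(page)); // prints <p>Hello</p>
    }
}
```

In the page, you could assign the result of ExtractBodyContent(html) to htmlbody.Text instead of running the two Replace calls. In both versions, regular expressions act as a quick shortcut rather than a full HTML parser. Markup that contains a literal "</body>" inside a script, for example, can still trip them up.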