Strip HTML tags from a Web Page and save the Output to a Text File

Category: ASP, HTML and XML
Type: Modules
Difficulty: Intermediate
Author: Intelligent Solutions Inc.

Version Compatibility: Visual Basic 5

More information:
This .bas module includes a function that takes a string containing HTML as a parameter and returns the string with the HTML tags removed. The best thing about the function is how it formats the output based on the tags. For instance, there is support for ordered lists, bulleted lists, and special characters. In addition, tables are recognized and output intelligently.
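To illustrate the general idea of tag-aware stripping, here is a minimal sketch in Python. It is not the module's Visual Basic code, and the names `TextExtractor` and `html_to_text` are hypothetical. It assumes a simple mapping: ordered list items get a number, bulleted items get `* `, table cells in a row are separated by tabs, and special characters such as `&amp;` are decoded by the standard library parser. The contents of `<script>` and `<style>` elements are not filtered out in this sketch.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text from HTML and adds simple formatting based on tags."""

    # Tags that start a new line of output (an assumption of this sketch).
    LINE_TAGS = {"p", "div", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.lists = []  # stack: an int counter for <ol>, None for <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.lists.append(0)
        elif tag == "ul":
            self.lists.append(None)
        elif tag == "li":
            self.parts.append("\n")
            if self.lists and self.lists[-1] is not None:
                self.lists[-1] += 1
                self.parts.append(f"{self.lists[-1]}. ")
            else:
                self.parts.append("* ")
        elif tag in self.LINE_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("ol", "ul"):
            if self.lists:
                self.lists.pop()
            self.parts.append("\n")
        elif tag in ("td", "th"):
            self.parts.append("\t")  # separate table cells with a tab
        elif tag in ("p", "tr"):
            self.parts.append("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace the way a browser would.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    """Return the text of `html` with tags removed and lists/tables formatted."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text("<ol><li>One</li><li>Two</li></ol>")` yields two lines, `1. One` and `2. Two`.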

Using an optional parameter, you can save the text output to a file. Usage of the function is explained in detail within the code's comments.
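The optional save-to-file parameter can be pictured with the following Python sketch. It is only an illustration of the calling pattern, not the module's actual signature: the function name `strip_tags` and the parameter name `out_path` are assumptions, and the stripping here is deliberately bare (no list or table formatting).

```python
from html.parser import HTMLParser
from typing import Optional


class _Collector(HTMLParser):
    """Keeps only the text between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html: str, out_path: Optional[str] = None) -> str:
    """Return `html` without tags; also write it to `out_path` if one is given."""
    collector = _Collector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks)
    if out_path is not None:
        # Hypothetical optional parameter: save the text output to a file.
        with open(out_path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return text
```

Calling `strip_tags(page)` only returns the text, while `strip_tags(page, "page.txt")` returns it and also saves it, so callers that do not need a file pay no extra cost.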

This is a great module for browser-based applications that want to add a Save as Text option, or for web crawlers that need to strip HTML tags from pages before saving or indexing them.

There is an alternative to this function on this site at http://www.freevbcode.com/ShowCode.Asp?ID=1037. That example is an application with a UI for opening, saving, and editing HTML files and the textual output. A comparison between the two methods is given at that location. If you need this functionality, the recommendation is to download both examples and use the one that works best for you.


Instructions: Click the link below to download the code. Select 'Save' from the IE popup dialog. Once downloaded, open the .zip file from your local drive using WinZip or a comparable program to view the contents.

Download html2text.zip