Monday, April 26, 2021

Regular expressions and XPath

Writing regular expressions is not an easy task. I confess that I didn't like doing it before. You do everything logically correctly, but the regular expression does not give the desired result. Regular expressions are seldom used for web scraping; many people use XPath instead.

What is XPath?

XPath is a language for querying elements of an XML document or HTML code. It is designed to provide access to parts of an XML or HTML document by navigating the document tree (the DOM).
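To make this concrete, here is a minimal sketch of an XPath query in Python. It uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath and needs well-formed markup; real-world HTML usually calls for a more forgiving parser. The page content, class names, and the product_names function are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a scraped page.
PAGE = """
<html>
  <body>
    <div class="product"><span class="name">Lamp</span><span class="price">19.99</span></div>
    <div class="product"><span class="name">Desk</span><span class="price">120.00</span></div>
  </body>
</html>
"""


def product_names(markup):
    """Return the text of every span.name found inside a div.product."""
    root = ET.fromstring(markup.strip())
    # ".//" searches all descendants; "[@class='...']" filters on an attribute value.
    path = ".//div[@class='product']/span[@class='name']"
    return [span.text for span in root.findall(path)]


names = product_names(PAGE)
print(names)
```

The path expression reads almost like a description of where the data sits in the page, which is a large part of why XPath is popular for scraping.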

Despite this, I still use regular expressions. You can read more about them on my blog; here I will explain why I decided to use regular expressions for parsing.

Sometimes the information is in a piece of JavaScript. After you get that piece using XPath, you still need to process it further, and this is where you need regular expressions. Why not use regular expressions for the first act of scraping, then subject the result to regular expression processing and, if necessary, apply the third act? It seems convenient and logical to me.
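The "acts" described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the page content, the variable name productData, and the extract_product function. The last step relies on the JavaScript object literal also being valid JSON, which is often not the case on real pages.

```python
import json
import re

# Hypothetical page with the interesting data embedded in a script block.
PAGE = """
<html>
  <head>
    <script>
      var productData = {"id": 42, "name": "Lamp", "price": 19.99};
    </script>
  </head>
  <body><h1>Lamp</h1></body>
</html>
"""


def extract_product(page):
    """Pull the productData object out of the page in three regex-driven steps."""
    # Act 1: grab the body of the first script element.
    script = re.search(r"<script[^>]*>(.*?)</script>", page, re.DOTALL)
    if script is None:
        return None

    # Act 2: inside the script, find the object literal assigned to productData.
    literal = re.search(
        r"var\s+productData\s*=\s*(\{.*?\});", script.group(1), re.DOTALL
    )
    if literal is None:
        return None

    # Act 3: parse the literal. This works only because it is valid JSON here.
    return json.loads(literal.group(1))


product = extract_product(PAGE)
print(product)
```

Each act takes the output of the previous one as its input, so the same tool is used from start to finish and there is no switching between XPath and regular expressions.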

You can visit my Zintro profile to find out more about me.
