Java: Parsing with regular expressions

Posted on May 20, 2009


The Pattern and Matcher classes in java.util.regex are extremely useful for pulling out the portion of a string that matches a given pattern. They can also extract specific pieces of information from that portion: capturing groups (parentheses) in the pattern mark where the information sits. Here is an example application:

Document to be parsed

<tr>
<td width="10px" rowspan="6"></td>
<td height="17px">iLO MP:</td>
<td colspan="3">F.02.17</td>
</tr>
<tr>
<td height="17px">BMC:</td>
<td colspan="3">05.23</td>
</tr>
<tr>
<td height="17px">EFI:</td>
<td colspan="3">ROM A 07.12, ROM B 07.12</td>
</tr>
<tr>
<td height="17px">System Firmware:</td>
<td colspan="3">ROM A 04.03, ROM B 04.03, Boot ROM A</td>
</tr>
<tr>
<td height="17px">UCIO:</td>
<td colspan="3">03.0b</td>
</tr>
<tr>
<td height="17px">PRS:</td>
<td colspan="3">00.08 UpSeqRev: 02, DownSeqRev: 01</td>
</tr>
<tr class="whiteSpaceLg">
<td colspan="2"></td>
<td colspan="3"></td>
</tr>

Intended Result

iLO MP: F.02.17
BMC: 05.23
EFI: ROM A 07.12, ROM B 07.12
System Firmware: ROM A 04.03, ROM B 04.03, Boot ROM A
UCIO: 03.0b
PRS: 00.08 UpSeqRev: 02, DownSeqRev: 01

Java Pattern and Matcher snippet

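import java.util.regex.*;
// \s* tolerates whitespace or a line break between the two cells
Pattern p = Pattern.compile("<td height=\"17px\">([^<>]+):</td>\\s*<td colspan=\"3\">([^<>]+)</td>");
Matcher m = p.matcher(buffer); // buffer holds the document text
while (m.find()) { System.out.println(m.group(1) + ": " + m.group(2)); }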

Notice that the pattern first looks for <td height="17px">`var1`:</td>, and the `var1` value (the text before the colon) is captured as the first group. The same applies to the second part of the pattern, <td colspan="3">`var2`</td>, whose content is captured as the second group. Hence, m.group(1) and m.group(2) return `var1` and `var2` respectively. Any part of the document that doesn't match the pattern is ignored.
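For readers who want to run the snippet end to end, here is a minimal sketch. The class name FirmwareTableParser and the method parse are hypothetical names chosen for illustration, and only two rows of the sample document are inlined. It also assumes the buffer still contains the line breaks between cells, which is why the pattern has \s* between the two <td> elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original post.
class FirmwareTableParser {
    // Label cell, optional whitespace or line break, then value cell.
    private static final Pattern ROW = Pattern.compile(
        "<td height=\"17px\">([^<>]+):</td>\\s*<td colspan=\"3\">([^<>]+)</td>");

    // Returns one "label: value" string per matching pair of cells.
    static List<String> parse(String html) {
        List<String> lines = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            lines.add(m.group(1) + ": " + m.group(2));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two rows from the sample document plus the empty trailing row.
        String html =
            "<tr>\n<td width=\"10px\" rowspan=\"6\"></td>\n"
          + "<td height=\"17px\">iLO MP:</td>\n<td colspan=\"3\">F.02.17</td>\n</tr>\n"
          + "<tr>\n<td height=\"17px\">BMC:</td>\n<td colspan=\"3\">05.23</td>\n</tr>\n"
          + "<tr class=\"whiteSpaceLg\">\n<td colspan=\"2\"></td>\n<td colspan=\"3\"></td>\n</tr>\n";
        for (String line : parse(html)) {
            System.out.println(line);
        }
    }
}
```

The empty row at the end has no height="17px" cell, so it produces no match and is skipped. A regular expression like this is only practical when the markup is as regular as it is here; for arbitrary HTML a real parser is usually the safer choice.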

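Additionally, m.group() with no argument returns the entire text matched by the current m.find() call. Printing it inside the loop, for a buffer in which the line breaks between the cells have been removed, gives: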

<td height="17px">iLO MP:</td><td colspan="3">F.02.17</td>
<td height="17px">BMC:</td><td colspan="3">05.23</td>
<td height="17px">EFI:</td><td colspan="3">ROM A 07.12, ROM B 07.12</td>
<td height="17px">System Firmware:</td><td colspan="3">ROM A 04.03, ROM B 04.03, Boot ROM A</td>
<td height="17px">UCIO:</td><td colspan="3">03.0b</td>
<td height="17px">PRS:</td><td colspan="3">00.08 UpSeqRev: 02, DownSeqRev: 01</td>
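The difference between the whole match and the numbered groups can be checked with a small sketch like the one below. The class name WholeMatchDemo and the method wholeMatches are hypothetical, and the one-row input string is made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original post.
class WholeMatchDemo {
    static final Pattern ROW = Pattern.compile(
        "<td height=\"17px\">([^<>]+):</td>\\s*<td colspan=\"3\">([^<>]+)</td>");

    // Collects m.group() (equivalent to m.group(0)) for every match.
    static List<String> wholeMatches(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<tr><td height=\"17px\">UCIO:</td><td colspan=\"3\">03.0b</td></tr>";
        Matcher m = ROW.matcher(html);
        if (m.find()) {
            System.out.println(m.group());    // whole match, without the surrounding <tr> tags
            System.out.println(m.group(1));   // UCIO
            System.out.println(m.group(2));   // 03.0b
            System.out.println(m.groupCount()); // 2, group 0 is not counted
        }
    }
}
```

Group 0 always stands for the whole match, so m.group() and m.group(0) are interchangeable; the capturing groups in parentheses are numbered from 1 in the order their opening parentheses appear.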

Further Reading

Refer to the Java 1.4.2 Javadoc for the Pattern class for a better understanding of Pattern and Matcher.

Posted in: Technology