Hello online friends! This time I'll share a little about programming in C#. The topic I'll cover is REGEX (Regular Expressions). To be honest, this is actually an assignment for my Programming Languages course from my beloved lecturer. Without further ado, please enjoy the following.
Regular Expression
A Regular Expression, often called REGEX, is a pattern that can be matched against input text. The .NET Framework provides a regular expression engine that performs this matching. A pattern consists of one or more character literals, operators, or constructs.
Constructs for defining a REGEX
Several categories of characters, operators, and constructs can be used to define a regular expression:
- Character escapes
- Character classes
- Anchors
- Grouping constructs
- Quantifiers
- Backreference constructs
- Alternation constructs
- Substitutions
- Miscellaneous constructs
Character escapes
Character escapes (special characters) are the most basic category. The backslash character (\) in a regular expression indicates that the character following it is either a special character or should be interpreted literally.
The following table describes the character escapes:
| Escaped character | Description | Pattern | Matches |
|---|---|---|---|
| `\a` | Matches a bell character, \u0007. | `\a` | "\u0007" in "Warning!" + '\u0007' |
| `\b` | In a character class, matches a backspace, \u0008. | `[\b]{3,}` | "\b\b\b\b" in "\b\b\b\b" |
| `\t` | Matches a tab, \u0009. | `(\w+)\t` | "Name\t", "Addr\t" in "Name\tAddr\t" |
| `\r` | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
| `\v` | Matches a vertical tab, \u000B. | `[\v]{2,}` | "\v\v\v" in "\v\v\v" |
| `\f` | Matches a form feed, \u000C. | `[\f]{2,}` | "\f\f\f" in "\f\f\f" |
| `\n` | Matches a new line, \u000A. | `\r\n(\w+)` | "\r\nHello" in "\r\nHello\nWorld." |
| `\e` | Matches an escape, \u001B. | `\e` | "\x001B" in "\x001B" |
| `\nnn` | Uses octal representation to specify a character (nnn consists of two or three digits). | `\w\040\w` | "a b", "c d" in "a bc d" |
| `\xnn` | Uses hexadecimal representation to specify a character (nn consists of exactly two digits). | `\w\x20\w` | "a b", "c d" in "a bc d" |
| `\cX`, `\cx` | Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character. | `\cC` | "\x0003" in "\x0003" (Ctrl-C) |
| `\unnnn` | Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn). | `\w\u0020\w` | "a b", "c d" in "a bc d" |
| `\` | When followed by a character that is not recognized as an escaped character, matches that character. | `\d+[\+-x\*]\d+` | "2+2" and "3*9" in "(2+2) * 3*9" |
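One practical note for C#: the backslash is also the escape character of ordinary string literals, so patterns are usually written as verbatim strings (@"..."). The following is only a small sketch I put together to try a few rows of the table; the class name EscapeDemo and the helper FindAll are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        // \t in the pattern matches a tab; the verbatim string keeps the backslash intact.
        foreach (string s in FindAll("Name\tAddr\t", @"(\w+)\t"))
        {
            // Show the tab as \t so it is visible on the console.
            Console.WriteLine("[" + s.Replace("\t", "\\t") + "]");   // [Name\t] then [Addr\t]
        }

        // \x20 denotes a space character (hexadecimal 20).
        Console.WriteLine(string.Join(", ", FindAll("a bc d", @"\w\x20\w")));   // a b, c d

        // A backslash before an operator such as + or * makes it a literal character.
        Console.WriteLine(string.Join(", ", FindAll("(2+2) * 3*9", @"\d+[\+-x\*]\d+")));   // 2+2, 3*9
    }
}
```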
Character classes
A character class matches any one of a set of characters. The following table describes the character classes:
| Character class | Description | Pattern | Matches |
|---|---|---|---|
| `[character_group]` | Matches any single character in character_group. By default, the match is case-sensitive. | `[mn]` | "m" in "mat"; "m", "n" in "moon" |
| `[^character_group]` | Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | `[^aei]` | "v", "l" in "avail" |
| `[first-last]` | Character range: Matches any single character in the range from first to last. | `[A-Z]` | "A", "B" in "AB123" |
| `.` | Wildcard: Matches any single character except \n. | `a.e` | "ave" in "have"; "ate" in "mate" |
| `\p{name}` | Matches any single character in the Unicode general category or named block specified by name. | `\p{Lu}` | "C", "L" in "City Lights" |
| `\P{name}` | Matches any single character that is not in the Unicode general category or named block specified by name. | `\P{Lu}` | "i", "t", "y" in "City" |
| `\w` | Matches any word character. | `\w` | "R", "o", "o", "m" and "1" in "Room#1" |
| `\W` | Matches any non-word character. | `\W` | "#" in "Room#1" |
| `\s` | Matches any white-space character. | `\w\s` | "D " in "ID A1.3" |
| `\S` | Matches any non-white-space character. | `\s\S` | " _" in "int __ctr" |
| `\d` | Matches any decimal digit. | `\d` | "4" in "4 = IV" |
| `\D` | Matches any character other than a decimal digit. | `\D` | " ", "=", " ", "I", "V" in "4 = IV" |
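To check a few rows of the character class table myself, I used a tiny sketch like the one below. The class CharClassDemo and the helper MatchesOf are names I invented for this post; only Regex.Matches comes from .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: joins all matches with "|" so that spaces stay visible.
    public static string MatchesOf(string input, string pattern)
    {
        return string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOf("avail", @"[^aei]"));        // v|l
        Console.WriteLine(MatchesOf("AB123", @"[A-Z]"));         // A|B
        Console.WriteLine(MatchesOf("City Lights", @"\p{Lu}"));  // C|L
        Console.WriteLine(MatchesOf("Room#1", @"\w"));           // R|o|o|m|1
        Console.WriteLine(MatchesOf("4 = IV", @"\D"));           //  |=| |I|V
    }
}
```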
Anchors
Anchors cause a match to succeed or fail depending on the current position in the string. The following table lists the anchors:
| Assertion | Description | Pattern | Matches |
|---|---|---|---|
| `^` | The match must start at the beginning of the string or line. | `^\d{3}` | "567" in "567-777-" |
| `$` | The match must occur at the end of the string or before \n at the end of the line or string. | `-\d{4}$` | "-2012" in "8-12-2012" |
| `\A` | The match must occur at the start of the string. | `\A\w{4}` | "Code" in "Code-007-" |
| `\Z` | The match must occur at the end of the string or before \n at the end of the string. | `-\d{3}\Z` | "-007" in "Bond-901-007" |
| `\z` | The match must occur at the end of the string. | `-\d{3}\z` | "-333" in "-901-333" |
| `\G` | The match must occur at the point where the previous match ended. | `\G\(\d\)` | "(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)" |
| `\b` | The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character. | `\b\w+\s\w+\b` | "them theme", "them them" in "them theme them them" |
| `\B` | The match must not occur on a \b boundary. | `\Bend\w*\b` | "ends", "ender" in "end sends endure lender" |
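A small sketch for trying the anchors follows. One detail worth knowing: in .NET, ^ and $ only match at line breaks inside the string when RegexOptions.Multiline is set; by default they refer to the whole string. The class AnchorDemo, the helper MatchesOf, and the input "one\ntwo" are my own illustrative choices.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: joins all matches of pattern in input with "|".
    public static string MatchesOf(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return string.Join("|", Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOf("567-777-", @"^\d{3}"));                        // 567

        // Without Multiline, ^ only matches at the very start of the string.
        Console.WriteLine(MatchesOf("one\ntwo", @"^\w+"));                          // one
        Console.WriteLine(MatchesOf("one\ntwo", @"^\w+", RegexOptions.Multiline));  // one|two

        // \G requires each match to begin where the previous one ended,
        // so the chain stops at "[7]" and "(9)" is never reached.
        Console.WriteLine(MatchesOf("(1)(3)(5)[7](9)", @"\G\(\d\)"));               // (1)|(3)|(5)

        // \B demands that "end" does NOT start at a word boundary.
        Console.WriteLine(MatchesOf("end sends endure lender", @"\Bend\w*\b"));     // ends|ender
    }
}
```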
Grouping constructs
Grouping constructs delineate subexpressions of a regular expression and capture substrings of the input string. The following table lists the grouping constructs:
| Grouping construct | Description | Pattern | Matches |
|---|---|---|---|
| `(subexpression)` | Captures the matched subexpression and assigns it a one-based ordinal number (group 0 is the whole match). | `(\w)\1` | "ee" in "deep" |
| `(?<name>subexpression)` | Captures the matched subexpression into a named group. | `(?<double>\w)\k<double>` | "ee" in "deep" |
| `(?<name1-name2>subexpression)` | Defines a balancing group definition. | `(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$` | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
| `(?:subexpression)` | Defines a noncapturing group. | `Write(?:Line)?` | "WriteLine" in "Console.WriteLine()" |
| `(?imnsx-imnsx:subexpression)` | Applies or disables the specified options within subexpression. | `A\d{2}(?i:\w+)\b` | "A12xl", "A12XL" in "A12xl A12XL a12xl" |
| `(?=subexpression)` | Zero-width positive lookahead assertion. | `\w+(?=\.)` | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
| `(?!subexpression)` | Zero-width negative lookahead assertion. | `\b(?!un)\w+\b` | "sure", "used" in "unsure sure unity used" |
| `(?<=subexpression)` | Zero-width positive lookbehind assertion. | `(?<=19)\d{2}\b` | "99", "50", "05" in "1851 1999 1950 1905 2003" |
| `(?<!subexpression)` | Zero-width negative lookbehind assertion. | `(?<!19)\d{2}\b` | "51", "03" in "1851 1999 1950 1905 2003" |
| `(?>subexpression)` | Nonbacktracking (or "greedy") subexpression. | `[13579](?>A+B+)` | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |
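Here is a short sketch of how captured groups can be read back in C# and how the two lookbehind rows differ. It is only an illustration under my own naming (GroupDemo and MatchesOf are not .NET types); the Match.Groups indexer and Regex.Match are the real API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: joins all matches of pattern in input with "|".
    public static string MatchesOf(string input, string pattern)
    {
        return string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // A named group can be read by name after the match.
        Match m = Regex.Match("deep", @"(?<double>\w)\k<double>");
        Console.WriteLine(m.Value);                    // ee
        Console.WriteLine(m.Groups["double"].Value);   // e

        // A noncapturing group groups without storing anything:
        // only group 0 (the whole match) exists.
        Match w = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
        Console.WriteLine(w.Value + " " + w.Groups.Count);   // WriteLine 1

        // Lookbehind checks the text before the match without consuming it.
        string years = "1851 1999 1950 1905 2003";
        Console.WriteLine(MatchesOf(years, @"(?<=19)\d{2}\b"));   // 99|50|05
        Console.WriteLine(MatchesOf(years, @"(?<!19)\d{2}\b"));   // 51|03
    }
}
```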
Quantifiers
A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur. The following table lists the quantifiers:
| Quantifier | Description | Pattern | Matches |
|---|---|---|---|
| `*` | Matches the previous element zero or more times. | `\d*\.\d` | ".0", "19.9", "219.9" |
| `+` | Matches the previous element one or more times. | `be+` | "bee" in "been", "be" in "bent" |
| `?` | Matches the previous element zero or one time. | `rai?n` | "ran", "rain" |
| `{n}` | Matches the previous element exactly n times. | `,\d{3}` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}` | Matches the previous element at least n times. | `\d{2,}` | "166", "29", "1930" |
| `{n,m}` | Matches the previous element at least n times, but no more than m times. | `\d{3,5}` | "166", "17668"; "19302" in "193024" |
| `*?` | Matches the previous element zero or more times, but as few times as possible. | `\d*?\.\d` | ".0", "19.9", "219.9" |
| `+?` | Matches the previous element one or more times, but as few times as possible. | `be+?` | "be" in "been", "be" in "bent" |
| `??` | Matches the previous element zero or one time, but as few times as possible. | `rai??n` | "ran", "rain" |
| `{n}?` | Matches the preceding element exactly n times. | `,\d{3}?` | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| `{n,}?` | Matches the previous element at least n times, but as few times as possible. | `\d{2,}?` | "16" in "166"; "29"; "19" and "30" in "1930" |
| `{n,m}?` | Matches the previous element between n and m times, but as few times as possible. | `\d{3,5}?` | "166"; "176" in "17668"; "193" and "024" in "193024" |
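The difference between the greedy quantifiers and their lazy versions (the ones with a trailing ?) is easiest to see side by side. The following is a minimal sketch; QuantifierDemo and MatchesOf are names I made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: joins all matches of pattern in input with "|".
    public static string MatchesOf(string input, string pattern)
    {
        return string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // Greedy: e+ takes as many 'e' characters as it can.
        Console.WriteLine(MatchesOf("been", @"be+"));          // bee
        // Lazy: e+? stops after the first 'e'.
        Console.WriteLine(MatchesOf("been", @"be+?"));         // be

        // Greedy {3,5} takes five digits; the single digit left over is too short to match.
        Console.WriteLine(MatchesOf("193024", @"\d{3,5}"));    // 19302
        // Lazy {3,5}? takes only three digits each time, so two matches fit.
        Console.WriteLine(MatchesOf("193024", @"\d{3,5}?"));   // 193|024
    }
}
```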
Backreference constructs
A backreference allows a previously matched subexpression to be identified later in the same regular expression. The following table lists these constructs:
| Backreference construct | Description | Pattern | Matches |
|---|---|---|---|
| `\number` | Backreference. Matches the value of a numbered subexpression. | `(\w)\1` | "ee" in "seek" |
| `\k<name>` | Named backreference. Matches the value of a named expression. | `(?<char>\w)\k<char>` | "ee" in "seek" |
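A common use of backreferences is spotting something that repeats. The sketch below first runs the two table rows, then applies the same idea to a doubled word; the sentence "this is is a test" and the class name BackreferenceDemo are my own examples, not from the reference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    public static void Main()
    {
        // \1 must match exactly the text that group 1 captured.
        Console.WriteLine(Regex.Match("seek", @"(\w)\1").Value);                // ee

        // The same idea with a named group and \k<name>.
        Console.WriteLine(Regex.Match("seek", @"(?<char>\w)\k<char>").Value);   // ee

        // Hypothetical input: find a word that is immediately repeated.
        Match doubled = Regex.Match("this is is a test", @"\b(\w+)\s\1\b");
        Console.WriteLine(doubled.Value);             // is is
        Console.WriteLine(doubled.Groups[1].Value);   // is
    }
}
```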
Alternation constructs
Alternation constructs modify a regular expression to enable either/or matching. The alternation constructs are as follows:
- Construct: |
  Description: Matches any one element separated by the vertical bar (|) character.
  Pattern: th(e|is|at)
  Matches: "the", "this" in "this is the day. "
- Construct: (?(expression)yes|no)
  Description: Matches yes if expression matches; otherwise, matches the optional no part. Expression is interpreted as a zero-width assertion.
  Pattern: (?(A)A\d{2}\b|\b\d{3}\b)
  Matches: "A10", "910" in "A10 C103 910"
- Construct: (?(name)yes|no)
  Description: Matches yes if the named capture name has a match; otherwise, matches the optional no.
  Pattern: (?<quoted>")?(?(quoted).+?"|\S+\s)
  Matches: "Dogs.jpg " and "\"Yiska playing.jpg\"" in "Dogs.jpg \"Yiska playing.jpg\""
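To see the first two alternation constructs at work, I tried the small sketch below. AlternationDemo and MatchesOf are illustrative names of my own.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: joins all matches of pattern in input with "|".
    public static string MatchesOf(string input, string pattern)
    {
        return string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // Plain alternation: "th" followed by one of three endings.
        // Matches are reported in the order they occur in the input.
        Console.WriteLine(MatchesOf("this is the day. ", @"th(e|is|at)"));            // this|the

        // Conditional: if the next character is "A", match A plus two digits,
        // otherwise match a standalone three-digit number.
        Console.WriteLine(MatchesOf("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));    // A10|910
    }
}
```

Note that "103" inside "C103" is skipped because there is no word boundary between "C" and "1".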
Substitutions
Substitutions are used in replacement patterns. The following table lists the substitutions:
| Character | Description | Pattern | Replacement pattern | Input string | Result string |
|---|---|---|---|---|---|
| `$number` | Substitutes the substring matched by group number. | `\b(\w+)(\s)(\w+)\b` | `$3$2$1` | "one two" | "two one" |
| `${name}` | Substitutes the substring matched by the named group name. | `\b(?<word1>\w+)(\s)(?<word2>\w+)\b` | `${word2} ${word1}` | "one two" | "two one" |
| `$$` | Substitutes a literal "$". | `\b(\d+)\s?USD` | `$$$1` | "103 USD" | "$103" |
| `$&` | Substitutes a copy of the whole match. | `\$?\d*\.?\d+` | `**$&**` | "$1.30" | `"**$1.30**"` |
| `` $` `` | Substitutes all the text of the input string before the match. | `B+` | `` $` `` | "AABBCC" | "AAAACC" |
| `$'` | Substitutes all the text of the input string after the match. | `B+` | `$'` | "AABBCC" | "AACCCC" |
| `$+` | Substitutes the last group that was captured. | `B+(C+)` | `$+` | "AABBCCDD" | "AACCDD" |
| `$_` | Substitutes the entire input string. | `B+` | `$_` | "AABBCC" | "AAAABBCCCC" |
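Substitutions only have meaning inside the replacement argument of Regex.Replace. Below is a minimal sketch that runs a few rows of the table through the static Regex.Replace(input, pattern, replacement) overload; the class name SubstitutionDemo is just a placeholder of mine.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    public static void Main()
    {
        // $3$2$1 swaps the two words and keeps the whitespace between them.
        Console.WriteLine(Regex.Replace("one two", @"\b(\w+)(\s)(\w+)\b", "$3$2$1"));   // two one

        // The same swap, using named groups.
        Console.WriteLine(Regex.Replace("one two",
            @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}"));               // two one

        // $$ produces a literal dollar sign, followed by group 1.
        Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"));          // $103

        // $& is the whole match, here wrapped in asterisks.
        Console.WriteLine(Regex.Replace("$1.30", @"\$?\d*\.?\d+", "**$&**"));           // **$1.30**

        // $` is everything before the match ("AA"), inserted in place of "BB".
        Console.WriteLine(Regex.Replace("AABBCC", @"B+", "$`"));                        // AAAACC
    }
}
```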
Miscellaneous constructs
The following are the miscellaneous constructs:
| Construct | Definition | Example |
|---|---|---|
| `(?imnsx-imnsx)` | Sets or disables options such as case insensitivity in the middle of a pattern. | `\bA(?i)b\w+\b` matches "ABA", "Able" in "ABA Able Act" |
| `(?#comment)` | Inline comment. The comment ends at the first closing parenthesis. | `\bA(?#Matches words starting with A)\w+\b` |
| `# [to end of line]` | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | `(?x)\bA\w+\b#Matches words starting with A` |
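As a rough sketch of how inline options and x-mode comments look in real C# code (MiscDemo and MatchesOf are hypothetical names of mine), a verbatim string lets a commented pattern span several lines:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MiscDemo
{
    // Hypothetical helper: joins all matches of pattern in input with "|".
    public static string MatchesOf(string input, string pattern)
    {
        return string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // (?i) switches on case-insensitive matching from that point onward,
        // so the leading A must be uppercase but the b may be "b" or "B".
        Console.WriteLine(MatchesOf("ABA Able Act", @"\bA(?i)b\w+\b"));   // ABA|Able

        // (?x) ignores unescaped white space in the pattern and treats
        // everything from # to the end of the line as a comment.
        string commented = @"(?x) \b A   # a word starting with an uppercase A
                             \w+ \b      # followed by the rest of the word";
        Console.WriteLine(MatchesOf("ABA Able Act", commented));          // ABA|Able|Act
    }
}
```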
The Regex class
The Regex class represents a regular expression. Its most commonly used methods are:
| No. | Method and description |
|---|---|
| 1. | `public bool IsMatch(string input)`: Indicates whether the regular expression specified in the Regex constructor finds a match in the given input string. |
| 2. | `public bool IsMatch(string input, int startat)`: Indicates whether the regular expression specified in the Regex constructor finds a match in the given input string, beginning at the specified starting position in the string. |
| 3. | `public static bool IsMatch(string input, string pattern)`: Indicates whether the specified regular expression finds a match in the given input string. |
| 4. | `public MatchCollection Matches(string input)`: Searches the given input string for all occurrences of the regular expression. |
| 5. | `public string Replace(string input, string replacement)`: In the given input string, replaces all strings that match the regular expression pattern with the given replacement string. |
| 6. | `public string[] Split(string input)`: Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor. |
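Before the full examples, here is a compact sketch that calls each of the six methods once on a single Regex instance. The pattern \d+ and the sample inputs are my own choices for illustration, as is the class name RegexMethodsDemo.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    public static void Main()
    {
        // One or more decimal digits.
        Regex digits = new Regex(@"\d+");

        // 1. Instance IsMatch: is there a match anywhere in the input?
        Console.WriteLine(digits.IsMatch("123abc"));        // True

        // 2. IsMatch with startat: the search begins at index 3 ("abc"), so no digits are found.
        Console.WriteLine(digits.IsMatch("123abc", 3));     // False

        // 3. Static IsMatch: no Regex object needed.
        Console.WriteLine(Regex.IsMatch("abc", @"\d+"));    // False

        // 4. Matches: every occurrence ("1" and "22").
        Console.WriteLine(digits.Matches("a1b22c").Count);  // 2

        // 5. Replace: each run of digits becomes "#".
        Console.WriteLine(digits.Replace("a1b22c", "#"));   // a#b#c

        // 6. Split: the digits act as separators.
        Console.WriteLine(string.Join(",", digits.Split("a1b22c")));   // a,b,c
    }
}
```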
Examples of using REGEX in C#
Example 1
The following example matches words that start with 'S':
    using System;
    using System.Text.RegularExpressions;

    namespace RegExApplication
    {
        class Program
        {
            // Prints the pattern, then every match found in the text.
            private static void showMatch(string text, string expr)
            {
                Console.WriteLine("The Expression: " + expr);
                MatchCollection mc = Regex.Matches(text, expr);
                foreach (Match m in mc)
                {
                    Console.WriteLine(m);
                }
            }

            static void Main(string[] args)
            {
                string str = "A Thousand Splendid Suns";
                Console.WriteLine("Matching words that start with 'S': ");
                // \b is a word boundary, S is literal, \S* takes the rest of the word.
                showMatch(str, @"\bS\S*");
                Console.ReadKey();
            }
        }
    }
When the code above is compiled and run, it produces the following output:
    Matching words that start with 'S':
    The Expression: \bS\S*
    Splendid
    Suns
Example 2
This example replaces extra white space:
    using System;
    using System.Text.RegularExpressions;

    namespace RegExApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                string input = "Hello   World   ";
                // \s+ matches one or more white-space characters in a row.
                string pattern = "\\s+";
                string replacement = " ";
                Regex rgx = new Regex(pattern);
                // Every run of white space is collapsed into a single space.
                string result = rgx.Replace(input, replacement);
                Console.WriteLine("Original String: {0}", input);
                Console.WriteLine("Replacement String: {0}", result);
                Console.ReadKey();
            }
        }
    }
When the code above is compiled and run, it produces the following output:
    Original String: Hello   World   
    Replacement String: Hello World 
References:
Anonymous. (2013). C# - Regular Expressions [Online]. Available: www.tutorialspoint.com/csharp/csharp_regular_expressions.htm [14 May 2013].
I hope this little bit of knowledge from me is useful ^_^
Wassalamualaikum Warahmatullahi Wabarakatuh