Assalamualaikum Warahmatullahi Wabarakatuh

Hello, online friends! This time I'd like to share a little knowledge about the C# programming language. The topic I'll cover is REGEX (Regular Expression). To be honest, this is actually an assignment for my Programming Languages course from my dear lecturer. Without further ado, enjoy the following bit of knowledge.

Regular Expression

A Regular Expression, often called REGEX, is a pattern that can be matched against input text. The .NET Framework provides a regular expression engine that performs this matching. A pattern consists of one or more character literals, operators, or constructs.
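Here is a minimal sketch of the idea above. It assumes a .NET 5+ console project that allows top-level statements. The pattern c[aeiou]t and the sample sentence are my own illustrations. The pattern mixes literal characters with a character-class construct, and Regex.Match reports where the engine first finds it.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern mixes literal characters (c, t) with a construct: the character
// class [aeiou], so it matches "cat", "cot", "cut", and so on.
var regex = new Regex("c[aeiou]t");

// Match returns the first place in the input where the pattern fits.
Match first = regex.Match("The cot is next to the cat.");
Console.WriteLine($"{first.Value} at index {first.Index}");   // cot at index 4
```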

Constructs for Defining a REGEX

There are various categories of characters, operators, and constructs that let you define regular expressions. They include:

Character escapes

Character classes

Anchors

Grouping constructs

Quantifiers

Backreference constructs

Alternation constructs

Substitutions

Miscellaneous constructs

Character Escapes

Character escapes, also called special characters, are the most basic category. A backslash (\) in a regular expression indicates that the character following it is either a special character or must be interpreted literally.

The following table describes the character escapes:

Escaped character	Description	Pattern	Matches
\a	Matches a bell character, \u0007.	\a	"\u0007" in "Warning!" + '\u0007'
\b	In a character class, matches a backspace, \u0008.	[\b]{3,}	"\b\b\b\b" in "\b\b\b\b"
\t	Matches a tab, \u0009.	(\w+)\t	"Name\t", "Addr\t" in "Name\tAddr\t"
\r	Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.)	\r\n(\w+)	"\r\nHello" in "\r\nHello\nWorld."
\v	Matches a vertical tab, \u000B.	[\v]{2,}	"\v\v\v" in "\v\v\v"
\f	Matches a form feed, \u000C.	[\f]{2,}	"\f\f\f" in "\f\f\f"
\n	Matches a new line, \u000A.	\r\n(\w+)	"\r\nHello" in "\r\nHello\nWorld."
\e	Matches an escape, \u001B.	\e	"\x001B" in "\x001B"
\nnn	Uses octal representation to specify a character (nnn consists of two or three digits).	\w\040\w	"a b", "c d" in "a bc d"
\xnn	Uses hexadecimal representation to specify a character (nn consists of exactly two digits).	\w\x20\w	"a b", "c d" in "a bc d"
\cX or \cx	Matches the ASCII control character that is specified by X or x, where X or x is the letter of the control character.	\cC	"\x0003" in "\x0003" (Ctrl-C)
\unnnn	Matches a Unicode character by using hexadecimal representation (exactly four digits, as represented by nnnn).	\w\u0020\w	"a b", "c d" in "a bc d"
\	When followed by a character that is not recognized as an escaped character, matches that character.	\d+[\+-x\*]\d+	"2+2" and "3*9" in "(2+2) * 3*9"
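The sketch below assumes a .NET 5+ console project with top-level statements. The sample strings are either made up or taken from the table. It checks that the octal, hexadecimal, and Unicode escapes for a space behave the same way. It also shows a backslash turning the metacharacter . into a literal dot. Note the verbatim strings (@"..."): without the @, the C# compiler would try to interpret the backslashes before the regex engine ever saw them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Three escapes for the same space character: octal \040, hex \x20, Unicode \u0020.
string[] patterns = { @"\w\040\w", @"\w\x20\w", @"\w\u0020\w" };
foreach (string p in patterns)
{
    var found = Regex.Matches("a bc d", p).Cast<Match>().Select(m => m.Value);
    Console.WriteLine($"{p}: {string.Join(", ", found)}");   // a b, c d (for every pattern)
}

// A backslash before a metacharacter makes it literal: \. matches only a real dot.
bool literalDot = Regex.IsMatch("1.5", @"^\d\.\d$");   // True
bool otherChar = Regex.IsMatch("1x5", @"^\d\.\d$");    // False
Console.WriteLine($"{literalDot} {otherChar}");
```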

Character Classes

A character class matches any one of a set of characters. The following table describes the character classes:

Character class	Description	Pattern	Matches
[character_group]	Matches any single character in character_group. By default, the match is case-sensitive.	[mn]	"m" in "mat"; "m", "n" in "moon"
[^character_group]	Negation: Matches any single character that is not in character_group. By default, characters in character_group are case-sensitive.	[^aei]	"v", "l" in "avail"
[first-last]	Character range: Matches any single character in the range from first to last.	[A-Z]	"A", "B" in "AB123"
.	Wildcard: Matches any single character except \n.	a.e	"ave" in "have"; "ate" in "mate"
\p{ name }	Matches any single character in the Unicode general category or named block specified by name.	\p{Lu}	"C", "L" in "City Lights"
\P{ name }	Matches any single character that is not in the Unicode general category or named block specified by name.	\P{Lu}	"i", "t", "y" in "City"
\w	Matches any word character.	\w	"R", "o", "o", "m", "1" in "Room#1"
\W	Matches any non-word character.	\W	"#" in "Room#1"
\s	Matches any white-space character.	\w\s	"D " in "ID A1.3"
\S	Matches any non-white-space character.	\s\S	" _" in "int __ctr"
\d	Matches any decimal digit.	\d	"4" in "4 = IV"
\D	Matches any character other than a decimal digit.	\D	" ", "=", " ", "I", "V" in "4 = IV"
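You can try out a few rows of the table directly. This sketch assumes top-level statements. All is a hypothetical helper, written for brevity, that collects every match as a string array. The "Room#1 and Room#22" input is invented; the other inputs come from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of pattern in input, as a string array.
static string[] All(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] notAei = All("avail", "[^aei]");            // v, l
string[] upper = All("City Lights", @"\p{Lu}");      // C, L
string[] range = All("AB123", "[A-Z]");              // A, B
string[] digits = All("Room#1 and Room#22", @"\d+"); // 1, 22 (\d plus a quantifier)

Console.WriteLine(string.Join(",", notAei));
Console.WriteLine(string.Join(",", upper));
Console.WriteLine(string.Join(",", range));
Console.WriteLine(string.Join(",", digits));
```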

Anchors

Anchors make a match succeed or fail depending on the current position in the string. The following table lists the anchors:

Assertion	Description	Pattern	Matches
^	By default, the match must start at the beginning of the string; in multiline mode, it can also start at the beginning of a line.	^\d{3}	"567" in "567-777-"
$	By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it can also occur at the end of a line or before \n at the end of a line.	-\d{4}$	"-2012" in "8-12-2012"
\A	The match must occur at the start of the string.	\A\w{4}	"Code" in "Code-007-"
\Z	The match must occur at the end of the string or before \n at the end of the string.	-\d{3}\Z	"-007" in "Bond-901-007"
\z	The match must occur at the end of the string.	-\d{3}\z	"-333" in "-901-333"
\G	The match must occur at the point where the previous match ended.	\G\(\d\)	"(1)", "(3)", "(5)" in "(1)(3)(5)[7](9)"
\b	The match must occur on a boundary between a \w (alphanumeric) and a \W (nonalphanumeric) character.	\b\w+\s\w+\b	"them theme", "them them" in "them theme them them"
\B	The match must not occur on a \b boundary.	\Bend\w*\b	"ends", "ender" in "end sends endure lender"
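This sketch shows ^ in its default mode and in multiline mode, and uses \b for whole-word matching. It assumes top-level statements, and the two-line input string is made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string lines = "567-777-\n901-333-";

// Default mode: ^ anchors only at the very start of the string.
string[] startOnly = Regex.Matches(lines, @"^\d{3}")
                          .Cast<Match>().Select(m => m.Value).ToArray();

// RegexOptions.Multiline: ^ also anchors right after each \n.
string[] perLine = Regex.Matches(lines, @"^\d{3}", RegexOptions.Multiline)
                        .Cast<Match>().Select(m => m.Value).ToArray();

// \b on both sides keeps "end" from matching inside "sends", "endure", or "lender".
int wholeWord = Regex.Matches("end sends endure lender", @"\bend\b").Count;

Console.WriteLine(string.Join(",", startOnly));   // 567
Console.WriteLine(string.Join(",", perLine));     // 567,901
Console.WriteLine(wholeWord);                     // 1
```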

Grouping Constructs

Grouping constructs delineate the subexpressions of a regular expression and capture substrings of the input string. The following table lists the grouping constructs:

Grouping construct	Description	Pattern	Matches
(subexpression)	Captures the matched subexpression and assigns it a zero-based ordinal number.	(\w)\1	"ee" in "deep"
(?<name>subexpression)	Captures the matched subexpression into a named group.	(?<double>\w)\k<double>	"ee" in "deep"
(?<name1-name2>subexpression)	Defines a balancing group definition.	(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$	"((1-3)*(3-1))" in "3+2^((1-3)*(3-1))"
(?:subexpression)	Defines a noncapturing group.	Write(?:Line)?	"WriteLine" in "Console.WriteLine()"
(?imnsx-imnsx:subexpression)	Applies or disables the specified options within subexpression.	A\d{2}(?i:\w+)\b	"A12xl", "A12XL" in "A12xl A12XL a12xl"
(?=subexpression)	Zero-width positive lookahead assertion.	\w+(?=\.)	"is", "ran", and "out" in "He is. The dog ran. The sun is out."
(?!subexpression)	Zero-width negative lookahead assertion.	\b(?!un)\w+\b	"sure", "used" in "unsure sure unity used"
(?<=subexpression)	Zero-width positive lookbehind assertion.	(?<=19)\d{2}\b	"99", "50", "05" in "1851 1999 1950 1905 2003"
(?<!subexpression)	Zero-width negative lookbehind assertion.	(?<!19)\d{2}\b	"51", "03" in "1851 1999 1950 1905 2003"
(?>subexpression)	Nonbacktracking (or "greedy") subexpression.	[13579](?>A+B+)	"1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC"
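Here is a sketch of three grouping constructs in action, assuming top-level statements. The group names year, month, and day and the date string are my own illustrative choices. The other two inputs come from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Named groups: capture the parts of an ISO-style date separately.
Match date = Regex.Match("Posted 2013-05-19", @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})");
string year = date.Groups["year"].Value;      // 2013
string month = date.Groups["month"].Value;    // 05
string day = date.Groups["day"].Value;        // 19
Console.WriteLine($"{day}/{month}/{year}");   // 19/05/2013

// Noncapturing group: (?:Line)? makes "Line" optional without adding a numbered group.
Match write = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
Console.WriteLine($"{write.Value}, groups: {write.Groups.Count}");   // WriteLine, groups: 1

// Positive lookahead: words followed by a period; the period itself is not consumed.
string[] beforeDot = Regex.Matches("He is. The dog ran. The sun is out.", @"\w+(?=\.)")
                          .Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(",", beforeDot));   // is,ran,out
```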

Quantifiers

Quantifiers specify how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.

The following table lists the quantifiers:

Quantifier	Description	Pattern	Matches
*	Matches the previous element zero or more times.	\d*\.\d	".0", "19.9", "219.9"
+	Matches the previous element one or more times.	"be+"	"bee" in "been", "be" in "bent"
?	Matches the previous element zero or one time.	"rai?n"	"ran", "rain"
{n}	Matches the previous element exactly n times.	",\d{3}"	",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210"
{n,}	Matches the previous element at least n times.	"\d{2,}"	"166", "29", "1930"
{n,m}	Matches the previous element at least n times, but no more than m times.	"\d{3,5}"	"166", "17668"; "19302" in "193024"
*?	Matches the previous element zero or more times, but as few times as possible.	\d*?\.\d	".0", "19.9", "219.9"
+?	Matches the previous element one or more times, but as few times as possible.	"be+?"	"be" in "been", "be" in "bent"
??	Matches the previous element zero or one time, but as few times as possible.	"rai??n"	"ran", "rain"
{n}?	Matches the preceding element exactly n times.	",\d{3}?"	",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210"
{n,}?	Matches the previous element at least n times, but as few times as possible.	"\d{2,}?"	"166", "29", "1930"
{n,m}?	Matches the previous element between n and m times, but as few times as possible.	"\d{3,5}?"	"166", "17668"; "193", "024" in "193024"
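The greedy and lazy rows of the table are easiest to compare side by side. This sketch assumes top-level statements and uses a hypothetical All helper. The "<b>bold</b>" input is my own example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of pattern in input, as a string array.
static string[] All(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Greedy {3,5} takes as many digits as it can (up to 5); lazy {3,5}? stops at 3.
string[] greedy = All("193024", @"\d{3,5}");   // 19302
string[] lazy = All("193024", @"\d{3,5}?");    // 193, 024

// The same contrast with + and +? on markup-like text.
string greedyTag = Regex.Match("<b>bold</b>", "<.+>").Value;   // <b>bold</b>
string lazyTag = Regex.Match("<b>bold</b>", "<.+?>").Value;    // <b>

Console.WriteLine(string.Join(",", greedy));
Console.WriteLine(string.Join(",", lazy));
Console.WriteLine(greedyTag);
Console.WriteLine(lazyTag);
```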

Backreference Constructs

A backreference construct lets a previously matched subexpression be identified again later in the same regular expression.

The following table lists these constructs:

Backreference construct	Description	Pattern	Matches
\number	Backreference. Matches the value of a numbered subexpression.	(\w)\1	"ee" in "seek"
\k<name>	Named backreference. Matches the value of a named expression.	(?<char>\w)\k<char>	"ee" in "seek"
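A classic use of a backreference is finding a word that was accidentally typed twice. The sketch below assumes top-level statements, and the sample sentence is invented.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \b(\w+)\s+\1\b: a word, some white space, then the same word again via \1.
string text = "This is is a test test sentence.";
string[] repeated = Regex.Matches(text, @"\b(\w+)\s+\1\b")
                         .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
Console.WriteLine(string.Join(",", repeated));   // is,test

// The same idea with a named group and the \k<name> backreference.
bool hasDouble = Regex.IsMatch("seek", @"(?<char>\w)\k<char>");
Console.WriteLine(hasDouble);   // True
```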

Alternation Constructs

Alternation constructs modify a regular expression to enable either/or matching. The following table lists the alternation constructs:

Alternation construct	Description	Pattern	Matches
|	Matches any one element separated by the vertical bar (|) character.	th(e|is|at)	"the", "this" in "this is the day."
(?(expression)yes|no)	Matches yes if expression matches; otherwise, matches the optional no part. Expression is interpreted as a zero-width assertion.	(?(A)A\d{2}\b|\b\d{3}\b)	"A10", "910" in "A10 C103 910"
(?(name)yes|no)	Matches yes if the named capture name has a match; otherwise, matches the optional no.	(?<quoted>")?(?(quoted).+?"|\S+\s)	Dogs.jpg and "Yiska playing.jpg" in Dogs.jpg "Yiska playing.jpg"
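Here is a sketch of plain alternation and of the (?(expression)yes|no) conditional. It assumes top-level statements and uses a hypothetical All helper. The file-name input is made up; the other two come from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of pattern in input, as a string array.
static string[] All(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// "th" followed by exactly one of the alternatives e, is, or at.
string[] words = All("this is the day.", "th(e|is|at)");                     // this, the

// Alternation inside a group: keep only .jpg, .png, or .gif file names.
string[] images = All("a.jpg b.txt c.png d.gif", @"\w+\.(jpg|png|gif)\b");   // a.jpg, c.png, d.gif

// Conditional: if the text here starts with "A", expect A plus two digits; otherwise three digits.
string[] codes = All("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)");           // A10, 910

Console.WriteLine(string.Join(",", words));
Console.WriteLine(string.Join(",", images));
Console.WriteLine(string.Join(",", codes));
```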

Substitutions

Substitutions are used in replacement patterns. The following table lists the substitutions:

Character	Description	Pattern	Replacement pattern	Input string	Result string
$number	Substitutes the substring matched by group number.	\b(\w+)(\s)(\w+)\b	$3$2$1	"one two"	"two one"
${name}	Substitutes the substring matched by the named group name.	\b(?<word1>\w+)(\s)(?<word2>\w+)\b	${word2} ${word1}	"one two"	"two one"
$$	Substitutes a literal "$".	\b(\d+)\s?USD	$$$1	"103 USD"	"$103"
$&	Substitutes a copy of the whole match.	\$?\d*\.?\d+	**$&**	"$1.30"	"**$1.30**"
$`	Substitutes all the text of the input string before the match.	B+	$`	"AABBCC"	"AAAACC"
$'	Substitutes all the text of the input string after the match.	B+	$'	"AABBCC"	"AACCCC"
$+	Substitutes the last group that was captured.	B+(C+)	$+	"AABBCCDD"	"AACCDD"
$_	Substitutes the entire input string.	B+	$_	"AABBCC"	"AAAABBCCCC"
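Substitutions only take effect in the replacement string passed to Regex.Replace. This sketch replays four rows of the table, assuming top-level statements. The inputs and expected results are the ones listed in the table.

```csharp
using System;
using System.Text.RegularExpressions;

// $3$2$1 rebuilds the match from numbered groups in reverse order.
string swapped = Regex.Replace("one two", @"\b(\w+)(\s)(\w+)\b", "$3$2$1");   // two one

// ${name} does the same with named groups.
string named = Regex.Replace("one two", @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b", "${word2} ${word1}");   // two one

// $$ writes a literal dollar sign; $1 is the first captured group.
string price = Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1");   // $103

// $& inserts the whole match, here wrapped in asterisks.
string bold = Regex.Replace("$1.30", @"\$?\d*\.?\d+", "**$&**");     // **$1.30**

Console.WriteLine(swapped);
Console.WriteLine(named);
Console.WriteLine(price);
Console.WriteLine(bold);
```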

Miscellaneous Constructs

The following table lists the miscellaneous constructs:

Construct	Definition	Example
(?imnsx-imnsx)	Sets or disables options such as case insensitivity in the middle of a pattern.	\bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act"
(?#comment)	Inline comment. The comment ends at the first closing parenthesis.	\bA(?#Matches words starting with A)\w+\b
# [to end of line]	X-mode comment. The comment starts at an unescaped # and continues to the end of the line.	(?x)\bA\w+\b#Matches words starting with A
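Here is a sketch of the three miscellaneous constructs, assuming top-level statements and a hypothetical All helper. For the # comment, it turns on x mode through RegexOptions.IgnorePatternWhitespace instead of an inline (?x); the two should behave the same. The fruit-name input is made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match of pattern in input, as a string array.
static string[] All(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

// (?i) turns on case-insensitive matching for the rest of the pattern only.
string[] inline = All("ABA Able Act", @"\bA(?i)b\w+\b");   // ABA, Able

// (?#...) is an inline comment that the engine skips.
string[] commented = All("Apple Banana Avocado", @"\bA(?#words starting with A)\w+\b");   // Apple, Avocado

// IgnorePatternWhitespace (x mode): spaces in the pattern are ignored, # starts a comment.
string verbose = @"\b A \w+ \b   # words starting with A";
string[] xMode = All("Apple Banana Avocado", verbose, RegexOptions.IgnorePatternWhitespace);   // Apple, Avocado

Console.WriteLine(string.Join(",", inline));
Console.WriteLine(string.Join(",", commented));
Console.WriteLine(string.Join(",", xMode));
```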

The Regex Class

The Regex class represents a regular expression. Its commonly used methods are:

No.	Method and description
1.	public bool IsMatch(string input): Indicates whether the regular expression specified in the Regex constructor finds a match in a specified input string.
2.	public bool IsMatch(string input, int startat): Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string.
3.	public static bool IsMatch(string input, string pattern): Indicates whether the specified regular expression finds a match in the specified input string.
4.	public MatchCollection Matches(string input): Searches the specified input string for all occurrences of a regular expression.
5.	public string Replace(string input, string replacement): In a specified input string, replaces all strings that match a regular expression pattern with a specified replacement string.
6.	public string[] Split(string input): Splits an input string into an array of substrings at the positions defined by a regular expression pattern specified in the Regex constructor.
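The sketch below calls each method from the table once, mostly on a single Regex instance. It assumes top-level statements. The order sentence and the comma-separated color list are made-up inputs.

```csharp
using System;
using System.Text.RegularExpressions;

// One Regex instance reused for the instance methods in the table.
var digits = new Regex(@"\d+");
string input = "Order 66 shipped on day 19";

bool any = digits.IsMatch(input);                      // True
bool afterIndex = digits.IsMatch(input, 20);           // True: "19" starts at index 24
bool startsWithDigit = Regex.IsMatch(input, @"^\d");   // False: static overload; input starts with a letter

MatchCollection all = digits.Matches(input);           // "66" and "19"
string masked = digits.Replace(input, "#");            // Order # shipped on day #

// Split cuts the string wherever the pattern matches: here, a comma plus optional spaces.
string[] parts = new Regex(@",\s*").Split("red, green,blue");   // red | green | blue

Console.WriteLine($"{any} {afterIndex} {startsWithDigit}");
Console.WriteLine(all.Count);
Console.WriteLine(masked);
Console.WriteLine(string.Join(" | ", parts));
```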

Examples of REGEX in C#

Example 1

The following example matches words that start with 'S':

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      // Prints the pattern, then every substring of text that matches it
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }

      static void Main(string[] args)
      {
         string str = "A Thousand Splendid Suns";
         Console.WriteLine("Matching words that start with 'S': ");
         // \b = word boundary, S = a literal 'S', \S* = the rest of the word (non-white-space characters)
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the code above is compiled and executed, it produces the following result:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example replaces extra white space:

using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
   class Program
   {
      static void Main(string[] args)
      {
         string input = "Hello   World   ";
         string pattern = "\\s+";      // one or more white-space characters
         string replacement = " ";
         Regex rgx = new Regex(pattern);
         // Replace every run of white space with a single space
         string result = rgx.Replace(input, replacement);

         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);
         Console.ReadKey();
      }
   }
}

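When the code above is compiled and executed, it produces the following result (the trailing spaces are collapsed too, so the replaced string still ends with one space that is not visible here):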

Original String: Hello   World
Replacement String: Hello World

References:

Anonymous. (2013). C# - Regular Expressions [Online]. Available: www.tutorialspoint.com/csharp/csharp_regular_expressions.htm [accessed 14 May 2013].

For more material on programming in C#, you can download it from the links below. I hope this bit of knowledge is useful ^_^

4shared: click here
Dropbox: click here
Mediafire: click here

Wassalamualaikum Warahmatullahi Wabarakatuh

This is the important blog

Sunday, 19 May 2013

C# - REGEX (Regular Expression)
