System.String C# Help

Before examining the other string classes, this section quickly reviews some of the available methods in the String class.

System.String is a class specifically designed to store a string and allow a large number of operations on it. In addition, because this data type is so important, C# has its own keyword (string) and associated syntax to make it particularly easy to manipulate strings using this class.

You can concatenate strings using operator overloads:

string message1 = "Hello"; // returns "Hello"
message1 += ", There"; // returns "Hello, There"
string message2 = message1 + "!"; // returns "Hello, There!"

C# also allows extraction of a particular character using an indexer-like syntax:

char char4 = message1[4]; // returns 'o'. Note that the index is zero-based.

This enables you to perform such common tasks as replacing characters, removing white space, and capitalization. The following table introduces the key methods.

Img

Please note that this table is not comprehensive but is intended to give you an idea of the features offered by strings.
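Because the table itself is not reproduced here, the following minimal sketch illustrates a handful of commonly used System.String methods. The class name StringMethodsDemo, the helper Normalize, and the sample text are illustrative assumptions, not part of the original listing.

```csharp
using System;

public static class StringMethodsDemo
{
    // Removes surrounding white space and converts the text to uppercase.
    public static string Normalize(string input)
    {
        return input.Trim().ToUpper();
    }

    public static void Main()
    {
        string raw = "  hello, there  ";
        string trimmed = raw.Trim();

        Console.WriteLine(Normalize(raw));                        // HELLO, THERE
        Console.WriteLine(trimmed.Replace('e', 'a'));             // hallo, thara
        Console.WriteLine(trimmed.Substring(7, 5));               // there
        Console.WriteLine(trimmed.IndexOf("there"));              // 7
        Console.WriteLine(string.Join("|", "a,b,c".Split(',')));  // a|b|c
    }
}
```

Every call returns a new value and leaves the original string untouched, which is the behavior examined in the next section.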

Building Strings

 As you have seen, String is an extremely powerful class that implements a large number of very useful methods. However, the String class has a shortcoming that makes it very inefficient for making repeated modifications to a given string – it is actually an immutable data type, which means that once you initialize a string object, that string object can never change. The methods and operators that appear to modify the contents of a string actually create new strings, copying across the contents of the old string if necessary. For example, look at the following code:

string greetingText = "Hello from all the guys at CSharpAid. ";
greetingText += "We do hope you enjoy this book as much as we enjoyed writing it.";

What happens when this code executes is this: first, an object of type System.String is created and initialized to hold the text Hello from all the guys at CSharpAid. (note the space after the period). When this happens, the .NET runtime allocates just enough memory in the string to hold this text (38 chars), and the variable greetingText is set to refer to this string instance.

In the next line, syntactically it looks like more text is being added onto the string, though it is not. Instead, what happens is that a new string instance is created with just enough memory allocated to store the combined text; that’s 102 characters in total. The original text, Hello from all the guys at CSharpAid. , is copied into this new string instance along with the extra text, We do hope you enjoy this book as much as we enjoyed writing it. Then, the address stored in the variable greetingText is updated, so the variable correctly points to the new String object. The old String object is now unreferenced (there are no variables that refer to it) and so will be removed the next time the garbage collector comes along to clean out any unused objects in your application.

By itself, that does not look too bad, but suppose that you wanted to encode that string by replacing each letter (not the punctuation) with the character whose ASCII code is one higher, that is, the next letter in the alphabet, as part of some extremely simple encryption scheme. This would change the string to Ifmmp gspn bmm uif hvzt bu DTibsqBje. Xf ep ipqf zpv fokpz uijt cppl bt nvdi bt xf fokpzfe xsjujoh ju. Several ways of doing this exist, but the simplest and (if you are restricting yourself to using the String class) almost certainly the most efficient way is to use the String.Replace() method, which replaces all occurrences of a given substring in a string with another substring. Using Replace(), the code to encode the text looks like this:

Img
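The original listing is not available here. A minimal sketch of such an encoding loop, assuming the greeting text shown earlier, might look like the following. The names StringEncodeDemo and Encode are hypothetical, and the code is an illustration of the approach rather than the exact original.

```csharp
using System;

public static class StringEncodeDemo
{
    // Shifts every ASCII letter one place forward using String.Replace().
    // Strings are immutable, so each Replace() call that changes anything
    // hands back a different string instance.
    public static string Encode(string text)
    {
        // Work downward through the alphabet so that a character that has
        // just been shifted is never shifted a second time.
        for (int i = 'z'; i >= 'a'; i--)
        {
            text = text.Replace((char)i, (char)(i + 1));
        }
        for (int i = 'Z'; i >= 'A'; i--)
        {
            text = text.Replace((char)i, (char)(i + 1));
        }
        return text;
    }

    public static void Main()
    {
        string greetingText = "Hello from all the guys at CSharpAid. ";
        greetingText += "We do hope you enjoy this book as much as we enjoyed writing it.";

        Console.WriteLine("Not encoded:\n" + greetingText);
        Console.WriteLine("Encoded:\n" + Encode(greetingText));
    }
}
```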

For simplicity, this code does not wrap Z to A or z to a. These letters get encoded to [ and {, respectively.

Here, the Replace() method works in a fairly intelligent way, to the extent that it won’t actually create a new string unless it actually makes changes to the old string. The original string contained 22 different lowercase characters and 5 different uppercase ones. The Replace() method will therefore have allocated a new string 27 times in total, with each new string storing 102 characters. That means that because of the encryption process, there will be string objects capable of storing a combined total of 2,754 characters now sitting on the heap waiting to be garbage-collected! Clearly, if you use strings to do text processing extensively, your applications will run into severe performance problems.

To address this kind of issue, Microsoft has supplied the System.Text.StringBuilder class. StringBuilder is not as powerful as String in terms of the number of methods it supports. The processing you can do on a StringBuilder is limited to substitutions and appending or removing text from strings. However, it works in a much more efficient way.

When you construct a string using the String class, just enough memory is allocated to hold the string. The StringBuilder, however, normally allocates more memory than is actually needed. You, as a developer, have the option to indicate how much memory the StringBuilder should allocate, but if you do not, the amount will default to some value that depends on the size of the string that the StringBuilder instance is initialized with. The StringBuilder class has two main properties:

  1.  Length, which indicates the length of the string that it actually contains
  2.  Capacity, which indicates the maximum length of the string in the memory allocation

Any modifications to the string take place within the block of memory assigned to the StringBuilder instance, which makes appending substrings and replacing individual characters within strings very efficient. Removing or inserting substrings is inevitably still inefficient because it means that the following part of the string has to be moved. Only if you perform some operation that exceeds the capacity of the string is it necessary to allocate new memory and possibly move the entire contained string. When adding extra capacity, based on our experiments, the StringBuilder appears to double its capacity if it detects that the capacity has been exceeded and no new value for the capacity has been set.
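The relationship between Length and Capacity can be observed directly. The sketch below is illustrative only: the class name CapacityDemo and the helper BuildGreeting are assumptions, and the growth step is simply printed rather than asserted, because the exact growth policy is an implementation detail.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Builds a short greeting inside a builder that reserves 150 characters.
    public static StringBuilder BuildGreeting()
    {
        StringBuilder sb = new StringBuilder("Hello", 150);
        sb.Append(", There"); // fits inside the reserved block
        return sb;
    }

    public static void Main()
    {
        StringBuilder sb = BuildGreeting();
        Console.WriteLine(sb.Length);    // 12
        Console.WriteLine(sb.Capacity);  // 150

        // Exceeding the capacity forces the builder to grow.
        StringBuilder small = new StringBuilder(4);
        Console.WriteLine("Before: " + small.Capacity);
        small.Append("0123456789");
        Console.WriteLine("After:  " + small.Capacity);
    }
}
```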

For example, if you use a StringBuilder object to construct the original greeting string, you might write this code:

StringBuilder greetingBuilder = new StringBuilder("Hello from all the guys at CSharpAid. ", 150);
greetingBuilder.AppendFormat("We do hope you enjoy this book as much as we enjoyed writing it");

To use the StringBuilder class, you will need to reference the System.Text namespace in your code (for example, with a using System.Text; directive).

This code sets an initial capacity of 150 for the StringBuilder. It is always a good idea to set some capacity that covers the likely maximum length of a string, to ensure the StringBuilder does not need to relocate because its capacity was exceeded. Theoretically, you can set as large a number as you can pass in an int, although the system will probably complain that it does not have enough memory if you actually try to allocate the maximum of 2 billion characters (this is the theoretical maximum that a StringBuilder instance is in principle allowed to contain).

When the preceding code is executed, it first creates a StringBuilder object.

Img

Then, on calling the AppendFormat() method, the remaining text is placed in the empty space, without the need for more memory allocation. However, the real efficiency gain from using a StringBuilder comes when you are making repeated text substitutions. For example, if you try to encrypt the text in the same way as before, you can perform the entire encryption without allocating any more memory whatsoever:

Img
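The original listing is not available here. A minimal sketch of the same encoding performed on a StringBuilder might look like this. The names BuilderEncodeDemo and Encode are hypothetical; the only library calls used are the StringBuilder constructor, AppendFormat(), Replace(char, char), and ToString().

```csharp
using System;
using System.Text;

public static class BuilderEncodeDemo
{
    // Shifts every ASCII letter one place forward, modifying the builder's
    // own buffer instead of producing a new string per substitution.
    public static StringBuilder Encode(StringBuilder builder)
    {
        // Descending order prevents a shifted character being shifted again.
        for (int i = 'z'; i >= 'a'; i--)
        {
            builder.Replace((char)i, (char)(i + 1));
        }
        for (int i = 'Z'; i >= 'A'; i--)
        {
            builder.Replace((char)i, (char)(i + 1));
        }
        return builder;
    }

    public static void Main()
    {
        StringBuilder greetingBuilder =
            new StringBuilder("Hello from all the guys at CSharpAid. ", 150);
        greetingBuilder.AppendFormat(
            "We do hope you enjoy this book as much as we enjoyed writing it");

        Console.WriteLine("Not encoded:\n" + greetingBuilder);
        Encode(greetingBuilder);
        Console.WriteLine("Encoded:\n" + greetingBuilder);
    }
}
```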

This code uses the StringBuilder.Replace() method, which does the same thing as String.Replace(), but without copying the string in the process. The total memory allocated to hold strings in the preceding code is 150 characters for the StringBuilder instance, as well as the memory allocated during the string operations performed internally in the final Console.WriteLine() statement.

Normally, you will want to use StringBuilder to perform any manipulation of strings and String to store or display the final result.

StringBuilder Members

You have seen a demonstration of one constructor of StringBuilder, which takes an initial string and capacity as its parameters. There are others. For example, you can supply only a string:

StringBuilder sb = new StringBuilder("Hello");

Or you can create an empty StringBuilder with a given capacity:

StringBuilder sb = new StringBuilder(20);

Apart from the Length and Capacity properties, there is a read-only MaxCapacity property that indicates the limit to which a given StringBuilder instance is allowed to grow. By default, this is given by int.MaxValue (roughly 2 billion, as noted earlier), but you can set this value to something lower when you construct the StringBuilder object:

// This sets the initial capacity to 100, but the maximum will be 500.
// Hence, this StringBuilder can never grow to more than 500 characters;
// an exception is raised if you try to exceed that.
StringBuilder sb = new StringBuilder(100, 500);

You can also explicitly set the capacity at any time, though an exception will be raised if you set it to a value less than the current length of the string or a value that exceeds the maximum capacity:

StringBuilder sb = new StringBuilder("Hello");
sb.Capacity = 100;
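The capacity rules described above can be checked with a short experiment. This is a sketch only: the class CapacityLimitsDemo and the helper Throws are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text;

public static class CapacityLimitsDemo
{
    // Returns true if the action raises ArgumentOutOfRangeException.
    public static bool Throws(Action action)
    {
        try
        {
            action();
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        StringBuilder limited = new StringBuilder(100, 500);
        Console.WriteLine(limited.MaxCapacity);  // 500

        // Growing past MaxCapacity is rejected.
        Console.WriteLine(Throws(() => limited.Append(new string('x', 501))));  // True

        // Capacity cannot be set below the current length...
        StringBuilder hello = new StringBuilder("Hello");
        Console.WriteLine(Throws(() => hello.Capacity = 2));  // True

        // ...but it can be raised explicitly at any time.
        hello.Capacity = 100;
        Console.WriteLine(hello.Capacity);  // 100
    }
}
```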

The following table lists the main StringBuilder methods:

Img

Several overloads of many of these methods exist.

AppendFormat() is actually the method that is ultimately called when you call Console.WriteLine(), which has responsibility for working out what all the format expressions like {0:D} should be replaced with. This method is examined in the next section.

There is no cast (either implicit or explicit) from StringBuilder to String. If you want to output the contents of a StringBuilder as a String, you must use the ToString() method.

Now that you have been introduced to the StringBuilder class and have learned some of the ways in which you can use it to increase performance, you should be aware that this class will not always give you the increased performance that you are looking for. Basically, the StringBuilder class should be used when you are manipulating multiple strings. However, if you are just doing something as simple as concatenating two strings, you will find that System.String performs better.

Format Strings

So far, a large number of classes and structs have been written for the code samples presented in this book, and they have normally implemented a ToString() method in order to be able to display the contents of a given variable. However, quite often users might want the contents of a variable to be displayed in different, often culture- and locale-dependent, ways. The .NET base class, System.DateTime, provides the most obvious example of this. For example, you might want to display the same date as 10 June 2008, 10 Jun 2008, 6/10/08 (USA), 10/6/08 (UK), or 10.06.2008 (Germany).

Similarly, the Vector struct from an earlier chapter implements the ToString() method to display the vector in the format (4, 56, 8). There is, however, another very common way of writing vectors, in which this vector would appear as 4i + 56j + 8k. If you want the classes that you write to be user-friendly, they need to support the facility to display their string representations in any of the formats that users are likely to want to use. The .NET runtime defines a standard way in which this should be done: the IFormattable interface. Showing how to add this important feature to your classes and structs is the subject of this section.

As you probably know, you need to specify the format in which you want a variable displayed when you call Console.WriteLine(). Therefore, this section uses this method as an example, although most of the discussion applies to any situation in which you want to format a string. For example, if you want to display the value of a variable in a list box or text box, you will normally use the String.Format() method to obtain the appropriate string representation of the variable. However, the actual format specifiers you use to request a particular format are identical to those passed to Console.WriteLine(). Hence, you will focus on Console.WriteLine() as an example. You start by examining what actually happens when you supply a format string to a primitive type, and from this, you will see how you can plug format specifiers for your own classes and structs into the process.

double d = 13.45;
int i = 45;
Console.WriteLine("The double is {0,10:E} and the int contains {1}", d, i);

The format string itself consists mostly of the text to be displayed, but wherever there is a variable to be formatted, its index in the parameter list appears in braces. You might also include:

  1.  The number of characters to be occupied by the representation of the item, prefixed by a comma. A negative number indicates that the item should be left-justified, whereas a positive number indicates that it should be right-justified. If the item actually occupies more characters than have been requested, it will still appear in full.
  2.  A format specifier, preceded by a colon. This indicates how you want the item to be formatted. For example, you can indicate whether you want a number to be formatted as a currency or displayed in scientific notation.

The following table lists the common format specifiers for the numeric types.

Img

If you want an integer to be padded with zeros, you can use the format specifier 0 (zero) repeated as many times as the number of digits required. For example, the format specifier 0000 will cause 3 to be displayed as 0003, and 99 to be displayed as 0099, and so on.
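The width, alignment, and specifier rules above can be tried out in a few lines. In this sketch, the class FormatSpecifierDemo and the helper Fmt are hypothetical; Fmt uses the invariant culture, an assumption made here so that the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class FormatSpecifierDemo
{
    // Formats with the invariant culture for predictable output.
    public static string Fmt(string format, params object[] args)
    {
        return string.Format(CultureInfo.InvariantCulture, format, args);
    }

    public static void Main()
    {
        double d = 13.45;
        int i = 45;

        Console.WriteLine(Fmt("[{0,6}]", i));    // [    45]  right-justified
        Console.WriteLine(Fmt("[{0,-6}]", i));   // [45    ]  left-justified
        Console.WriteLine(Fmt("[{0,1}]", i));    // [45]      too narrow: shown in full
        Console.WriteLine(Fmt("{0:0000}", 3));   // 0003
        Console.WriteLine(Fmt("{0:0000}", 99));  // 0099
        Console.WriteLine(Fmt("{0:X}", 255));    // FF

        // Width and specifier combined, as in the earlier example.
        Console.WriteLine(Fmt("The double is {0,10:E} and the int contains {1}", d, i));
    }
}
```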

It is not possible to give a complete list because other data types can add their own specifiers. Showing how to define your own specifiers for your own classes is the aim of this section.

How the String Is Formatted

As an example of how strings are formatted, if you execute the following statement:

Console.WriteLine("The double is {0,10:E} and the int contains {1}", d, i);

Console.WriteLine() just passes the entire set of parameters to the static method, String.Format(). This is the same method that you would call if you wanted to format these values for use in a string to be displayed in a text box, for example. The implementation of the three-parameter overload of WriteLine() basically does this:

Img
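The original listing is not available here. As a rough illustration of the delegation described above, the following sketch forwards its arguments to String.Format() and then writes the resulting string. The class ConsoleSketch and the helper Compose are hypothetical; this is not the actual framework source.

```csharp
using System;

public static class ConsoleSketch
{
    // Hypothetical helper: builds the final text so it can be inspected.
    public static string Compose(string format, object arg0, object arg1)
    {
        return string.Format(format, arg0, arg1);
    }

    // Rough equivalent of a three-parameter WriteLine: format first,
    // then hand the finished string to the one-parameter overload.
    public static void WriteLine(string format, object arg0, object arg1)
    {
        Console.WriteLine(Compose(format, arg0, arg1));
    }

    public static void Main()
    {
        double d = 13.45;
        int i = 45;
        WriteLine("The double is {0,10:E} and the int contains {1}", d, i);
    }
}
```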

The one-parameter overload of this method, which is in turn called in the preceding code sample, simply writes out the contents of the string it has been passed, without doing any further formatting on it.

String.Format() now needs to construct the final string by replacing each format specifier with a suitable string representation of the corresponding object. However, as you saw earlier, for this process of building up a string, you need a StringBuilder instance rather than a string instance. In this example, a StringBuilder instance is created and initialized with the first known portion of the string, the text "The double is ". Next, the StringBuilder.AppendFormat() method is called, passing in the first format specifier, {0,10:E}, as well as the associated object, double, in order to add the string representation of this object to the string object being constructed. This process continues with StringBuilder.Append() and StringBuilder.AppendFormat() being called repeatedly until the entire formatted string has been obtained.

Now comes the interesting part: StringBuilder.AppendFormat() has to figure out how to format the object. First, it probes the object to find out whether it implements an interface in the System namespace called IFormattable. You can determine this quite simply by trying to cast an object to this interface and seeing whether the cast succeeds, or by using the C# is keyword. If this test fails, AppendFormat() calls the object’s ToString() method, which all objects either inherit from System.Object or override. This is exactly what happens here because none of the classes written so far has implemented this interface. That is why the overrides of Object.ToString() have been sufficient to allow the structs and classes from earlier chapters such as Vector to get displayed in Console.WriteLine() statements.

However, all of the predefined primitive numeric types do implement this interface, which means that for those types, and in particular for double and int in the example, the basic ToString() method inherited from System.Object will not be called. To understand what happens instead, you need to examine the IFormattable interface.

IFormattable defines just one method, which is also called ToString(). However, this method takes two parameters as opposed to the System.Object version, which doesn’t take any parameters. The following code shows the definition of IFormattable:

interface IFormattable
{
    string ToString(string format, IFormatProvider formatProvider);
}

The first parameter that this overload of ToString() expects is a string that specifies the requested format. In other words, it is the specifier portion of the string that appears inside the braces ({}) in the string originally passed to Console.WriteLine() or String.Format(). In the example, the original statement was:

Console.WriteLine("The double is {0,10:E} and the int contains {1}", d, i);

Hence, when evaluating the first specifier, {0,10:E}, this overload will be called against the double variable, d, and the first parameter passed to it will be E. StringBuilder.AppendFormat() will pass in here the text that appears after the colon in the appropriate format specifier from the original string.

We won’t worry about the second ToString() parameter in this book. It is a reference to an object that implements the IFormatProvider interface. This interface gives further information that ToString() might need to consider when formatting the object, such as culture-specific details (a .NET culture is similar to a Windows locale; if you are formatting currencies or dates, you need this information). If you are calling this ToString() overload directly from your source code, you might want to supply such an object. However, StringBuilder.AppendFormat() passes in null for this parameter. If formatProvider is null, then ToString() is expected to use the culture specified in the system settings.

Getting back to the example, the first item you want to format is a double, for which you are requesting exponential notation, with the format specifier E. The StringBuilder.AppendFormat() method establishes that the double does implement IFormattable, and will therefore call the two-parameter ToString() overload, passing it the string E for the first parameter and null for the second parameter. It is now up to the double’s implementation of this method to return the string representation of the double in the appropriate format, taking into account the requested format and the current culture. StringBuilder.AppendFormat() will then sort out padding the returned string with spaces, if necessary, to fill the 10 characters the format string specified.

The next object to be formatted is an int, for which you are not requesting any particular format (the format specifier was simply {1}). With no format requested, StringBuilder.AppendFormat() passes in a null reference for the format string. The two-parameter overload of int.ToString() is expected to respond appropriately. No format has been specifically requested; therefore, it will call the no-parameter ToString() method.

This entire string formatting process is summarized here.

Img
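The decision process described in this section can be approximated in a few lines. This is a deliberately simplified sketch, not the framework's implementation: the class FormatDispatchSketch and the method FormatItem are hypothetical, and the real AppendFormat() handles additional cases that are ignored here.

```csharp
using System;

public static class FormatDispatchSketch
{
    // Formats one argument the way the text describes:
    // 1. if the object implements IFormattable, call the two-parameter
    //    ToString() with the specifier (possibly null) and a null provider;
    // 2. otherwise fall back to the parameterless ToString();
    // 3. finally pad to the requested width (negative = left-justified).
    public static string FormatItem(object arg, string format, int width)
    {
        string text;
        IFormattable formattable = arg as IFormattable;
        if (formattable != null)
        {
            text = formattable.ToString(format, null);
        }
        else
        {
            text = arg == null ? string.Empty : arg.ToString();
        }
        return width >= 0 ? text.PadLeft(width) : text.PadRight(-width);
    }

    public static void Main()
    {
        double d = 13.45;
        int i = 45;

        // Mirrors "{0,10:E}" and "{1}" from the example statement.
        Console.WriteLine("The double is " + FormatItem(d, "E", 10) +
                          " and the int contains " + FormatItem(i, null, 0));

        // A type without IFormattable ignores the specifier entirely.
        Console.WriteLine(FormatItem(new object(), "E", 0));  // System.Object
    }
}
```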

The Formattable Vector Example

Now that you know how format strings are constructed, in this section you extend the Vector example so that you can format vectors in a variety of ways. You can download the code for this example from www.csharpaid.com. With your new knowledge of the principles involved now in hand, you will discover that the actual coding is quite simple. All you need to do is implement IFormattable and supply an implementation of the ToString() overload defined by that interface.

The format specifiers you are going to support are:

  1.  N: Should be interpreted as a request to supply a quantity known as the Norm of the Vector. This is just the sum of squares of its components, which for mathematics buffs happens to be equal to the square of the length of the Vector, and is usually displayed between double vertical bars, like this: || 34.5 ||.
  2.  VE: Should be interpreted as a request to display each component in scientific format, just as the specifier E applied to a double indicates: (2.3E+01, 4.5E+02, 1.0E+00).
  3.  IJK: Should be interpreted as a request to display the vector in the form 23i + 450j + 1k.
  4.  Anything else should simply return the default representation of the Vector (23, 450, 1.0).

To keep things simple, you are not going to implement any option to display the vector in combined IJK and scientific format. You will, however, make sure you test the specifier in a case-insensitive way, so that you allow ijk instead of IJK. Note that it is entirely up to you which strings you use to indicate the format specifiers.

To achieve this, you first modify the declaration of Vector so it implements IFormattable:

Img
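The original listing is not available here. A self-contained sketch of a formattable Vector along the lines described in this section might look like the following. The constructor, the exact output spacing, and the VectorDemo class are assumptions made for illustration; the downloadable sample may differ in its details.

```csharp
using System;
using System.Text;

public struct Vector : IFormattable
{
    public double x, y, z;

    public Vector(double x, double y, double z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Sum of the squares of the components.
    public double Norm()
    {
        return x * x + y * y + z * z;
    }

    // Parameterless overload: the default representation.
    public override string ToString()
    {
        return ToString(null, null);
    }

    public string ToString(string format, IFormatProvider formatProvider)
    {
        // Check for null before calling any method on the specifier,
        // then compare case-insensitively.
        string key = format == null ? string.Empty : format.ToUpperInvariant();
        switch (key)
        {
            case "N":
                return "|| " + Norm().ToString(formatProvider) + " ||";
            case "VE":
                return string.Format(formatProvider,
                    "({0:E}, {1:E}, {2:E})", x, y, z);
            case "IJK":
                // Several pieces to join, so build them in a StringBuilder.
                StringBuilder sb = new StringBuilder(x.ToString(formatProvider), 30);
                sb.Append("i + ").Append(y.ToString(formatProvider));
                sb.Append("j + ").Append(z.ToString(formatProvider));
                sb.Append("k");
                return sb.ToString();
            default:
                return string.Format(formatProvider, "({0}, {1}, {2})", x, y, z);
        }
    }
}

public static class VectorDemo
{
    public static void Main()
    {
        Vector v = new Vector(23, 450, 1);
        Console.WriteLine("In IJK format: {0,30:IJK}", v);
        Console.WriteLine("In VE format:  {0:VE}", v);
        Console.WriteLine("Norm:          {0:N}", v);
        Console.WriteLine("Default:       {0}", v);
    }
}
```

Passing a null formatProvider, as Console.WriteLine() does here, leaves the choice of culture to the underlying double formatting, so the decimal separator follows the current system settings.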

That is all you have to do! Notice how you take the precaution of checking whether format is null before you call any methods against this parameter; you want this method to be as robust as reasonably possible. The format specifiers for all the primitive types are case-insensitive, so that is the behavior that other developers are going to expect from your class, too. For the format specifier VE, you need each component to be formatted in scientific notation, so you just use String.Format() again to achieve this. The fields x, y, and z are all doubles. For the case of the IJK format specifier, there are quite a few substrings to be added to the string, so you use a StringBuilder object to improve performance.

For completeness, you also reproduce the no-parameter ToString() overload developed earlier:

Img

Finally, you need to add a Norm() method that computes the norm of the vector (the sum of the squares of its components) because you didn’t actually supply this method when you developed the Vector struct:

Img

Now you can try your formattable vector with some suitable test code:

Img

The result of running this sample is this:

Img

This shows that your custom specifiers are being picked up correctly.

Posted on October 29, 2015 in Strings and Regular Expressions
