String (JDK 1.8) Source Code Reading Notes

String

  • In Java, strings are objects.
  • Java provides the String class for creating and manipulating strings.

Definition
The class is declared final, which means it cannot be subclassed. It also implements:

  • java.io.Serializable
  • Comparable
  • CharSequence

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence { }
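A minimal sketch of what the two interfaces give you in practice, assuming nothing beyond the JDK. Because String is a CharSequence, one method can accept both a String and a StringBuilder. Because String is Comparable<String>, arrays of strings sort lexicographically by char value without a custom comparator. The names `StringInterfacesDemo`, `countDigits`, and `sorted` are hypothetical.

```java
import java.util.Arrays;

public class StringInterfacesDemo {
    // Accepts any CharSequence: String, StringBuilder, StringBuffer, ...
    static int countDigits(CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isDigit(cs.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Comparable<String> supplies the natural order used by Arrays.sort
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy); // calls String.compareTo, which compares char values
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b2c3"));                  // 3
        System.out.println(countDigits(new StringBuilder("2019"))); // 4
        // 'A' (65) < 'b' (98) < 'p' (112), so uppercase sorts first
        System.out.println(Arrays.toString(sorted("pear", "Apple", "banana"))); // [Apple, banana, pear]
    }
}
```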

Fields


/** The value is used for character storage.
* String is backed by a char[]; this array holds the string's characters.
*/
private final char value[];

/** Cache the hash code for the string
* The cached hash value (computed lazily by hashCode(); 0 means not computed yet)
*/
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability
* Java serialization verifies version compatibility at runtime by checking the class's serialVersionUID.
*/
private static final long serialVersionUID = -6849794470754667710L;

/**
* Class String is special cased within the Serialization Stream Protocol.
* A String instance is written into an ObjectOutputStream according to
* <a href="{@docRoot}/../platform/serialization/spec/output.html">
* Object Serialization Specification, Section 6.2, "Stream Elements"</a>
*/
private static final ObjectStreamField[] serialPersistentFields =
new ObjectStreamField[0];

Constructors

String has more than a dozen constructors. The most commonly used ones are:

/**
* Creates a String from another String (in JDK 8 the char[] is shared, not copied)
* Initializes a newly created {@code String} object so that it represents
* the same sequence of characters as the argument; in other words, the
* newly created string is a copy of the argument string. Unless an
* explicit copy of {@code original} is needed, use of this constructor is
* unnecessary since Strings are immutable.
*
* @param original
* A {@code String}
*/
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
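Here is a small sketch of what the copy constructor means in practice. `new String(s)` always produces a distinct object, even though in JDK 8 it reuses the same `char[]`. The class and method names (`CopyConstructorDemo`, `copyOf`) are illustrative only.

```java
public class CopyConstructorDemo {
    // Wraps the String(String original) constructor discussed above
    static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = copyOf(literal);

        System.out.println(copy == literal);          // false: a distinct object
        System.out.println(copy.equals(literal));     // true: same characters
        System.out.println(copy.intern() == literal); // true: intern() returns the pooled literal
    }
}
```

As the Javadoc says, this constructor is rarely needed because strings are immutable.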

/**
* Creates a String from a byte array.
* Converting byte[] to String uses the platform's default charset, but you can also specify the charset yourself
* Constructs a new {@code String} by decoding the specified array of bytes
* using the platform's default charset. The length of the new {@code
* String} is a function of the charset, and hence may not be equal to the
* length of the byte array.
*
* <p> The behavior of this constructor when the given bytes are not valid
* in the default charset is unspecified. The {@link
* java.nio.charset.CharsetDecoder} class should be used when more control
* over the decoding process is required.
*
* @param bytes The bytes to be decoded into characters
* @since JDK1.1
*/
public String(byte bytes[]) {
this(bytes, 0, bytes.length);
}

/**
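* In Java, a String instance holds a char[] array that stores the text as UTF-16 code units.
* String and char are the in-memory representation; byte is the serialized form used for network transmission and storage.
* So in many transmission and storage scenarios, byte[] and String have to be converted back and forth,
* which is why String provides a series of overloaded constructors that turn a byte array into a String.
* Whenever you convert between byte[] and String, you must pay attention to the encoding.
* For example:
* public String(byte bytes[], int offset, int length, Charset charset) {}
* String(byte bytes[], String charsetName)
* String(byte bytes[], int offset, int length, String charsetName)
* and so on
* String(byte[] bytes, Charset charset) decodes the given byte array with charset
* into a UTF-16 char[] and constructs a new String from it.
*
* The following constructor lets you specify the charset of the byte array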
* Constructs a new {@code String} by decoding the specified array of
* bytes using the specified {@linkplain java.nio.charset.Charset charset}.
* The length of the new {@code String} is a function of the charset, and
* hence may not be equal to the length of the byte array.
*
* <p> This method always replaces malformed-input and unmappable-character
* sequences with this charset's default replacement string. The {@link
* java.nio.charset.CharsetDecoder} class should be used when more control
* over the decoding process is required.
*
* @param bytes
* The bytes to be decoded into characters
*
* @param charset
* The {@linkplain java.nio.charset.Charset charset} to be used to
* decode the {@code bytes}
*
* @since 1.6
*/
public String(byte bytes[], Charset charset) {
this(bytes, 0, bytes.length, charset);
}
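To make the charset point concrete, here is a minimal sketch that decodes the UTF-8 bytes of two CJK characters (中文, U+4E2D U+6587). It assumes Java 7+ for `StandardCharsets`. `DecodeDemo` and `decodeUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode with an explicit charset instead of the platform default
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+4E2D U+6587: three bytes per character
        byte[] utf8 = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD, (byte) 0xE6, (byte) 0x96, (byte) 0x87};

        String s = decodeUtf8(utf8);
        System.out.println(utf8.length); // 6 bytes...
        System.out.println(s.length());  // ...decode to 2 chars

        // The wrong charset does not fail; it silently yields one char per byte
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 6
    }
}
```

This shows why "the length of the new String is a function of the charset".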

/**
* Creates a String from a char array (the array is copied)
* Allocates a new {@code String} so that it represents the sequence of
* characters currently contained in the character array argument. The
* contents of the character array are copied; subsequent modification of
* the character array does not affect the newly created string.
*
* @param value
* The initial value of the string
*/
public String(char value[]) {
this.value = Arrays.copyOf(value, value.length);
}

/**
* Creates a String from a StringBuffer (copied while holding the buffer's lock)
* Allocates a new string that contains the sequence of characters
* currently contained in the string buffer argument. The contents of the
* string buffer are copied; subsequent modification of the string buffer
* does not affect the newly created string.
*
* @param buffer
* A {@code StringBuffer}
*/
public String(StringBuffer buffer) {
synchronized(buffer) {
this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
}
}

/**
* Creates a String from a StringBuilder
* Allocates a new string that contains the sequence of characters
* currently contained in the string builder argument. The contents of the
* string builder are copied; subsequent modification of the string builder
* does not affect the newly created string.
*
* <p> This constructor is provided to ease migration to {@code
* StringBuilder}. Obtaining a string from a string builder via the {@code
* toString} method is likely to run faster and is generally preferred.
*
* @param builder
* A {@code StringBuilder}
*
* @since 1.5
*/
public String(StringBuilder builder) {
this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
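The char[], StringBuffer, and StringBuilder constructors all copy their input. The sketch below checks that later changes to the source do not leak into the String. `CopySemanticsDemo` and its two helper methods are hypothetical names for illustration.

```java
public class CopySemanticsDemo {
    // String(char[]) copies the array, so later writes to it are not visible
    static String fromArrayThenMutate(char[] chars) {
        String s = new String(chars);
        chars[0] = 'X';
        return s;
    }

    // String(StringBuilder) copies the builder's current contents
    static String fromBuilderThenAppend(StringBuilder sb) {
        String s = new String(sb);
        sb.append(" world");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(fromArrayThenMutate(new char[]{'j', 'a', 'v', 'a'})); // java
        StringBuilder sb = new StringBuilder("hello");
        System.out.println(fromBuilderThenAppend(sb)); // hello
        System.out.println(sb.toString());             // hello world (toString() is the preferred route)
    }
}
```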

/*
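* This is a package-private constructor, used only inside java.lang.
* The second parameter carries no information; it is always expected to be true
* and only exists to distinguish this constructor from the public String(char[]).
* As the code shows, it keeps a direct reference to the array instead of copying it, which improves performance and saves memory.
* It is not public so that outside code cannot pass in an array it still holds and later mutate the string, which preserves immutability.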
* Package private constructor which shares value array for speed.
* this constructor is always expected to be called with share==true.
* a separate constructor is needed because we already have a public
* String(char[]) constructor that makes a copy of the given char[].
*/
String(char[] value, boolean share) {
// assert share : "unshared not supported";
this.value = value;
}

Common Methods

getBytes

/**
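* Converts the string into a byte array.
* This is used heavily in communication, e.g. network transmission, ISO 8583 messages, and socket communication.
* To avoid garbled text, make sure you know which byte encoding both sides of the communication use!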
* Encodes this {@code String} into a sequence of bytes using the named
* charset, storing the result into a new byte array.
*
* <p> The behavior of this method when this string cannot be encoded in
* the given charset is unspecified. The {@link
* java.nio.charset.CharsetEncoder} class should be used when more control
* over the encoding process is required.
*
* @param charsetName
* The name of a supported {@linkplain java.nio.charset.Charset
* charset}
*
* @return The resultant byte array
*
* @throws UnsupportedEncodingException
* If the named charset is not supported
*
* @since JDK1.1
*/
public byte[] getBytes(String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null) throw new NullPointerException();
return StringCoding.encode(charsetName, value, 0, value.length);
}

/**
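* Same as above, but takes a Charset object instead of a charset name (so there is no checked exception)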
* Encodes this {@code String} into a sequence of bytes using the given
* {@linkplain java.nio.charset.Charset charset}, storing the result into a
* new byte array.
*
* <p> This method always replaces malformed-input and unmappable-character
* sequences with this charset's default replacement byte array. The
* {@link java.nio.charset.CharsetEncoder} class should be used when more
* control over the encoding process is required.
*
* @param charset
* The {@linkplain java.nio.charset.Charset} to be used to encode
* the {@code String}
*
* @return The resultant byte array
*
* @since 1.6
*/
public byte[] getBytes(Charset charset) {
if (charset == null) throw new NullPointerException();
return StringCoding.encode(charset, value, 0, value.length);
}
/**
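* Uses the platform's default charset.
* Watch out: this is a common source of bugs at deployment time,
* because the default charset often differs between Windows and Linux environments. Specifying the charset explicitly is recommended.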
* Encodes this {@code String} into a sequence of bytes using the
* platform's default charset, storing the result into a new byte array.
*
* <p> The behavior of this method when this string cannot be encoded in
* the default charset is unspecified. The {@link
* java.nio.charset.CharsetEncoder} class should be used when more control
* over the encoding process is required.
*
* @return The resultant byte array
*
* @since JDK1.1
*/
public byte[] getBytes() {
return StringCoding.encode(value, 0, value.length);
}
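A minimal sketch of why the explicit-charset overloads are safer. The same two-character string encodes to different byte arrays depending on the charset, and `getBytes(Charset)` substitutes `'?'` when the charset cannot represent a character. `EncodeDemo` and `encode` are hypothetical names. The sketch assumes Java 7+ `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDemo {
    // An explicit Charset: no dependence on the platform default, no checked exception
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "A\u4e2d"; // 'A' followed by the CJK character U+4E2D

        System.out.println(encode(text, StandardCharsets.UTF_8).length);    // 4: 1 + 3 bytes
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length); // 4: 2 bytes per char
        // ISO-8859-1 cannot represent U+4E2D, so it is replaced by '?' (63)
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1))); // [65, 63]

        // A round trip only works when both sides agree on the charset
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```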

hashCode

/**
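* Hash algorithm.
* hashCode guarantees that equal strings always have the same hash value,
* but equal hash values do not imply equal contents (collisions are possible).
* So to be sure two strings are equal you still need equals.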
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
*/
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;

for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
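The formula s[0]*31^(n-1) + ... + s[n-1] can be checked by evaluating it Horner-style, which is exactly what the loop above does. The sketch below also shows a well-known collision that illustrates "same hash, different value". `HashDemo` and `polyHash` are hypothetical names.

```java
public class HashDemo {
    // Same polynomial as String.hashCode(), evaluated Horner-style with int overflow
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("hello") == "hello".hashCode()); // true

        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false, so equals is still required
    }
}
```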

equals

/**
*
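* For a type to work as a HashMap key,
* it must override both equals and hashCode consistently,
* so that equal strings are treated as the same key.
* Because String overrides both, we can safely use strings as keys.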
*/
public boolean equals(Object anObject) {
/** First check whether it is the same object */
if (this == anObject) {
return true;
}
/** Then check whether it is a String */
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
/** Compare lengths */
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
/** Compare the chars one by one */
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
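To see why the HashMap comment matters, here is a minimal sketch comparing a key class without equals/hashCode (identity semantics) with one that delegates both to String. Both key classes (`RawKey`, `GoodKey`) and the helper methods are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MapKeyDemo {
    // Hypothetical key WITHOUT equals/hashCode: Object's identity versions are used
    static final class RawKey {
        final String name;
        RawKey(String name) { this.name = name; }
    }

    // Hypothetical key WITH equals/hashCode, both delegating to String
    static final class GoodKey {
        final String name;
        GoodKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).name.equals(name);
        }

        @Override public int hashCode() { return name.hashCode(); }
    }

    static boolean rawLookupWorks() {
        Map<RawKey, Integer> m = new HashMap<>();
        m.put(new RawKey("id"), 1);
        return m.containsKey(new RawKey("id")); // false: a different object
    }

    static boolean goodLookupWorks() {
        Map<GoodKey, Integer> m = new HashMap<>();
        m.put(new GoodKey("id"), 1);
        return m.containsKey(new GoodKey("id")); // true: equal by value
    }

    public static void main(String[] args) {
        System.out.println(rawLookupWorks());  // false
        System.out.println(goodLookupWorks()); // true

        Map<String, Integer> m = new HashMap<>();
        m.put(new String("id"), 1);
        System.out.println(m.get("id")); // 1: String overrides both methods
    }
}
```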

substring

The implementation of this method differs between JDK 1.6 (and earlier) and JDK 1.7 (and later).

JDK1.6 substring

/** 
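* This still creates a new String object, but it shares the old string's char[] and only references part of it.
* When the old string is very large, the small substring keeps the whole array reachable, so it cannot be garbage collected, causing a memory leak.
* A common workaround is to append an empty string, which copies the characters into a new, right-sized array:
* str = str.substring(x, y) + ""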
*/
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}

public String substring(int beginIndex, int endIndex) {
/** Bounds checks (omitted in this excerpt) */
return new String(offset + beginIndex, endIndex - beginIndex, value);
}
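  • Memory leak: in computer science, a memory leak occurs when a program, through oversight or error, fails to release memory it no longer uses. A leak does not mean the memory physically disappears. Rather, after the application allocates some memory, a design flaw makes it lose control of that memory before it is released, so the memory is wasted.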
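The JDK 1.6 leak is easier to see with a model. `OldString` below is a hypothetical mini-class that mimics the old value/offset/count layout; it is not the real java.lang.String. A three-char slice still references the full backing array until it is compacted into a right-sized copy.

```java
import java.util.Arrays;

public class SharedSliceDemo {
    // Hypothetical model of the JDK 1.6 layout: value + offset + count
    static final class OldString {
        final char[] value;
        final int offset;
        final int count;

        OldString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // JDK 1.6 style: share the array, only adjust offset/count (bounds checks omitted)
        OldString substring(int begin, int end) {
            return new OldString(value, offset + begin, end - begin);
        }

        // The workaround: copy only the needed chars so the big array can be collected
        OldString compact() {
            char[] copy = Arrays.copyOfRange(value, offset, offset + count);
            return new OldString(copy, 0, count);
        }

        @Override public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        OldString huge = new OldString(big, 0, big.length);

        OldString tiny = huge.substring(10, 13);
        System.out.println(tiny);                        // xxx
        System.out.println(tiny.value.length);           // 1000000: the whole array is retained
        System.out.println(tiny.compact().value.length); // 3
    }
}
```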

JDK1.8 substring

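Since JDK 1.7 (strictly, since 7u6), substring copies the selected characters into a new array. This uses more memory per call, but it removes the memory leak.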

public String substring(int beginIndex, int endIndex) {

if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
if (endIndex > value.length) {
throw new StringIndexOutOfBoundsException(endIndex);
}
int subLen = endIndex - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
return ((beginIndex == 0) && (endIndex == value.length)) ? this
: new String(value, beginIndex, subLen);
}

public String(char value[], int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= value.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > value.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
this.value = Arrays.copyOfRange(value, offset, offset+count);
}
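A short sketch of the JDK 8 behaviour described by the code above. endIndex is exclusive, the full range returns `this`, and each failed check throws StringIndexOutOfBoundsException. `SubstringDemo` and `throwsFor` are hypothetical names.

```java
public class SubstringDemo {
    // Returns true if substring(begin, end) throws, mirroring the checks shown above
    static boolean throwsFor(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "Hello, World";
        System.out.println(s.substring(7, 12));              // World (endIndex is exclusive)
        System.out.println(s.substring(0, s.length()) == s); // true: the full range returns this
        System.out.println(s.substring(3, 3).isEmpty());     // true: zero-length range
        System.out.println(throwsFor(s, -1, 2));             // true: beginIndex < 0
        System.out.println(throwsFor(s, 0, 99));             // true: endIndex > length
        System.out.println(throwsFor(s, 5, 2));              // true: subLen < 0
    }
}
```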

valueOf

/** Calls the object's own toString method; a null reference yields the string "null" */
public static String valueOf(Object obj) {
return (obj == null) ? "null" : obj.toString();
}
public static String valueOf(char data[]) {
return new String(data);
}
public static String valueOf(char data[], int offset, int count) {
return new String(data, offset, count);
}
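A sketch of one practical consequence of these overloads. `valueOf(Object)` handles null gracefully, but a null `char[]` selects `valueOf(char[])`, which calls `new String(null)` and throws. `ValueOfDemo` and `describe` are hypothetical names.

```java
public class ValueOfDemo {
    // Forces the valueOf(Object) overload
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        System.out.println(describe(null));                         // null (the 4-char string "null")
        System.out.println(String.valueOf(42));                     // 42
        System.out.println(String.valueOf(new char[]{'o', 'k'}));   // ok

        char[] noChars = null;
        try {
            String.valueOf(noChars); // resolves to valueOf(char[]) -> new String(null) -> NPE
        } catch (NullPointerException e) {
            System.out.println("valueOf(char[]) does not accept null");
        }
    }
}
```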

Overloading of the String + operator

String str = "abc";
String str1= str + "def";

/** After decompilation */
String str = "abc";
String str1= (new StringBuilder(String.valueOf(str))).append("def").toString();
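The decompiled form can be written by hand to confirm that it gives the same result. It also shows why `null + "def"` yields "nulldef": the null is converted to the text "null" rather than throwing. `ConcatDemo` and `concatByHand` are hypothetical names. The sketch does not claim to match javac's exact bytecode.

```java
public class ConcatDemo {
    // Hand-written equivalent of `str + "def"` using StringBuilder
    static String concatByHand(String str) {
        return new StringBuilder().append(str).append("def").toString();
    }

    public static void main(String[] args) {
        String str = "abc";
        System.out.println((str + "def").equals(concatByHand(str))); // true

        String nothing = null;
        System.out.println(nothing + "def");        // nulldef
        System.out.println(concatByHand(nothing)); // nulldef: append(null String) appends "null"
    }
}
```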

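split

Splits this string around matches of the regular expression regex. If limit > 0, the result has at most limit elements. If limit < 0, the result may have any length. If limit == 0, trailing empty strings are discarded.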

public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};

// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));

// Construct result
int resultSize = list.size();
if (limit == 0) {
while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
resultSize--;
}
}
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}

Splitting the string around regex, without a limit

/** Simply calls split(String regex, int limit) with limit = 0 */
public String[] split(String regex) {
return split(regex, 0);
}
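Here is a minimal sketch of how `limit` and regex metacharacters affect the result. The expected outputs follow the fast-path logic traced through the code above. `SplitDemo` is a hypothetical name.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String csv = "a,b,,c,,";
        System.out.println(Arrays.toString(csv.split(",")));     // [a, b, , c]  trailing empties dropped (limit 0)
        System.out.println(Arrays.toString(csv.split(",", -1))); // [a, b, , c, , ]  all kept
        System.out.println(Arrays.toString(csv.split(",", 2)));  // [a, b,,c,,]  at most 2 pieces

        // '.' is a regex metacharacter matching every char: all pieces are empty and dropped
        System.out.println("a.b".split(".").length);             // 0
        // An escaped '.' is a two-char regex ("\\" + non-alphanumeric), so the fast path applies
        System.out.println(Arrays.toString("a.b".split("\\."))); // [a, b]
    }
}
```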

equalsIgnoreCase

public boolean equalsIgnoreCase(String anotherString) {
return (this == anotherString) ? true
: (anotherString != null)
&& (anotherString.value.length == value.length)
&& regionMatches(true, 0, anotherString, 0, value.length);
}

Note how the ternary operator combined with && replaces several if statements.
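The same short-circuit style can be reproduced outside the JDK using only public API. The check order is: identity, then null, then length, then a case-insensitive `regionMatches`. `IgnoreCaseDemo` and `sameIgnoringCase` are hypothetical names. Unlike the JDK method, this sketch also accepts a null receiver.

```java
public class IgnoreCaseDemo {
    // Ternary + && chain mirroring equalsIgnoreCase; && binds tighter than ?:
    static boolean sameIgnoringCase(String a, String b) {
        return (a == b) ? true
                : (a != null && b != null)
                && a.length() == b.length()
                && a.regionMatches(true, 0, b, 0, a.length());
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase("Hello", "hELLO")); // true
        System.out.println(sameIgnoringCase("Hello", "Help!")); // false: 'l' vs 'p'
        System.out.println(sameIgnoringCase("Hello", null));    // false
        System.out.println("Hello".equalsIgnoreCase(null));     // false
    }
}
```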

replaceFirst、replaceAll、replace

String replaceFirst(String regex, String replacement)
String replaceAll(String regex, String replacement)
String replace(CharSequence target, CharSequence replacement)
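  • replace has overloads taking char and CharSequence, so it supports replacing both single characters and strings.
  • replaceAll and replaceFirst take a regex, i.e. they perform regular-expression-based replacement. replace matches its target literally and replaces every occurrence.
  • replaceFirst replaces only the first match.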
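The literal-versus-regex difference is easiest to see with a dot, which is a regex metacharacter. A minimal sketch; `ReplaceDemo` is a hypothetical name.

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace('.', '/'));       // a/b/c   (char overload)
        System.out.println(s.replace(".", "::"));      // a::b::c (literal target, every occurrence)
        System.out.println(s.replaceAll(".", "-"));    // -----   ('.' as a regex matches every char)
        System.out.println(s.replaceAll("\\.", "-"));  // a-b-c   (escaped dot)
        System.out.println(s.replaceFirst("\\.", "-")); // a-b.c   (first match only)
    }
}
```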

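Other Methods

The String class has many more methods, for example:

  • public int length(){}
    returns the length of the string (the number of char values)
  • public boolean isEmpty() { }
    returns true if the length is 0
  • public char charAt(int index) {}
    returns the char at index, i.e. the (index+1)-th character
  • public char[] toCharArray() {}
    converts the string into a new char array
  • public String trim(){}
    removes leading and trailing whitespace (every character with code <= U+0020)
  • public String toUpperCase(){}
    converts to upper case
  • public String toLowerCase(){}
    converts to lower case
  • public String concat(String str) {}
    appends str to the end of this string
  • public boolean matches(String regex){}
    tells whether the whole string matches the regular expression regex
  • public boolean contains(CharSequence s)
    tells whether the string contains the char sequence s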
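A quick tour of the methods listed above on one sample string, as a sketch. `OtherMethodsDemo` is a hypothetical name. The case conversions use the default locale, which does not matter for this ASCII input.

```java
public class OtherMethodsDemo {
    public static void main(String[] args) {
        String t = "  Hello World  ".trim();         // "Hello World"
        System.out.println(t.length());              // 11
        System.out.println("".isEmpty());            // true
        System.out.println(t.charAt(0));             // H (index 0 is the first char)
        System.out.println(t.toCharArray().length);  // 11
        System.out.println(t.toUpperCase());         // HELLO WORLD
        System.out.println(t.toLowerCase());         // hello world
        System.out.println(t.concat("!"));           // Hello World!
        System.out.println(t.matches("[A-Za-z ]+")); // true: the whole string must match
        System.out.println(t.contains("lo W"));      // true
    }
}
```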

Copyright © 2016 - 2019 When I think of you, I smile. All Rights Reserved.