java.lang.Object
  org.apache.commons.csv.CSVParser
public class CSVParser
Parses CSV files according to the specified configuration.
Because CSV appears in many different dialects, the parser supports many
configuration settings by allowing the specification of a CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data =
(new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data =
(new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
The internal parser state is completely covered by the strategy and the reader state.
See the package documentation for more details.
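To make the shape of the String[][] result in the examples above concrete, the following is a minimal, self-contained sketch of splitting text by a delimiter, an encapsulator and a comment-start character. It is not the CSVParser implementation: the class MiniCsvSketch and its parse method are hypothetical names introduced only for illustration, and the sketch assumes that encapsulated values never span lines and that the encapsulator is never escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is NOT the CSVParser implementation. */
class MiniCsvSketch {

    /** Splits text into records (lines) and values, honoring an encapsulator and comment lines. */
    static String[][] parse(String text, char delimiter, char encapsulator, char commentStart) {
        List<String[]> records = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty() || line.charAt(0) == commentStart) {
                continue; // assumption: blank lines and comment lines produce no record
            }
            List<String> values = new ArrayList<>();
            StringBuilder value = new StringBuilder();
            boolean encapsulated = false;
            for (int i = 0; i < line.length(); i++) {
                char ch = line.charAt(i);
                if (ch == encapsulator) {
                    encapsulated = !encapsulated; // the markers themselves are dropped
                } else if (ch == delimiter && !encapsulated) {
                    values.add(value.toString()); // a delimiter outside encapsulation ends the value
                    value.setLength(0);
                } else {
                    value.append(ch);
                }
            }
            values.add(value.toString()); // last value of the line
            records.add(values.toArray(new String[0]));
        }
        return records.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[][] data = parse("a\tb\nc\td", '\t', '"', '#');
        System.out.println(Arrays.deepToString(data)); // prints [[a, b], [c, d]]
    }
}
```

The outer array holds one entry per record and each inner array holds the values of that record, which is the layout getAllValues() is documented to return.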
| Nested Class Summary | |
|---|---|
| (package private) static class | CSVParser.Token - Token is an internal token representation. |
| Field Summary | |
|---|---|
| private CharBuffer | code |
| private static java.lang.String[] | EMPTY_STRING_ARRAY - Immutable empty String array. |
| private ExtendedBufferedReader | in |
| private static int | INITIAL_TOKEN_LENGTH - length of the initial token (content-)buffer |
| private java.util.ArrayList | record - A record buffer for getLine(). |
| private CSVParser.Token | reusableToken |
| private CSVStrategy | strategy |
| protected static int | TT_EOF - Token (which can have content) when end of file is reached. |
| protected static int | TT_EORECORD - Token with content when end of a line is reached. |
| protected static int | TT_INVALID - Token has no valid content, i.e. |
| protected static int | TT_TOKEN - Token with content, at beginning or in the middle of a line. |
| private CharBuffer | wsBuf |
| Constructor Summary | |
|---|---|
| CSVParser(java.io.InputStream input) | Deprecated. Use CSVParser(Reader). |
| CSVParser(java.io.Reader input) | CSV parser using the default CSVStrategy. |
| CSVParser(java.io.Reader input, char delimiter) | Deprecated. Use CSVParser(Reader,CSVStrategy). |
| CSVParser(java.io.Reader input, char delimiter, char encapsulator, char commentStart) | Deprecated. Use CSVParser(Reader,CSVStrategy). |
| CSVParser(java.io.Reader input, CSVStrategy strategy) | Customized CSV parser using the given CSVStrategy. |
| Method Summary | |
|---|---|
| private CSVParser.Token | encapsulatedTokenLexer(CSVParser.Token tkn, int c) - An encapsulated token lexer. Encapsulated tokens are surrounded by the given encapsulating string. |
| java.lang.String[][] | getAllValues() - Parses the CSV according to the given strategy and returns the content as an array of records, where each record is an array of single values. |
| java.lang.String[] | getLine() - Parses from the current point in the stream until the end of the current line. |
| int | getLineNumber() - Returns the current line number in the input stream. |
| CSVStrategy | getStrategy() - Obtains the specified CSV strategy. |
| private boolean | isEndOfFile(int c) |
| private boolean | isEndOfLine(int c) - Greedy: accepts \n and \r\n. This checker silently consumes the second control character... |
| private boolean | isWhitespace(int c) |
| protected CSVParser.Token | nextToken() - Convenience method for nextToken(null). |
| protected CSVParser.Token | nextToken(CSVParser.Token tkn) - Returns the next token. |
| java.lang.String | nextValue() - Parses the CSV according to the given strategy and returns the next CSV value as a string. |
| private int | readEscape(int c) |
| CSVParser | setStrategy(CSVStrategy strategy) - Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy). |
| private CSVParser.Token | simpleTokenLexer(CSVParser.Token tkn, int c) - A simple token lexer. Simple tokens are tokens that are not surrounded by encapsulators. |
| protected int | unicodeEscapeLexer(int c) - Decodes Unicode escapes. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final int INITIAL_TOKEN_LENGTH
protected static final int TT_INVALID
protected static final int TT_TOKEN
protected static final int TT_EOF
protected static final int TT_EORECORD
private static final java.lang.String[] EMPTY_STRING_ARRAY
private final ExtendedBufferedReader in
private CSVStrategy strategy
private final java.util.ArrayList record
private final CSVParser.Token reusableToken
private final CharBuffer wsBuf
private final CharBuffer code
| Constructor Detail |
|---|
public CSVParser(java.io.InputStream input)
Deprecated. Use CSVParser(Reader).
CSV parser using the default CSVStrategy.
input - an InputStream containing "csv-formatted" stream
public CSVParser(java.io.Reader input)
CSV parser using the default CSVStrategy.
input - a Reader containing "csv-formatted" input
public CSVParser(java.io.Reader input,
char delimiter)
Deprecated. Use CSVParser(Reader,CSVStrategy).
Follows the default CSVStrategy except for the delimiter setting.
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation
public CSVParser(java.io.Reader input,
char delimiter,
char encapsulator,
char commentStart)
Deprecated. Use CSVParser(Reader,CSVStrategy).
input - a Reader based on "csv-formatted" input
delimiter - a char used for value separation
encapsulator - a char used as value encapsulation marker
commentStart - a char used for comment identification
public CSVParser(java.io.Reader input,
CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
input - a Reader containing "csv-formatted" input
strategy - the CSVStrategy used for CSV parsing
| Method Detail |
|---|
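public java.lang.String[][] getAllValues()
throws java.io.IOException
Parses the CSV according to the given strategy and returns the content as an array of records, where each record is an array of single values. The returned content starts at the current parse position in the stream.
java.io.IOException - on parse error or input read failure
public java.lang.String nextValue()
throws java.io.IOException
Parses the CSV according to the given strategy and returns the next CSV value as a string.
java.io.IOException - on parse error or input read failure
public java.lang.String[] getLine()
throws java.io.IOException
Parses from the current point in the stream until the end of the current line.
java.io.IOException - on parse error or input read failure
public int getLineNumber()
Returns the current line number in the input stream.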
protected CSVParser.Token nextToken()
throws java.io.IOException
Convenience method for nextToken(null).
java.io.IOException
protected CSVParser.Token nextToken(CSVParser.Token tkn)
throws java.io.IOException
Returns the next token.
tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
java.io.IOException - on stream access error
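private CSVParser.Token simpleTokenLexer(CSVParser.Token tkn,
int c)
throws java.io.IOException
A simple token lexer. Simple tokens are tokens that are not surrounded by encapsulators.
tkn - the current token
c - the current character
java.io.IOException - on stream access error
private CSVParser.Token encapsulatedTokenLexer(CSVParser.Token tkn,
int c)
throws java.io.IOException
An encapsulated token lexer. Encapsulated tokens are surrounded by the given encapsulating string.
tkn - the current token
c - the current character
java.io.IOException - on invalid state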
protected int unicodeEscapeLexer(int c)
throws java.io.IOException
Decodes Unicode escapes.
c - current char which is discarded because it's the "\\" of "\\uXXXX"
java.io.IOException - on malformed Unicode escape sequence or read error
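As an illustration of what decoding a \uXXXX escape involves, here is a minimal sketch under stated assumptions. It is not the body of unicodeEscapeLexer: the class UnicodeEscapeSketch and the method decode are hypothetical, a plain java.io.Reader stands in for the parser's internal reader, and the caller is assumed to have already consumed (and discarded) the backslash.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical illustration only; this is NOT the CSVParser implementation. */
class UnicodeEscapeSketch {

    /**
     * Decodes the remainder of a Unicode escape ("uXXXX") from the reader.
     * Assumption: the leading backslash was consumed by the caller.
     */
    static int decode(Reader in) throws IOException {
        if (in.read() != 'u') {
            throw new IOException("expected 'u' after the backslash");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int c = in.read();
            // -1 signals end of input; any non-hex character is also an error
            int digit = (c == -1) ? -1 : Character.digit((char) c, 16);
            if (digit < 0) {
                throw new IOException("malformed Unicode escape: expected 4 hex digits");
            }
            value = (value << 4) | digit; // accumulate one hex digit at a time
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        System.out.println((char) decode(new StringReader("u0041"))); // prints A
    }
}
```

Reporting a short or non-hexadecimal sequence as an IOException mirrors the documented contract above; how the real method reads its input is not shown on this page.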
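private int readEscape(int c)
throws java.io.IOException
java.io.IOException
public CSVParser setStrategy(CSVStrategy strategy)
Deprecated. The strategy should be set in the constructor CSVParser(Reader,CSVStrategy).
public CSVStrategy getStrategy()
Obtains the specified CSV strategy.
private boolean isWhitespace(int c)
private boolean isEndOfLine(int c)
throws java.io.IOException
Greedy: accepts \n and \r\n. This checker silently consumes the second control character...
java.io.IOException
private boolean isEndOfFile(int c)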
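The greedy end-of-line check described for isEndOfLine can be sketched roughly as follows. This is an illustration under assumptions, not the actual method: EndOfLineSketch is a hypothetical class, a java.io.PushbackReader stands in for the parser's ExtendedBufferedReader, and a lone \r is assumed not to count as a line end because the description names only \n and \r\n.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical illustration only; this is NOT the CSVParser implementation. */
class EndOfLineSketch {

    /**
     * Returns true if c ends a line. For "\r\n" the trailing '\n' is read
     * from the stream and discarded, so the caller never sees it.
     */
    static boolean isEndOfLine(int c, PushbackReader in) throws IOException {
        if (c == '\r') {
            int next = in.read();
            if (next == '\n') {
                return true; // "\r\n": second control character consumed silently
            }
            if (next != -1) {
                in.unread(next); // not a line break: put the look-ahead character back
            }
            return false; // assumption: a lone '\r' is not treated as a line end
        }
        return c == '\n';
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("a\r\nb"));
        in.read();                                      // skip 'a'
        System.out.println(isEndOfLine(in.read(), in)); // prints true
        System.out.println((char) in.read());           // prints b
    }
}
```

Because the check consumes input, it has a side effect on the stream, which is why the real method is declared to throw java.io.IOException.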