Reading CSV file in Java

July 07, 2020

1. Introduction

CSV (Comma-Separated Values) is a text-based format that separates fields with a comma delimiter (,). Each line in a CSV file represents one record and usually ends with a line break. In this article, we present several ways to read a CSV file: using plain Java, using the OpenCSV library, and using a dedicated solution based on annotations and reflection.

2. Example CSV file

The CSV file used in this tutorial contains a list of usernames with additional information:

Username,Identifier,First name,Last name
booker12,9012,Rachel,Booker
grey07,2070,Laura,Grey
johnson81,4081,Craig,Johnson
jenkins46,9346,Mary,Jenkins
smith79,5079,Jamie,Smith

This is the CSV file viewed as a spreadsheet:

Username    Identifier  First name  Last name
booker12    9012        Rachel      Booker
grey07      2070        Laura       Grey
johnson81   4081        Craig       Johnson
jenkins46   9346        Mary        Jenkins
smith79     5079        Jamie       Smith

2. Read CSV file line by line using BufferedReader

You can use any plain Java method that reads a file line by line, and then split each line into fields with the String.split(...) method.

package com.frontbackend.java.io.csv;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadCSVUsingBufferedReader {

    private final static String COMMA_DELIMITER = ",";

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader("/tmp/username.csv"))) {

            List<List<String>> result = new ArrayList<>();
            String line;
            while ((line = br.readLine()) != null) {
                String[] values = line.split(COMMA_DELIMITER);
                result.add(Arrays.asList(values));
            }

            System.out.println(result);
        }
    }
}

In this example, we used BufferedReader, which buffers the underlying reader and lets us read the file line by line. For each line read from the CSV file, we split it into tokens using the String.split(",") method. The result object is a list of lines, where each line is a list of field values.

The output:

[[Username, Identifier, First name, Last name], [booker12, 9012, Rachel, Booker], [grey07, 2070, Laura, Grey], [johnson81, 4081, Craig, Johnson], [jenkins46, 9346, Mary, Jenkins], [smith79, 5079, Jamie, Smith]]
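String.split(",") is enough for the sample file, but two limitations are worth knowing: it drops trailing empty fields, and it does not understand quoted fields that contain a comma. The sketch below illustrates both. It is a minimal illustration rather than a complete CSV parser: the class SimpleCsvLine and its methods are hypothetical names introduced here, and splitQuoted assumes double-quote quoting with "" as an escaped quote and records that fit on one line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleCsvLine {

    // Naive split that keeps trailing empty fields (a negative limit disables trimming).
    static List<String> splitKeepEmpty(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Minimal quote-aware split: commas inside double quotes do not separate fields,
    // and a doubled quote ("") inside a quoted field yields one literal quote.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.asList("a,b,,".split(",")));                    // prints [a, b]
        System.out.println(splitKeepEmpty("a,b,,").size());                       // prints 4
        System.out.println(splitQuoted("smith79,\"Smith, Jamie\",5079").size()); // prints 3
    }
}
```

For files that may contain quoted fields, a dedicated library such as OpenCSV is usually the safer choice.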

3. Using Scanner to read CSV file

The Scanner class can parse primitive types and strings using regular expressions. In the following example, we use one Scanner to retrieve the lines of the CSV file and a second one to split each line on the comma delimiter:

package com.frontbackend.java.io.csv;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadCSVUsingScanner {

    private final static String COMMA_DELIMITER = ",";

    public static void main(String[] args) throws IOException {
        List<List<String>> result = new ArrayList<>();

        try (Scanner scanner = new Scanner(new File("/tmp/username.csv"))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();

                List<String> fields = new ArrayList<>();
                try (Scanner rowScanner = new Scanner(line)) {
                    rowScanner.useDelimiter(COMMA_DELIMITER);
                    while (rowScanner.hasNext()) {
                        fields.add(rowScanner.next());
                    }
                }

                result.add(fields);
            }
        }

        System.out.println(result);
    }
}

As you can see, this is a somewhat more complex way to parse a CSV file.
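One detail to keep in mind with this approach: Scanner.useDelimiter(String) treats its argument as a regular expression. A comma has no special meaning there, but a delimiter such as | would need to be quoted. The following is a small sketch under that assumption; the class DelimitedLineScanner and the pipe-delimited sample line are hypothetical and not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimitedLineScanner {

    // Splits one line on a literal delimiter.
    // Pattern.quote(...) escapes regex metacharacters such as '|'.
    static List<String> fields(String line, String delimiter) {
        List<String> fields = new ArrayList<>();
        try (Scanner rowScanner = new Scanner(line)) {
            rowScanner.useDelimiter(Pattern.quote(delimiter));
            while (rowScanner.hasNext()) {
                fields.add(rowScanner.next());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fields("booker12|9012|Rachel|Booker", "|")); // prints [booker12, 9012, Rachel, Booker]
        System.out.println(fields("booker12,9012,Rachel,Booker", ",")); // prints [booker12, 9012, Rachel, Booker]
    }
}
```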

4. Using JDK 8 Files.readAllLines(...) method

Let's look at a solution available since JDK 8 that uses the Files.readAllLines(...) method.

package com.frontbackend.java.io.csv;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReadCSVUsingFiles {

    private final static String COMMA_DELIMITER = ",";

    public static void main(String[] args) throws IOException {
        List<List<String>> result = Files.readAllLines(Paths.get("/tmp/username.csv"))
                                         .stream()
                                         .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
                                         .collect(Collectors.toList());
        System.out.println(result);
    }
}

In this example, we use the Files class from the java.nio.file package together with Java streams to map each line into a list of split fields.
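Files.readAllLines(...) loads the whole file into memory before any line is processed. For larger files, Files.lines(...) returns a lazily populated stream instead. The sketch below is one possible variant, not part of the original example: the class ReadCSVUsingFilesLines and the method readRecords are hypothetical names, and the code assumes that the first line is a header that should be skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadCSVUsingFilesLines {

    private static final String COMMA_DELIMITER = ",";

    static List<List<String>> readRecords(Path path) throws IOException {
        // The stream keeps the file open, so it is closed with try-with-resources.
        try (Stream<String> lines = Files.lines(path)) {
            return lines.skip(1) // assumption: the first line is the header
                        .filter(line -> !line.isEmpty()) // ignore blank lines
                        .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readRecords(Paths.get("/tmp/username.csv")));
    }
}
```

Because the stream is consumed inside the try block, the file handle is released as soon as the list has been collected.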

5. Using OpenCSV library

OpenCSV is a library dedicated to working with CSV files. We can use it to read a CSV file and parse it into a Java structure:

package com.frontbackend.java.io.csv;

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.opencsv.CSVReader;

public class ReadCSVUsingOpenCSV {

    public static void main(String[] args) throws IOException {
        List<List<String>> result = new ArrayList<>();
        try (CSVReader csvReader = new CSVReader(new FileReader("/tmp/username.csv"))) {
            String[] values;
            while ((values = csvReader.readNext()) != null) {
                result.add(Arrays.asList(values));
            }
        }

        System.out.println(result);
    }
}

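In this example, we used the readNext() method of CSVReader to read the records from the file one at a time. Note that in OpenCSV 5.x readNext() also declares the checked CsvValidationException, so with those versions the main method has to declare or handle it as well.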

Check the following link to list all available versions of the OpenCSV library.

6. Reading CSV file into Java model

If we want to parse CSV files directly into Java POJO classes whose fields represent columns, we can implement a dedicated parser.

First, let's start with a dedicated annotation called CSVField that will be used to mark fields in the POJO:

package com.frontbackend.java.io.csv.model;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
public @interface CSVField {

    CSVColumn column() default CSVColumn.DEFAULT;
}

Columns from our CSV file are represented by the CSVColumn enum:

package com.frontbackend.java.io.csv.model;

import lombok.AllArgsConstructor;
import lombok.Getter;

@AllArgsConstructor
public enum CSVColumn {

    USERNAME("Username"),
    IDENTIFIER("Identifier"),
    FIRST_NAME("First name"),
    LAST_NAME("Last name"),
    DEFAULT("");

    @Getter
    private final String column;
}

In the POJO class, we need to annotate all the fields that have a representation in the CSV file:

package com.frontbackend.java.io.csv.model;

import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

@ToString
@Getter
@Setter
public class User {

    @CSVField(column = CSVColumn.USERNAME)
    private String username;

    @CSVField(column = CSVColumn.IDENTIFIER)
    private String id;

    @CSVField(column = CSVColumn.FIRST_NAME)
    private String firstName;

    @CSVField(column = CSVColumn.LAST_NAME)
    private String lastName;

}

The magic happens in the CSVAbstractParser class:

  1) first, we read the CSV file using the Files.readAllLines(...) method,
  2) we assume that the first row in the CSV file is the header,
  3) we match the header names with the fields that represent them in the POJO class,
  4) we iterate over the remaining rows and set values for all fields annotated with CSVField,
  5) the parser returns a list of POJO objects.

package com.frontbackend.java.io.csv.model;

import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public abstract class CSVAbstractParser<T> {

    private final static int FIRST_LINE_INDEX = 0;

    private final static String COMMA_DELIMITER = ",";

    public List<T> parse(Path path, Class<T> cls) throws IOException {
        List<List<String>> lines = Files.readAllLines(path)
                                        .stream()
                                        .map(line -> Arrays.asList(line.split(COMMA_DELIMITER)))
                                        .collect(Collectors.toList());
        if (lines.size() > 0) {
            Map<Integer, Field> header = getHeaders(cls, lines.get(FIRST_LINE_INDEX));
            // skip the header row; subList's end index is exclusive, so lines.size() keeps the last record
            return lines.subList(FIRST_LINE_INDEX + 1, lines.size())
                        .stream()
                        .map(line -> getT(cls, header, line))
                        .collect(Collectors.toList());
        }

        return Collections.emptyList();
    }

    private T getT(Class<T> cls, Map<Integer, Field> header, List<String> line) {
        try {
            T obj = cls.getDeclaredConstructor()
                       .newInstance();

            for (int index = 0; index < line.size(); index++) {
                Field field = header.get(index);
                if (field == null) {
                    // this column has no matching annotated field, so skip it
                    continue;
                }

                String fieldName = field.getName();
                Optional<Method> setter = getSetterMethod(obj, fieldName);

                if (setter.isPresent()) {
                    Method setMethod = setter.get();
                    setMethod.invoke(obj, line.get(index));
                }
            }

            return obj;

        } catch (Exception e) {
            e.printStackTrace();
        }

        return null;
    }

    private Optional<Method> getSetterMethod(T obj, String fieldName) {
        return Arrays.stream(obj.getClass()
                                .getDeclaredMethods())
                     .filter(method -> method.getName()
                                             .equals("set" + fieldName.substring(0, 1)
                                                                      .toUpperCase()
                                                     + fieldName.substring(1)))
                     .findFirst();
    }

    private Map<Integer, Field> getHeaders(Class<T> cls, List<String> firstLine) {
        final Map<Integer, Field> map = new HashMap<>();

        Stream.of(cls.getDeclaredFields())
              .filter(field -> field.getAnnotation(CSVField.class) != null)
              .forEach(field -> {
                  CSVField csvField = field.getAnnotation(CSVField.class);
                  String columnName = csvField.column()
                                              .getColumn()
                                              .trim();
                  int columnIndex = firstLine.indexOf(columnName);
                  map.put(columnIndex, field);
              });

        return map;
    }
}
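To try the header-to-field mapping without Lombok or the surrounding classes, the idea can be condensed into a single file. The following is only a sketch under stated assumptions, not the parser above: the annotation Column, the class Person and the class AnnotationMappingSketch are hypothetical stand-ins, the header name is stored directly in the annotation instead of an enum, and values are written with Field.set(...) rather than through setters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnnotationMappingSketch {

    // Hypothetical stand-in for CSVField; it carries the header name directly.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String value();
    }

    // Hypothetical stand-in for the User POJO, written without Lombok.
    static class Person {
        @Column("Username")
        String username;

        @Column("Identifier")
        String id;

        String notMapped; // no annotation, so it is never populated
    }

    // Maps column index -> annotated field, ignoring fields whose header is absent.
    static Map<Integer, Field> headerIndex(Class<?> cls, List<String> header) {
        Map<Integer, Field> byIndex = new HashMap<>();
        for (Field field : cls.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) {
                continue;
            }
            int index = header.indexOf(column.value());
            if (index >= 0) {
                field.setAccessible(true);
                byIndex.put(index, field);
            }
        }
        return byIndex;
    }

    // Treats the first line as the header and turns every other line into an object.
    static <T> List<T> map(Class<T> cls, List<List<String>> lines) throws ReflectiveOperationException {
        List<T> result = new ArrayList<>();
        if (lines.isEmpty()) {
            return result;
        }
        Map<Integer, Field> byIndex = headerIndex(cls, lines.get(0));
        // subList's end index is exclusive, so lines.size() includes the last row.
        for (List<String> row : lines.subList(1, lines.size())) {
            T obj = cls.getDeclaredConstructor().newInstance();
            for (int i = 0; i < row.size(); i++) {
                Field field = byIndex.get(i);
                if (field != null) {
                    field.set(obj, row.get(i)); // only String fields are supported here
                }
            }
            result.add(obj);
        }
        return result;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        List<List<String>> lines = Arrays.asList(
                Arrays.asList("Username", "Identifier", "First name"),
                Arrays.asList("booker12", "9012", "Rachel"),
                Arrays.asList("smith79", "5079", "Jamie"));
        for (Person person : map(Person.class, lines)) {
            System.out.println(person.username + " " + person.id);
        }
    }
}
```

Writing fields directly keeps the sketch short; going through setters, as the parser above does, respects any validation logic the POJO may contain.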

The CSV parser dedicated to the User entity:

package com.frontbackend.java.io.csv.model;

public class CSVUserParser extends CSVAbstractParser<User> {
}

To parse the CSV file, we create an instance of CSVUserParser and call its parse(...) method with the path to the CSV file and the class that represents a single row:

package com.frontbackend.java.io.csv.model;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;

public class ReadCSVIntoModel {

    public static void main(String[] args) throws IOException {
        CSVUserParser csvUserParser = new CSVUserParser();
        List<User> users = csvUserParser.parse(Paths.get("/tmp/username.csv"), User.class);
        System.out.println(users);
    }
}

The output:

[User(username=booker12, id=9012, firstName=Rachel, lastName=Booker), User(username=grey07, id=2070, firstName=Laura, lastName=Grey), User(username=johnson81, id=4081, firstName=Craig, lastName=Johnson), User(username=jenkins46, id=9346, firstName=Mary, lastName=Jenkins), User(username=smith79, id=5079, firstName=Jamie, lastName=Smith)]

7. Conclusion

In this article, we showcased several ways to read and parse a CSV file in Java. We presented plain Java solutions and an external library solution using OpenCSV. If you need to convert a CSV file into POJO objects, you can use the dedicated solution described in section 6.

As usual, the code used in this tutorial is available on GitHub.
