Getting Started in Automation – Reading from Excel

If you’re generating your own data for import into your scripts, then I’d encourage you to stick with CSV for simplicity. Sometimes however, you’re provided with that data, and it’s in Excel format. Since the aim of automation is to save time, then adding a manual step of having to convert from Excel to CSV every time, defeats what we’re trying to achieve.
Of course, there may be other valid reasons for using Excel, and the dataset might not be for the exclusive use of your automation scripts.
The approach I take for reading Excel files is a little different to that for CSVs, as from the outset I expect that I don’t know what the size or shape of the dataset will be, and so the first thing my function does, is to try and determine that.
Excel also holds more than just plain text, too. So, I must check the cell type, before reading it into memory.

I do make some assumptions about the data though:

  • First off, I assume that the data is rectangular, i.e. however many columns are in the first row, is the same throughout the dataset. The code could be adapted to handle non-rectangular datasets, but I’ve not yet had to deal with one, and so not yet seen enough benefit in making the changes.
  • Secondly, though I check for multiple different types, I assume that the data fits one of the below categories
    1. null – we have to check for null first, as if the cell is null and we try to determine a type, then the script will error
    2. String
    3. Numeric
    4. Boolean
    5. Blank
    6. Formula – note that the formula won’t be calculated, the absolute text of the formula will be returned

    If the cell doesn’t match one of the above types, then the console will output what type it is. I would then look to enhance my script to be able to properly handle that type.

  • Finally, I assume that a String representation of what I read in from the file, will be usable by the script. Although I will read in different types of data, it will all be converted into a string format before being passed back to the function that called it.

This is for simplicity, as it allows me to then send back a single 2-dimensional String array, rather than sending back a more complicated jumble of object types. Numbers and Booleans can be converted back to their appropriate types if necessary, back in the function that calls this one.

Once we have that array, we can then read individual elements. Remember that arrays are zero-based, and so to read the value of cell B5 as an example (the 2nd across, and the 5th down), we would look at cellString[1][4], and not cellString [2][5].
So, what does this look like?
I’ll start with the main calling script:

package com.my.classes;
import java.io.IOException;
import com.my.sharedClasses.cAccessExcel;
public class cBlog {
public static void main(String[] args) throws InterruptedException, IOException
{
//Initialise the excel script
cAccessExcel excel = new cAccessExcel ();
String xls = "CheckExcel";
//So here we’re looking for file C:/SeleniumData/CheckExcelDataIn.xlsx

boolean print = true;
//The print Boolean states whether the results are printed out to the console or not – useful to confirm you’re getting the results you expect

//Run the excel script
String[][] excelOutput = excel.mAccessExcel(xls, print);

//Do stuff with the 2-dimensional String array excelOutput[][]
}
}

And then the code that does the magic:

package com.my.sharedClasses;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
mport org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.util.SheetUtil;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class cAccessExcel {
public String[][] mAccessExcel(String xls, Boolean print) throws InterruptedException, IOException
{
String excelFilePath = "C:\\SeleniumData\\"+xls+"DataIn.xlsx";
//Here I'm specifying xlsx, though the private method at the bottom of the class would
//also allow us to cater for .xls files
System.out.println("Reading from "+excelFilePath);
FileInputStream inputStream = new FileInputStream(new File(excelFilePath));

Workbook workbook = getRelevantWorkbook(inputStream, excelFilePath);
//This line enables us to handle either .xls or .xlsx files, using a separate private method

Sheet firstSheet = workbook.getSheetAt(0);
Iterator iterator = firstSheet.iterator();
int excelRow = 0;
int excelColumn = 0;

//Count rows and columns (columns for the first row only) to determine size of 2-dimensional array
while (iterator.hasNext())
{
Row nextRow = iterator.next();
Iterator cellIterator = nextRow.cellIterator();
if(excelRow==0)
{
excelColumn = 0;
while (cellIterator.hasNext())
{
cellIterator.next();
excelColumn = excelColumn +1;
}
}
excelRow = excelRow +1;
}

//Define a 2-dimensional cell array to hold the data
Cell[][] cell = new Cell[excelRow][excelColumn];

//Define a 2-dimensional String array, to carry the data back to the main function
String cellString[][]= new String[excelRow][excelColumn];

if (print==true)
{
System.out.println(excelRow + " rows");
System.out.println(excelColumn + " columns");
}

//Start Excel Analysis
for (int i=0;i<excelRow;i++)
{
for (int j=0;j<excelColumn;j++)
{
cell[i][j] = SheetUtil.getCell(firstSheet, i, j);

if (cell[i][j] == null) //check for null cell type
{
cellString[i][j] = "";
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j] + " (was null)");
}
}

else if (cell[i][j].getCellTypeEnum() == CellType.STRING) //check for String cell type
{
cellString[i][j] = cell[i][j].getStringCellValue();
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j]);
}
}

else if (cell[i][j].getCellTypeEnum() == CellType.NUMERIC) //check for Numeric cell type
{
int cellHolding = (int) cell[i][j].getNumericCellValue();
cellString[i][j] = Integer.toString(cellHolding);
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j]);
}
}

else if (cell[i][j].getCellTypeEnum() == CellType.BOOLEAN) //check for Boolean cell type
{
boolean cellBool = cell[i][j].getBooleanCellValue();
cellString[i][j] = String.valueOf(cellBool);
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j]);
}
}

else if (cell[i][j].getCellTypeEnum()==CellType.BLANK) //check for Blank cell type
{
cellString[i][j] = "";
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j]);
}
}

else if (cell[i][j].getCellTypeEnum()==CellType.FORMULA) //check for Formula cell type
{
cellString[i][j] = String.valueOf(cell[i][j].getCellFormula());
if (print==true)
{
System.out.println("Cell "+(i+1)+"/"+(j+1)+" is: "+ cellString[i][j]);
}
}

else        //error handler for any other cell type
{
System.out.println("Cell not processed. Type is: " + cell[i][j].getCellTypeEnum());
}
}
}
if (print==true)
{
System.out.println(“Excel processing complete");
}
return cellString;
}

private static Workbook getRelevantWorkbook(FileInputStream inputStream, String excelFilePath) throws IOException
{
Workbook workbook = null;
if (excelFilePath.endsWith("xls"))
{
workbook = new HSSFWorkbook(inputStream);
}
else if (excelFilePath.endsWith("xlsx"))
{
workbook = new XSSFWorkbook(inputStream);
}
else
{
throw new IllegalArgumentException("Incorrect file format");
}
return workbook;
}
}

As ever, please post any corrections or suggested improvements in the comments, or message me on twitter @Radionotme.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s