easyexcel icon indicating copy to clipboard operation
easyexcel copied to clipboard

读取大文件耗时很久

Open cengxw opened this issue 1 year ago • 4 comments

建议先去看文档

快速开始常见问题

异常代码

EasyExcel.read(file, new ReadListener() {
  @Override
  public void invoke(Object data, AnalysisContext context) {
	  System.out.println(data);
	  throw new ExcelAnalysisStopException();
  }
  
  @Override
  public void doAfterAllAnalysed(AnalysisContext context) {
  
  }
}).sheet().doRead();

异常提示

问题描述

就这段代码,读取一个160MB的xlsx,100W行、48列,耗时40秒。我只想读取第一行然后就结束读取,这耗时让我怀疑我是不是用错方式了

cengxw avatar Oct 31 '24 09:10 cengxw

我用以下方法来读取就很快,2~4秒就能出结果,能说说差异在哪吗?

public static Map<String, String> firstMap(File file) {
        OPCPackage pkg = null;
        InputStream sheet = null;
        final Map<String, String> titleMap = new HashMap<>();
        try {
            pkg = OPCPackage.open(file, PackageAccess.READ);
            XSSFReader reader = new XSSFReader(pkg);
            sheet = reader.getSheet("rId1");
            XMLReader parser = XMLReaderFactory.createXMLReader();
            ReadOnlySharedStringsTable rosst = new ReadOnlySharedStringsTable(pkg);
            parser.setContentHandler(new XSSFSheetXMLHandler(reader.getStylesTable(),rosst,new XSSFSheetXMLHandler.SheetContentsHandler() {

                @Override
                public void cell(String index, String value, XSSFComment comment) {
                    titleMap.put(index, value);
                }

                @Override
                public void startRow(int arg0) {
                }

                @Override
                public void endRow(int arg0) {
                    throw new ArithmeticException();
                }

            },
            false));
            try {
                parser.parse(new InputSource(sheet));
            } catch(ArithmeticException e) {
                
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                sheet.close();
            } catch (IOException e) {

            }
            try {
                pkg.close();
            } catch (IOException e) {

            }
            return titleMap;
        }
    }

cengxw avatar Oct 31 '24 16:10 cengxw

因为他源码里在读取的时候在一开始就解析了完整的sharestring,你的代码是自己处理了sharestring

wyctxwd1 avatar Nov 04 '24 08:11 wyctxwd1

可以尝试换成EEC来处理这类问题, https://github.com/wangguanquan/eec 它支持filter, map, collect等标准的流式处理,可以像操作集合类一样读取Excel

try (ExcelReader reader = ExcelReader.read(Paths.get("/export/excel-1000k.xlsx"))) {
    // 指定第1行为表头并获取表头
    HeaderRow row = (HeaderRow) reader.sheet(0).header(1).getHeader();
    System.out.println(row);

    // 获取第100行的数据
    Row row100 = reader.sheet(0).rows().filter(r -> r.getRowNum() == 100).findAny().get();
    System.out.println(row100);

    // 将行数据转换为指定对象
    Item item = row100.to(Item.class);
}

wangguanquan avatar Nov 06 '24 12:11 wangguanquan

感谢各位,我去研究研究

cengxw avatar Nov 06 '24 13:11 cengxw