spout
spout copied to clipboard
Implement \Countable interface
It would be great to implement a \Countable interface. Simply loop over rows/lines without extracting data.
What do you think?
The main problem implementing Countable is that you have to go through the entire XML file storing the sheet's structure to get the count. And if you want the values, you have to go through it one more time. For large spreadsheets, this is really time/resources consuming and I don't think the benefits are big enough to add this. But I may be wrong...
What's your use case?
I have an import on CLI and I want to show a progress bar / number of entries (Symfony console is used). I know that this is time consuming, but without converting it's faster as my current workaround (iterate over all entries and increase a count variable, after that, the real import is started).
Have you another idea how to show a progress bar?
I did a quick performance test to see how long it would take to get a count. For an ODS file containing 1 million rows (and 3 million cells), it takes between 12 seconds (using DOMDocument) and 23 seconds (using XMLReader). I expect similar results for XLSX files.
The results actually better than I expected :) So I may consider adding it to the library.
For reference, here is my code:
function usingDOMDocument($filePath) {
$dom = new DOMDocument();
$dom->load($filePath);
$nodes = $dom->getElementsByTagNameNS('urn:oasis:names:tc:opendocument:xmlns:table:1.0', 'table-row');
return $nodes->length;
}
function usingXMLReader($filePath) {
$xmlReader = new XMLReader();
$xmlReader->open($filePath, null, LIBXML_NONET);
$count = 0;
while ($xmlReader->read()) {
if ($xmlReader->name === 'table:table-row' && $xmlReader->nodeType === XMLReader::ELEMENT) {
$count++;
}
}
return $count;
}
BTW there is no other way to show a progress bar, you need to get the count first :)
@adrilo That sounds awesome. :+1: I have only 5000 rows at the moment.
If this is implemented in 3.0 would it also be possible to get the max columns for a given sheet?
I need the exactly same feature for XLSX (also for the same reason; creating reports and showing the progress in the CLI).
I just tried the XLSX approach and it worked like a charm: This is what I essentially added in the XLSX Row Iterator class:
namespace Box\Spout\Reader\XLSX;
class RowIterator implements IteratorInterface, \Countable {
//...
protected $numTotalRows = 0;
//...
public function rewind() {
//...
$filePath = $this->xmlReader->getRealPathURIForFileInZip($this->filePath, $this >sheetDataXMLFilePath);
$dom = new \DOMDocument();
$dom->load($filePath);
$nodes = $dom->getElementsByTagName('row');
$this->numTotalRows = $nodes->length;
//...
}
//...
public function count() {
return $this->numTotalRows;
}
//...
}
Can we add that to the code base?
Having such option is a must.
The implementation itself is not difficult. What's more difficult is telling people that getting the row count can be slow. It creates confusion as people usually thinks that's it's a simple value to read. With big spreadsheets, going through all "rows" may take a while... @andreas-gruenwald Have you tried running your code on a spreadsheet with 1 million rows? I'm curious about the perf
@adrilo I tested with a spreadsheet consisting of 7 sheets, consisting of 100.000s of lines (some of them have between 200.000 and 500.000 lines).
Without my add-on the initial setup of the iterator took more than 50 minutes, then i stopped the script. With the add-on it takes just a couple of seconds and is super fast.
Any updates on this one?
Any updates or ETA for this?
I needed a way of getting total rows of file and I've added a function in my Symfony service that works.
filename is a property in the service, you can pass it to the function instead.
It can be improved with constants, but this works for me.
public function getTotalRows($reader)
{
$fileSystem = new Filesystem();
$finder = new Finder();
$zip = new \ZipArchive();
$dom = new \DOMDocument();
$baseName = pathinfo($this->fileName)['basename'];
$path = sys_get_temp_dir() . '/' . $baseName;
$nodesCount = 0;
if ($reader instanceof XLSXReader) {
$xmlPath = $path . '/xl/worksheets';
if ($zip->open($this->fileName) === TRUE) {
$zip->extractTo($path);
$finder->files()->in($xmlPath);
foreach ($finder as $xmlFile) {
$dom->load($xmlFile->getRealPath());
$nodes = $dom->getElementsByTagName('row');
$nodesCount += $nodes->length - 1; // to skip header
}
$fileSystem->remove($path);
}
return $nodesCount;
}
if ($reader instanceof ODSReader) {
$xmlPath = $path . '/content.xml';
if ($zip->open($this->fileName) === TRUE) {
$zip->extractTo($path);
$dom->load($xmlPath);
$nodes = $dom->getElementsByTagNameNS('urn:oasis:names:tc:opendocument:xmlns:table:1.0', 'table-row');
$nodesCount += $nodes->length - 1; // to skip header
$fileSystem->remove($path);
}
return $nodesCount;
}
/**
* Ref: https://stackoverflow.com/a/43075929
*/
if ($reader instanceof CSVReader) {
$file = new \SplFileObject($this->fileName, 'r');
$file->setFlags(\SplFileObject::READ_AHEAD | \SplFileObject::SKIP_EMPTY | \SplFileObject::DROP_NEW_LINE);
$file->seek(PHP_INT_MAX);
return $file->key() + 1;
}
return null;
}
Note that @Tersoal's solution for CSV does not take into account values that contain line breaks. For a proper count, there's no other way than parsing every row.
I see. The method must be improved. Also it's coupled to symfony. Sorry for the inconvenience.
@Tersoal No worries, the comment was more a note for myself that it was not fully compliant. The other implementations look good though :)