
Question marks in Cyrillic file names

Open KonstantinKorepin opened this issue 2 years ago • 2 comments

Hi all!

I have a problem with the Archive7z class. I wrote this method:

public function unzip(string $pathToSource): DirectoryIterator
{
    ...

    $obj = new Archive7z($pathToSource);
    $obj->setOutputDirectory($pathToDestination);
    $obj->extract();

    return new DirectoryIterator($pathToDestination);
}

which returns a DirectoryIterator over the directory the files were unpacked into.

Next, I collect information about the unpacked files:

$iterator = $this->zipper->unzip($zipFile->getRealPath());
foreach ($iterator as $unzippedFile) {
    if (!in_array($unzippedFile->getFilename(), ['.', '..'])) {
        $encoding = mb_detect_encoding($unzippedFile->getFilename()); // ASCII
        $fileName = $unzippedFile->getFilename(); // 12_1_?????.txt
    }
}

On my server, the encoding is detected as ASCII, and the file names contain question marks instead of Cyrillic letters.

In my local environment (Docker), everything is displayed correctly: mb_detect_encoding($unzippedFile->getFilename()) returns UTF-8 and the file names are correct.
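The difference between the two results can be checked on the file name alone. The sketch below is only an illustration: hasNonAsciiBytes() is a hypothetical helper that is not part of Archive7z, and the Cyrillic name is made up, since the original name is not known. An ASCII-only result on a name that should contain Cyrillic suggests the letters were already replaced with "?" before PHP read the directory.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Archive7z): true when the name contains
// at least one byte outside the 7-bit ASCII range.
function hasNonAsciiBytes(string $fileName): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[^\x00-\x7F]/', $fileName) === 1;
}

// The name as reported on the server: the Cyrillic letters are already gone.
$damaged = '12_1_?????.txt';

// A made-up intact name for comparison, written with escapes so the example
// does not depend on the encoding of this source file.
$intact = "12_1_\u{0442}\u{0435}\u{0441}\u{0442}.txt";

foreach ([$damaged, $intact] as $name) {
    echo $name, ': ', hasNonAsciiBytes($name) ? 'has non-ASCII bytes' : 'ASCII only';
    if (function_exists('mb_detect_encoding')) {
        // Requires the mbstring extension, as in the snippet above.
        echo ', mb_detect_encoding = ', var_export(mb_detect_encoding($name), true);
    }
    echo PHP_EOL;
}
```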

I also tried to reproduce this error in Docker, and managed to do so by following https://zalinux.ru/?p=5740. That is, I commented out the en_US.UTF-8 UTF-8 line in /etc/locale.gen in the PHP container and ran locale-gen. After that, only the ru_RU.UTF-8 UTF-8 locale was left. From then on, the encoding of the unpacked file names was also detected as ASCII rather than UTF-8, and question marks appeared instead of Cyrillic characters.

If I restore the en_US.UTF-8 UTF-8 line in the container and run locale-gen again, everything works fine. Please tell me what I can do so that Cyrillic characters, not question marks, appear in the file names after unpacking. How can I get the files unpacked with UTF-8 names rather than ASCII?
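Since the behaviour depends on which locales have been generated, one way to see what PHP itself can use is to ask setlocale(). This is a minimal sketch under that assumption: firstAvailableLocale() and the candidate list are hypothetical and not part of Archive7z. Some C libraries accept locale names they have not actually generated, so a match here is a hint rather than a guarantee.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the first locale from $candidates that
 * setlocale() accepts on this system, or null if none is accepted.
 *
 * @param string[] $candidates
 */
function firstAvailableLocale(array $candidates): ?string
{
    // Passing '0' only queries the current LC_CTYPE setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    $found = null;

    foreach ($candidates as $candidate) {
        if (setlocale(LC_CTYPE, $candidate) !== false) {
            $found = $candidate;
            break;
        }
    }

    // Restore the setting that was active before the probe.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $found;
}

// The candidate list is an assumption; adjust it to the locales you expect.
$locale = firstAvailableLocale(['en_US.UTF-8', 'ru_RU.UTF-8', 'C.UTF-8']);
echo $locale === null ? 'No UTF-8 locale accepted' : "Usable locale: {$locale}", PHP_EOL;
```

If only ru_RU.UTF-8 is reported, that matches the broken container described above, where en_US.UTF-8 had been commented out.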

KonstantinKorepin, Aug 05 '22 11:08

Yes, we have some problems with Cyrillic characters. See https://github.com/Gemorroj/Archive7z/blob/5.4.0/tests/Archive7zTest.php#L841 for an example. Right now I don't have any ideas on how to fix this. See also https://github.com/Gemorroj/Archive7z/issues/15

Gemorroj, Aug 05 '22 12:08

Solution:

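use Archive7z\Archive7z;
use Symfony\Component\Process\Process;

class Archive7zRu extends Archive7z
{
    /**
     * Runs the 7-Zip process with LC_ALL forced to a UTF-8 locale,
     * so that Cyrillic file names are not replaced with question marks.
     *
     * Exit codes
     * 0 - Normal (no errors or warnings detected)
     * 1 - Warning (Non fatal error(s)). For example, some files cannot be read during compressing. So they were not compressed
     * 2 - Fatal error
     * 7 - Bad command line parameters
     * 8 - Not enough memory for operation
     * 255 - User stopped the process with control-C (or similar).
     *
     * @throws \Symfony\Component\Process\Exception\ProcessFailedException
     */
    protected function execute(Process $process): Process
    {
        // Use a locale that is generated on the host (see /etc/locale.gen).
        $locale = 'ru_RU.UTF-8';

        // Keep the existing environment and override only LC_ALL.
        $env = $process->getEnv();
        $env['LC_ALL'] = $locale;
        $process->setEnv($env);

        return $process->mustRun();
    }
}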

KonstantinKorepin, Sep 06 '22 13:09