ansi2html.sh dies on specific input.

Open John-Schlick opened this issue 10 years ago • 7 comments

Given the git diff file below, ansi2html (the newest version, pulled straight from the repo) generates output that stops midway through the file. Since the "á" shows up in the output as a "�", I suspect that character has something to do with the problem.

The HTML stops on this line: informe. Una agencia investigadora de informes de crédito deber�°

========= Input file ======================= [1mdiff --git a/classes/screening/report/ui.class.php b/classes/screening/report/ui.class.php[m [1mindex 3476082..7fbb43d 100755[m [1m--- a/classes/screening/report/ui.class.php[m [1m+++ b/classes/screening/report/ui.class.php[m [36m@@ -319,6 +319,14 @@[m [mclass screening_Report_UI[m }[m break;[m [m [32m+[m[32m // Decide what version of the California disclaimer to use on pur printer friendly forms.[m [32m+[m[32m case REPORT_COMPONENT_PRINTER_FRIENDLY_HEADER:[m [32m+[m[32m $eligReleaseV2 = Params::get('screening', 'report_component_printer_friendly_header.release.v2');[m [32m+[m[32m if ($eligReleaseV2 <= $createTime) {[m [32m+[m[32m $renderVersion = 2;[m [32m+[m[32m }[m [32m+[m[32m break;[m [32m+[m case REPORT_COMPONENT_PHYSICAL_CRIMINAL_SEX_OFFENDER:[m case REPORT_COMPONENT_TALENTSHIELD_PHYSICAL_CRIMINAL_SEX_OFFENDER:[m case REPORT_COMPONENT_PHYSICAL_CRIMINAL_NATIONWIDE_DB:[m [1mdiff --git a/core/inc/defines.inc.php b/core/inc/defines.inc.php[m [1mindex a703c53..5082859 100755[m [1m--- a/core/inc/defines.inc.php[m [1m+++ b/core/inc/defines.inc.php[m [36m@@ -195,8 +195,9 @@[m [mdefine ("REPORTYPE_CONSUMER_DMV", "57"); //Consumer Driving Report[m // NOTE::::::::::::::: PLEASE DO NOT ADD ANY REPORT TYPE WITHOUT TALKING TO NIRAJ[m [m /*******************************************************************_**/[m [31m-//Defines the possible ReportComponents[m [31m-//DO NOT CHANGE THE DEFINE NUMBERS ONCE THEY HAVE BEEN USED IN PRODUCTION[m [32m+[m[32m// Defines the possible ReportComponents[m [32m+[m[32m// DO NOT CHANGE THE DEFINE NUMBERS ONCE THEY HAVE BEEN USED IN PRODUCTION[m [32m+[m[32m// Kept in Intelius.Packages.ReportComponents (| separated list, search for LIKE %value%)[m define ("REPORT_COMPONENT_NONE", "0");[m define ("REPORT_COMPONENT_SUMMARY", "1");[m define ("REPORT_COMPONENT_PROPERTY", "7");[m [36m@@ -287,6 +288,7 @@[m [mdefine ("REPORT_COMPONENT_PHYSICAL_CRIMINAL_1_STATE", "96"); // Like 110 (natcri[m define 
("REPORT_COMPONENT_ISERVICES_SEX_OFFENDER", "97");[m define ("REPORT_COMPONENT_PHYSICAL_CIVILCOUNTY", "98");[m define ("REPORT_COMPONENT_SOCIAL_NET_SUMMARY", "99");[m [32m+[m[32mdefine ("REPORT_COMPONENT_PRINTER_FRIENDLY_HEADER", "100");[m define ("REPORT_COMPONENT_PHYSICAL_PHYSICAL_EXAM", "101"); // Physical Physical -- I meant to do that[m define ("REPORT_COMPONENT_PHYSICAL_ESCREEN_DRUG_SCREEN", "102");[m define ("REPORT_COMPONENT_INCART_ACCEPTANCE_MARKETING", "103");[m [1mdiff --git a/core/inc/uberreport.inc.php b/core/inc/uberreport.inc.php[m [1mold mode 100644[m [1mnew mode 100755[m [1mindex 36f768f..202d377[m [1m--- a/core/inc/uberreport.inc.php[m [1m+++ b/core/inc/uberreport.inc.php[m [36m@@ -173,6 +173,9 @@[m [mclass UberReport[m public function DisplayUberReport($Owner, $GetUserObject, $applicantId, $isPrinterFriendlyPage = 0, $echoOut = TRUE)[m {[m global $SiteConfigCore;[m [32m+[m [32m+[m[32m // guarantee the define of the theme class.[m [32m+[m[32m require_once('inc/template.inc.php');[m theme::factory()->addDir('screening/tpl'); // In case we don't have it yet[m [m // ReportContext supersedes isPrinterFriendlyPage[m [36m@@ -971,18 +974,57 @@[m [mclass UberReport[m $includeFCRA = false;[m }[m [m [31m- $civilCode = '';[m [31m- if ($includeFCRA){[m [31m- // for internation uberform, we do not show Civil code nor FCRA[m [31m- $civilCode = 'Per California Civil Code 1786, ';[m [32m+[m[32m // figure out what render version to use.[m [32m+[m[32m $FakeReqProfile = array([m [32m+[m[32m 'App' => REPORT_COMPONENT_PRINTER_FRIENDLY_HEADER,[m [32m+[m[32m 'CreateTime' => $this->m_CreateTime,[m [32m+[m[32m );[m [32m+[m[32m $FakeReportData = array();[m [32m+[m[32m $renderVersion = screening_Report_UI::getRenderVersion($FakeReqProfile, $FakeReportData, $this->m_ReportContext);[m [32m+[m [32m+[m[32m if ($renderVersion === 1) {[m [32m+[m[32m $civilCode = '';[m [32m+[m[32m if ($includeFCRA){[m [32m+[m[32m // for internation uberform, we do not show Civil code nor FCRA[m 
[32m+[m[32m $civilCode = 'Per California Civil Code 1786, ';[m [32m+[m[32m }[m [32m+[m [32m+[m[32m // 12pt font is a legal requirement that presumably applies to Web as well as print[m [32m+[m[32m $legalTopHtml = '

'.$civilCode.$this->m_SiteName.' does not[m [32m+[m[32m guarantee the accuracy or truthfulness of the information in this report as to the[m [32m+[m[32m person who is the subject of the investigation, only that the information is accurately copied from[m [32m+[m[32m public records. Information generated as a result of identity theft, including evidence of[m [32m+[m[32m criminal activity, may be inaccurately associated with the person who is the subject of the report.
'."\r\n";[m [32m+[m[32m } else {[m [32m+[m[32m // New and Improved, with a fresh spring scent![m [32m+[m[32m // 12pt font is a legal requirement that presumably applies to Web as well as print[m [32m+[m[32m $legalTopHtml = '
[m [32m+[m[32m California Applicants/Employees Only: The report does not guarantee the[m [32m+[m[32m accuracy or truthfulness of the information as to the subject of the[m [32m+[m[32m investigation, but only that it is accurately copied from public records,[m [32m+[m[32m and information generated as a result of identity theft, including[m [32m+[m[32m evidence of criminal activity, may be inaccurately associated with the[m [32m+[m[32m consumer who is the subject of the report. An investigative consumer[m [32m+[m[32m reporting agency shall provide a consumer seeking to obtain a copy of a[m [32m+[m[32m report or making a request to review a file, a written notice in simple,[m [32m+[m[32m plain English and Spanish setting forth the terms and conditions of his[m [32m+[m[32m or her right to receive all disclosures, as provided in Section[m [32m+[m[32m 1786.26.
[m [32m+[m[32m
[m [32m+[m[32m Sólo para los Solicitantes/Empleados de California: En el informe no se[m [32m+[m[32m garantiza la exactitud o veracidad de la información en cuanto al tema[m [32m+[m[32m de la investigación, sino sólo que se ha copiado exactamente de los[m [32m+[m[32m registros públicos, y la información generada como resultado del robo[m [32m+[m[32m de identidad, incluyendo las pruebas de una actividad delictiva, podría[m [32m+[m[32m estar incorrectamente asociada con el consumidor que sea el sujeto del[m [32m+[m[32m informe. Una agencia investigadora de informes de crédito deberá[m [32m+[m[32m suministrarle a un consumidor que trate de obtener una copia de un[m [32m+[m[32m informe o solicite revisar un archivo una notificación por escrito en[m [32m+[m[32m inglés y español lisos y llanos, en la que se establezcan los términos[m [32m+[m[32m y las condiciones de su derecho a recibir toda la información, como se[m [32m+[m[32m dispone en la Sección 1786.26.[m [32m+[m[32m
'."\r\n";[m }[m [31m-[m [31m- // 12pt font is a legal requirement that presumably applies to Web as well as print[m [31m- $legalTopHtml = '
'.$civilCode.$this->m_SiteName.' does not[m [31m- guarantee the accuracy or truthfulness of the information in this report as to the[m [31m- person who is the subject of the investigation, only that the information is accurately copied from[m [31m- public records. Information generated as a result of identity theft, including evidence of[m [31m- criminal activity, may be inaccurately associated with the person who is the subject of the report.
'."\r\n";[m }[m [m $legalBottomHtml = '';[m [1mdiff --git a/tests/unit/DataProvider/UserProvider.php b/tests/unit/DataProvider/UserProvider.php[m [1mnew file mode 100755[m [1mindex 0000000..80cf4d2[m [1m--- /dev/null[m [1m+++ b/tests/unit/DataProvider/UserProvider.php[m [36m@@ -0,0 +1,34 @@[m [32m+[m[32m[m \ No newline at end of file[m [1mdiff --git a/tests/unit/core/inc/UberReportTest.php b/tests/unit/core/inc/UberReportTest.php[m [1mnew file mode 100755[m [1mindex 0000000..44da4b2[m [1m--- /dev/null[m [1m+++ b/tests/unit/core/inc/UberReportTest.php[m [36m@@ -0,0 +1,49 @@[m [32m+[m[32muberReport = $uberReport;[m [32m+[m[32m }[m [32m+[m [32m+[m [32m+[m[32m // This is the weakest unit test ever.[m [32m+[m[32m // We call it, and make sure the report has the header we coded for.[m [32m+[m[32m public function testGetUberFormUser()[m [32m+[m[32m {[m [32m+[m[32m $userId = 16520012;[m [32m+[m[32m $userProvider = new UserProvider();[m [32m+[m[32m $Owner = $userProvider->getUser($userId);[m [32m+[m[32m $GetUserObject = array($userId);[m [32m+[m[32m $applicantId = 60924957;[m [32m+[m[32m // This is critical to our test. 
It MUSt be a printer friendly page to have the header.[m [32m+[m[32m $isPrinterFriendlyPage = 1;[m [32m+[m[32m $echoOut = false;[m [32m+[m [32m+[m[32m $this->uberReport->DisplayUberReport($Owner, $GetUserObject, $applicantId, $isPrinterFriendlyPage, $echoOut);[m [32m+[m [32m+[m[32m // Lets see what damage it's wrought.[m [32m+[m[32m $className = "UberReport";[m [32m+[m[32m $propertyName = "m_ReportHtml";[m [32m+[m[32m $object = $this->uberReport;[m [32m+[m[32m $m_ReportHtml = $this->getPrivateProperty($className, $propertyName, $object);[m [32m+[m [32m+[m[32m // ALL we changed is that printed reports, done AFTER the params date should have the header section.[m [32m+[m[32m $this->assertTrue(strstr($m_ReportHtml, '1786.26') !== true, "Report does NOT have the correct 1786.26 header.");[m [32m+[m[32m }[m [32m+[m[32m}[m [32m+[m[32m?>[m \ No newline at end of file[m

John-Schlick, Oct 08 '14 22:10

Could you attach the diff in an email to [email protected].

Does the original version of the script before the recent awk change work any better? https://raw.githubusercontent.com/pixelb/scripts/bd2aabd/scripts/ansi2html.sh

pixelb, Oct 08 '14 22:10

Good question. The reason I came here to get the newest version is that I was using the older version, and thought I'd try the new one. So...

Nope, it also dies with this character.

I'll attach the diff --color (which is a .txt file) to an email to you, as well as the output html that is cut short.

John-Schlick, Oct 08 '14 23:10

Seems locale related. I can get either the new or the old script to misbehave when I give it your UTF-8 input with the locale variables unset, though I can't get the output to truncate the way yours does. I presume there is an error message on stderr when the truncation happens? Are you running the script in an unusual environment? If not, what is the output of the command: locale

pixelb, Oct 09 '14 08:10
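A rough way to see why the locale could matter here: in UTF-8, "á" is stored as two bytes (0xC3 0xA1), and a tool running in a single-byte locale sees two separate characters instead of one. The sketch below uses only standard `printf`, `wc`, and `od`; the helper names are made up for illustration, and it does not reproduce the truncation itself.

```shell
#!/bin/sh
# Emit the UTF-8 encoding of "á" (U+00E1) as raw bytes: 0xC3 0xA1.
a_acute_utf8() {
    printf '\303\241'
}

# Count bytes on stdin (tr strips the padding some wc implementations add).
count_bytes() {
    wc -c | tr -d ' '
}

# Count characters on stdin as seen from the locale named in $1.
# Hypothetical helper; the named locale must be installed for the
# result to be meaningful.
count_chars_in_locale() {
    LC_ALL="$1" wc -m | tr -d ' '
}

a_acute_utf8 | od -An -tx1                # hex dump of the raw bytes
a_acute_utf8 | count_bytes                # number of bytes
a_acute_utf8 | count_chars_in_locale C    # number of characters in the C locale
```

Comparing the last line against the same call with a UTF-8 locale name (for example `en_US.utf8`, if it is installed) shows how the same bytes are counted differently depending on the locale.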

I'm just running it on a QA Linux box. I don't >>think<< anything is weird about it.

jschlick@dvm-jschlick2:/usr/local/html(BGS-1516)$ locale
LANG=en_US
LANGUAGE=
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

(I've never used this command before, so I just blindly typed it, and this is the output.)

John-Schlick, Oct 09 '14 16:10
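The locale names in that output carry no character set suffix (en_US rather than something like en_US.utf8), so the encoding in use has to be inferred. The standard `locale charmap` command prints it directly. A minimal sketch follows; `charmap_is_utf8` is a hypothetical helper, not part of ansi2html.sh.

```shell
#!/bin/sh
# Return success (0) if the given charmap name denotes UTF-8.
# Hypothetical helper: pass it the output of `locale charmap`.
charmap_is_utf8() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the system which character encoding the current locale implies.
current=$(locale charmap 2>/dev/null) || current=""

if charmap_is_utf8 "$current"; then
    echo "UTF-8 locale: $current"
else
    echo "not a UTF-8 locale: ${current:-unknown}"
fi
```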

You're in an ISO-8859-1 locale. It's unusual not to be in a UTF-8 locale these days. A probable workaround would be to set a UTF-8 locale first, like:

git diff | (export LC_ALL=en_US.utf8; ansi2html.sh) > blah.html

I'll work on making it more locale agnostic.

pixelb, Oct 09 '14 18:10
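For anyone applying the workaround on machines where en_US.utf8 may not be installed, here is a hedged sketch that picks whichever UTF-8 locale `locale -a` reports and sets LC_ALL for a single command only. Both function names are hypothetical, and the `--color` flag in the usage comment is an addition to force git to emit ANSI codes into a pipe; it is not part of the command quoted above.

```shell
#!/bin/sh
# Print the first installed UTF-8 locale, preferring en_US.
# Prints nothing if no UTF-8 locale is found.
pick_utf8_locale() {
    installed=$(locale -a 2>/dev/null | grep -i -E 'utf-?8$' || true)
    [ -n "$installed" ] || return 0
    preferred=$(printf '%s\n' "$installed" | grep -i '^en_US' | head -n 1 || true)
    if [ -n "$preferred" ]; then
        printf '%s\n' "$preferred"
    else
        printf '%s\n' "$installed" | head -n 1
    fi
}

# Run "$@" with LC_ALL set to a UTF-8 locale for that command only.
# Falls back to the unchanged environment if no UTF-8 locale is installed.
with_utf8_locale() {
    loc=$(pick_utf8_locale)
    if [ -n "$loc" ]; then
        LC_ALL="$loc" "$@"
    else
        "$@"
    fi
}

# Usage (assumes ansi2html.sh is on PATH):
# git diff --color | with_utf8_locale ansi2html.sh > blah.html
```

Scoping the variable to one command leaves the rest of the shell session in its original locale, which is the same effect the subshell in the workaround achieves.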

Your workaround works for this case. Thanks for figuring it out; I'd probably never have gotten there (since I don't actually know what that export does...).

If you do make this more locale agnostic, please let me know, and I'll happily take the new version and apply it to the entire company here.

John-Schlick, Oct 09 '14 18:10

The latest version is now more locale agnostic. Could you try it out? I've not marked this bug as fixed, though, as I wasn't able to recreate your output truncation issue.

pixelb, Jan 26 '15 17:01