feat(core): add custom dictionary support and TF-IDF matching

Open TinTongogo opened this issue 3 weeks ago • 2 comments

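  • Integrate the jieba word segmentation library and the sklearn TF-IDF algorithm
  • Implement loading of the custom dictionary file config/custom_dict.txt
  • Replace the original keyword matching algorithm with TF-IDF similarity matching
  • Add the CUSTOM_DICT_FILE environment variable to the Dockerfile and configuration files
  • Raise the Python version requirement from 3.10 to 3.12
  • Add a screenshot mode to optimize HTML report generation
  • Fix the segmentation and matching logic to improve accuracy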

TinTongogo avatar Dec 28 '25 14:12 TinTongogo

Changes Summary

This PR integrates jieba Chinese word segmentation and scikit-learn TF-IDF algorithm to replace the previous keyword matching logic with similarity-based matching. It adds support for custom dictionary loading via CUSTOM_DICT_FILE environment variable and includes HTML report generation enhancements with screenshot mode, while upgrading Python requirement from 3.10 to 3.12.
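
To make the behavior change concrete, here is a minimal sketch of TF-IDF cosine-similarity matching. It is not the PR's code: it uses only the standard library as a stand-in for scikit-learn's `TfidfVectorizer`, takes already-tokenized input instead of calling jieba, and the function names and the `>=` comparison against the 0.105 threshold are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one TF-IDF weight map per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, so a term present in every document keeps weight 1.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_match(title_tokens: List[str], keyword_tokens: List[str],
                     threshold: float = 0.105) -> bool:
    """Hypothetical matcher: True when the similarity reaches the threshold."""
    title_vec, keyword_vec = tfidf_vectors([title_tokens, keyword_tokens])
    return cosine_similarity(title_vec, keyword_vec) >= threshold
```

Unlike substring matching, the result depends on every token in the title, so a long title that mentions a keyword once can score lower than a short one. That is why the threshold needs checking against real titles.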

Type: feature

Components Affected: core/frequency.py - keyword matching algorithm, config/custom_dict.txt - new custom dictionary file, requirements.txt - new dependencies (jieba, scikit-learn), pyproject.toml - Python version and dependencies, report/html.py - HTML rendering with screenshot mode, docker/Dockerfile - Python base image and env vars, context.py - integration of new word matching logic

Files Changed
File Summary Change Impact
/tmp/workspace/trendradar/core/frequency.py Replaced string-based keyword matching with TF-IDF cosine similarity algorithm; added jieba Chinese word segmentation; added custom dictionary loading support ✏️ 🔴
/tmp/workspace/requirements.txt Added scikit-learn and jieba dependencies for ML-based matching and Chinese NLP ✏️ 🔴
/tmp/workspace/pyproject.toml Upgraded Python requirement from >=3.10 to >=3.12 and added scikit-learn/jieba dependencies ✏️ 🔴
/tmp/workspace/config/custom_dict.txt New custom dictionary file for jieba word segmentation with example entries 🟢
/tmp/workspace/trendradar/report/html.py Added screenshot mode and enhanced HTML rendering with html2canvas for image generation; added save buttons and content reordering support ✏️ 🟡
/tmp/workspace/docker/Dockerfile Updated Python base image to 3.12-slim and added CUSTOM_DICT_FILE environment variable ✏️ 🟡
/tmp/workspace/trendradar/context.py Updated imports and references to match new frequency word matching function signature ✏️ 🟡
/tmp/workspace/.github/workflows/crawler.yml Minor workflow configuration update ✏️ 🟢
/tmp/workspace/index.html Enhanced HTML index with screenshot/export functionality ✏️ 🟢
/tmp/workspace/docker/Dockerfile.mcp Updated MCP Dockerfile with new environment variables ✏️ 🟢
Architecture Impact
  • New Patterns: ML-based similarity matching (TF-IDF with cosine similarity), Chinese language processing pipeline (jieba segmentation), Custom dictionary injection pattern, Screenshot mode pattern in HTML rendering
  • Dependencies: added: scikit-learn>=1.7.2,<2.0.0, added: jieba>=0.42.1,<1.0.0
  • Coupling: Frequency matching logic now depends on ML libraries and jieba for Chinese text processing; increased coupling with scikit-learn for TF-IDF computation. Custom dictionary loading creates new dependency on file system for optional dictionary file.
  • Breaking Changes: Python version requirement bumped from 3.10+ to 3.12+ (breaking change for older environments), load_frequency_words() function signature changed - now accepts custom_dict_file parameter

Risk Areas
  • TF-IDF algorithm replaces deterministic string matching - matching behavior may differ significantly from previous implementation; threshold value (0.105) needs validation against real news data
  • Exception handling in tfidf_match() uses bare except clause (line 259) which catches all exceptions including SystemExit and KeyboardInterrupt
  • Logic issue in matches_word_groups() at line 214: 'return True' statement executes after first group check, making all subsequent word groups unreachable
  • Custom dictionary file is optional - behavior differs if file exists vs doesn't exist, but no explicit logging in non-existent case might cause confusion
  • ML library integration increases memory footprint and dependency count; TfidfVectorizer instantiated per match call (performance concern)
  • Stop words list in tfidf_match() is hardcoded and English-Chinese mixed, not environment-configurable

Suggestions
  • Add unit tests validating TF-IDF threshold behavior with representative news titles
  • Fix bare except clause at line 259 - use 'except Exception:' instead to avoid catching system exceptions
  • Fix the logic error at line 214 where 'return True' breaks the loop early - should only return for matched groups
  • Consider making TF-IDF threshold and stop words configurable via environment variables or config file
  • Cache TfidfVectorizer instance or pre-compute vectors to improve performance (vectorizer currently created per match)
  • Add integration tests comparing matching results before/after algorithm change to identify regressions
  • Document the behavior change from deterministic substring matching to probabilistic similarity matching
  • Consider adding logging for TF-IDF match results (title, score, threshold) for debugging

Full review in progress... | Powered by diffray

diffray-bot avatar Dec 29 '25 17:12 diffray-bot

Review Summary

Validated 60 issues: 38 kept, 22 filtered

Issues Found: 38

💬 See 14 individual line comment(s) for details.

📊 23 unique issue type(s) across 38 location(s)

📋 Full issue list

🔴 CRITICAL - Unconditional return statement in loop negates logic

Agent: bug in control flow - not from rules but a critical logic error

Category: bug

File: trendradar/core/frequency.py:212-216

Description: Line 214 contains an unconditional 'return True' that executes after the first group's tfidf_match check. This causes the function to always return True on the first iteration of the loop regardless of whether tfidf_match actually succeeded.

Suggestion: Remove the unconditional 'return True' at line 214. The function should only return True if tfidf_match returns True (already handled at line 212-213), then continue looping.

Confidence: 100%

Rule: bug in control flow - not from rules but a critical logic error
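
A minimal sketch of the corrected control flow, assuming a simplified signature: `matches_any_group` and its `match` callback are hypothetical stand-ins for `matches_word_groups()` and `tfidf_match()`, whose real parameters are not shown in this thread.

```python
from typing import Callable, Iterable, List


def matches_any_group(
    title: str,
    word_groups: Iterable[List[str]],
    match: Callable[[str, List[str]], bool],
) -> bool:
    """Return True as soon as one group matches, False if none do."""
    for group in word_groups:
        if match(title, group):
            return True
        # No unconditional return here: fall through to the next group.
    return False
```

With the stray `return True` removed, a non-matching first group no longer short-circuits the loop, and an empty group list yields False.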


🔴 CRITICAL - Hash verification uses SHA1 instead of SHA256

Agent: microservices

Category: security

File: docker/Dockerfile:44

Description: The supercronic binary is verified using SHA1 checksums. While SHA1 is acceptable for non-collision-resistant integrity checks, SHA256 is the modern standard and provides stronger guarantees.

Suggestion: Update to SHA256 verification: obtain SHA256 hash from supercronic releases and use sha256sum instead of sha1sum.

Confidence: 75%

Rule: docker_pin_exact_versions_with_digests_for_base
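
For illustration, the check that `sha256sum` performs can be sketched in Python as below. The function names are hypothetical, and the expected digest would have to come from the supercronic release page; none is given in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```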


🔴 CRITICAL - CI/CD pipeline missing test step before production execution

Agent: testing

Category: testing

File: .github/workflows/crawler.yml:50-163

Description: The crawler workflow does not include any test step before running the production crawler. The pipeline proceeds directly from installing dependencies to executing the crawler without validating code quality or correctness.

Suggestion: Add a test step after installing dependencies and before running the crawler. Include pytest or equivalent testing framework to run unit tests with proper assertions and error handling.

Confidence: 90%

Rule: cicd_missing_test_step


🟠 HIGH - Bare except block catches all exceptions (3 occurrences)

Agent: python

Category: bug

📍 View all locations
File Description Suggestion Confidence
trendradar/core/frequency.py:259-261 Bare except: clause catches SystemExit, KeyboardInterrupt, and other BaseExceptions, making it imp... Replace with except Exception as e: to catch only standard exceptions, or use specific types like ... 95%
docker/manage.py:51-54 Bare except: at line 53 catches all exceptions including SystemExit and KeyboardInterrupt when par... Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors 92%
trendradar/__main__.py:53-54 Bare except: in parse_version function inside check_version catches all exceptions including Syste... Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors 90%

Rule: py_avoid_generic_except_blocks
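
A minimal sketch of the suggested replacement for the version-parsing case. The body of the real `parse_version` is not shown in this thread, so the three-part format and the `(0, 0, 0)` fallback are assumptions.

```python
from typing import Tuple


def parse_version(version_str: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch'; fall back to (0, 0, 0) on malformed input."""
    try:
        parts = version_str.strip().split(".")
        return int(parts[0]), int(parts[1]), int(parts[2])
    except (ValueError, IndexError):
        # ValueError: a part is not an integer. IndexError: fewer than three parts.
        # SystemExit and KeyboardInterrupt are deliberately not caught.
        return (0, 0, 0)
```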


🟠 HIGH - TfidfVectorizer instantiation in loop creates O(n*m) complexity (3 occurrences)

Agent: performance

Category: performance

📍 View all locations
File Description Suggestion Confidence
trendradar/core/frequency.py:194-248 tfidf_match() instantiates a new TfidfVectorizer with stop_words list preprocessing every call. Sinc... Move TfidfVectorizer instantiation outside the function or use a module-level cached instance. Consi... 90%
docker/manage.py:363-372 Line 364 calls stat() in the sort key, then lines 368-369 call stat() again on the same files for di... Cache stat results: files_with_stats = [(f, f.stat()) for f in files], then use cached stat for both... 60%
trendradar/report/html.py:619-639 Lines 621-622 call min(ranks) and max(ranks) separately, requiring two list traversals. While the im... For small optimization: use 'min_rank, max_rank = min(ranks), max(ranks)' or compute both in single ... 60%

Rule: perf_expensive_in_loop
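
The `stat()` caching suggested for docker/manage.py can be sketched as follows. The function name and the returned tuple shape are hypothetical; only the `files_with_stats` pattern comes from the review.

```python
from pathlib import Path
from typing import List, Tuple


def newest_first_with_sizes(directory: Path) -> List[Tuple[str, int, float]]:
    """List files newest first, calling stat() exactly once per file."""
    files = [p for p in directory.iterdir() if p.is_file()]
    # Cache each stat result so sorting and display reuse the same data.
    files_with_stats = [(f, f.stat()) for f in files]
    files_with_stats.sort(key=lambda pair: pair[1].st_mtime, reverse=True)
    return [(f.name, st.st_size, st.st_mtime) for f, st in files_with_stats]
```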


🟠 HIGH - Missing HEALTHCHECK instruction for exposed service port

Agent: microservices

Category: microservices

File: docker/Dockerfile.mcp:22-23

Description: Dockerfile.mcp exposes port 3333 and runs an HTTP MCP server but lacks a HEALTHCHECK instruction. Container orchestrators cannot detect when the service becomes unresponsive.

Suggestion: Add a HEALTHCHECK instruction before the CMD. Example: HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 CMD python -c "import socket; s = socket.socket(); s.connect(('localhost', 3333)); s.close()" || exit 1

Confidence: 85%

Rule: docker_no_healthcheck
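
The one-line socket probe in the suggestion can also live in a small script, which is easier to read and test. This is a sketch under that assumption; `port_is_open` is a hypothetical helper, not part of the repository.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A health check script would exit with status 0 when `port_is_open("localhost", 3333)` is True and 1 otherwise. Note that a TCP probe only shows the port is accepting connections, not that the MCP server is answering requests correctly.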


🟠 HIGH - Container image not pinned by digest (2 occurrences)

Agent: security

Category: security

📍 View all locations
File Description Suggestion Confidence
docker/Dockerfile:1 The base image 'python:3.12-slim' uses a mutable tag which can change unexpectedly between builds. T... Pin the image to a specific digest: FROM python:3.12-slim@sha256: 85%
docker/Dockerfile.mcp:1 The base image 'python:3.12-slim' uses a mutable tag which can change unexpectedly between builds, c... Pin the image to a specific digest: FROM python:3.12-slim@sha256: 85%

Rule: gen_pin_container_images_by_digest


🟠 HIGH - Bare except clause catching all exceptions (4 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:450-451 Bare except: clause silently swallows all exceptions without any logging or specific handling. Specify exception type: except Exception as e: and optionally log the error or re-raise if appropria... 85%
docker/manage.py:528-529 Bare except clause in stop_webserver function catches all exceptions without specific type. This mak... Replace with: except Exception as e: or catch specific exceptions like (OSError, IOError) relevant t... 85%
docker/manage.py:450-451 Bare except: pass in webserver PID file cleanup catches all exceptions silently. Use except OSError: pass since file removal only raises OSError 70%
docker/manage.py:528-529 Bare except: pass silently ignores errors when removing webserver PID file during stop operation. Use except OSError: pass for specific file removal error handling 70%

Rule: python_bare_except


🟠 HIGH - Print statements instead of logging module (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/core/frequency.py:58-60 Lines 58 and 60 use print() for status messages in library code. Production code should use the logg... Import logging module and replace print() calls with logging.info() or logging.debug(). 75%
trendradar/__main__.py:47-54 Lines 53-54 in nested function 'parse_version' use bare 'except:' without specifying exception type.... Replace with specific exceptions: 'except (ValueError, IndexError, AttributeError):' or 'except Exce... 85%

Rule: python_print_debug
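
A minimal sketch of status reporting through the `logging` module instead of `print()`. The messages, the logger name, and the `resolve_custom_dict` helper are hypothetical, since the actual output of lines 58 and 60 is not shown in this thread.

```python
import logging
from pathlib import Path
from typing import Optional

# Hypothetical logger name; applications configure handlers and levels at startup.
logger = logging.getLogger("trendradar.frequency")


def resolve_custom_dict(path: Optional[str]) -> Optional[Path]:
    """Return the dictionary path if it exists, logging each outcome."""
    if not path:
        logger.debug("No custom dictionary configured")
        return None
    candidate = Path(path)
    if candidate.is_file():
        logger.info("Loading custom dictionary from %s", candidate)
        return candidate
    logger.warning("Custom dictionary %s not found; using default segmentation", candidate)
    return None
```

Logging the missing-file case also addresses the risk noted earlier that a silently absent dictionary might cause confusion.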


🟠 HIGH - Container runs as root user - No USER directive present

Agent: security

Category: security

File: docker/Dockerfile.mcp:1-23

Description: The Dockerfile does not contain a USER instruction, which means the container will run as root user (UID 0). This increases the attack surface.

Suggestion: Add a USER instruction at the end of the Dockerfile to run as a non-root user. Example: RUN adduser --disabled-password --gecos '' appuser && chown -R appuser:appuser /app, followed by USER appuser

Confidence: 90%

Rule: docker_run_as_root


🟠 HIGH - Environment variables read directly in function instead of being injected

Agent: architecture

Category: architecture

File: trendradar/core/frequency.py:50-64

Description: The load_frequency_words() function reads environment variables (CUSTOM_DICT_FILE and FREQUENCY_WORDS_PATH) directly using os.environ.get() as fallback when parameters are None. This makes testing harder as environment state must be mocked.

Suggestion: Move environment variable loading to application startup and pass the values to load_frequency_words() as parameters. Remove the os.environ fallback from within the function.

Confidence: 75%

Rule: py_move_environment_configuration_to_startu
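
One way to apply this suggestion is to read the environment once at startup and pass a plain config object down. The sketch below assumes a hypothetical `FrequencyConfig` class and a default path of `config/frequency_words.txt`; neither is confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FrequencyConfig:
    frequency_words_path: str
    custom_dict_file: Optional[str]


def config_from_env(env: Mapping[str, str]) -> FrequencyConfig:
    """Build the config from an environment-like mapping (os.environ in production)."""
    return FrequencyConfig(
        # The default path is an assumption for illustration only.
        frequency_words_path=env.get("FREQUENCY_WORDS_PATH", "config/frequency_words.txt"),
        custom_dict_file=env.get("CUSTOM_DICT_FILE"),
    )
```

Because the function takes a mapping rather than reading `os.environ` itself, tests can pass an ordinary dict without patching global state.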


🟠 HIGH - Large stop words list should be external file

Agent: python

Category: quality

File: trendradar/core/frequency.py:234-248

Description: Lines 236-248 contain a hardcoded list of 49+ Chinese stop words embedded in function code. This static data should be stored in an external configuration file for maintainability and reusability.

Suggestion: Move stop_words to external file (e.g., 'config/stopwords.txt') and load at module initialization. This allows updating stop words without code changes.

Confidence: 85%

Rule: py_move_large_templates_to_external_files
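
A minimal loader for such a file might look like this. The one-word-per-line format with `#` comments is an assumption, as is the `load_stop_words` name.

```python
from pathlib import Path
from typing import FrozenSet


def load_stop_words(path: Path) -> FrozenSet[str]:
    """Read one stop word per line, skipping blank lines and '#' comments."""
    words = set()
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if word and not word.startswith("#"):
                words.add(word)
    return frozenset(words)
```

Loading once at module initialization and keeping the result as a frozenset avoids rebuilding the list on every match call.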


🟡 MEDIUM - Duplicate rank display logic violates DRY principle

Agent: refactoring

Category: quality

File: trendradar/report/html.py:619-713

Description: The code for handling rank display (min/max rank calculation and rank class determination) appears twice: for regular news items (lines 619-638) and for new news items (lines 696-713).

Suggestion: Extract the rank display logic into a shared helper function: 'def format_rank_info(ranks, rank_threshold)' that returns (rank_class, rank_text).

Confidence: 80%

Rule: quality_guard_clauses
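
A sketch of the proposed helper, keeping the suggested signature. The existing rank logic in html.py is not shown in this thread, so the `"top"` class name and the `min-max` text format below are placeholders, not the project's actual values.

```python
from typing import Sequence, Tuple


def format_rank_info(ranks: Sequence[int], rank_threshold: int) -> Tuple[str, str]:
    """Return (rank_class, rank_text) for a list of observed ranks."""
    if not ranks:
        return "", ""
    min_rank, max_rank = min(ranks), max(ranks)
    # Placeholder rule: highlight items whose best rank is within the threshold.
    rank_class = "top" if min_rank <= rank_threshold else ""
    rank_text = str(min_rank) if min_rank == max_rank else f"{min_rank}-{max_rank}"
    return rank_class, rank_text
```

Both the regular and the new-item rendering paths could then call the same function instead of repeating the calculation.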


🟡 MEDIUM - Large HTML template embedded in Python function

Agent: architecture

Category: architecture

File: trendradar/report/html.py:14-1094

Description: The render_html_content() function embeds ~1000 lines of HTML/CSS/JS template directly in the Python file. This makes maintenance and testing more difficult.

Suggestion: Extract the HTML template to a separate template file (using Jinja2 or similar) or create a dedicated template module.

Confidence: 70%

Rule: py_separate_business_logic_from_framework


🟡 MEDIUM - Dockerfile missing HEALTHCHECK instruction

Agent: bugs

Category: bug

File: docker/Dockerfile:1-72

Description: The Docker container lacks a HEALTHCHECK instruction. Without it, Docker orchestrators cannot monitor the health of the container automatically.

Suggestion: Add a HEALTHCHECK instruction to verify the service is running properly.

Confidence: 70%

Rule: docker_missing_healthcheck


🟡 MEDIUM - Duplicate Dockerfile patterns could be consolidated

Agent: microservices

Category: quality

File: docker/Dockerfile:1-72

Description: Dockerfile and Dockerfile.mcp share similar base images and configuration patterns. Both use python:3.12-slim and install from requirements.txt.

Suggestion: Consider consolidating into a single multi-stage Dockerfile with build targets to reduce maintenance overhead.

Confidence: 65%

Rule: docker_consolidate_duplicate_dockerfiles


🟡 MEDIUM - CMD lacks environment validation wrapper

Agent: microservices

Category: quality

File: docker/Dockerfile.mcp:23

Description: The CMD instruction launches the MCP server directly without validating required configuration files or environment variables exist, unlike the main Dockerfile which uses entrypoint.sh.

Suggestion: Add an entrypoint script similar to the main Dockerfile that validates config files exist before launching the server.

Confidence: 70%

Rule: docker_add_explicit_environment_variable_checks


🟡 MEDIUM - Silent exception swallowing in crontab reading (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:189-191 Bare except: pass silently swallows all exceptions when reading crontab file content. Errors are h... Use specific exception handling: except (FileNotFoundError, IOError) as e: pass or log the error 85%
trendradar/core/frequency.py:122-123 Exception swallowed when parsing max count directive. Has Chinese comment explaining intent, but no ... Add debug logging: except (ValueError, IndexError): logger.debug(f'Invalid @ format: {word}') 60%

Rule: python_except_pass
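
The second row's suggestion could look roughly like the sketch below. The `@N` directive syntax is inferred from the suggested log message and is an assumption, as are the `parse_max_count` name and the logger name.

```python
import logging
from typing import Optional

logger = logging.getLogger("trendradar.frequency")  # hypothetical logger name


def parse_max_count(word: str) -> Optional[int]:
    """Parse a max-count directive such as '@5'; return None if it is malformed."""
    try:
        return int(word.split("@")[1])
    except (ValueError, IndexError):
        # Still tolerated, but no longer silent: the bad token is recorded.
        logger.debug("Invalid @ format: %s", word)
        return None
```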


🟡 MEDIUM - Overly broad exception handling

Agent: python

Category: quality

File: docker/manage.py:396-400

Description: Catches generic Exception when file operations may fail. Should catch specific file-related exceptions for targeted error recovery.

Suggestion: Use specific exceptions: except (FileNotFoundError, PermissionError, subprocess.CalledProcessError) as e:

Confidence: 65%

Rule: py_add_specific_exception_handling


🟡 MEDIUM - Overly broad exception handling in version check (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/__main__.py:62-64 Catches generic Exception for HTTP requests. Should handle requests-specific exceptions for better... Add specific handling: except requests.RequestException as e: before generic Exception 72%
trendradar/__main__.py:186-187 Catches all exceptions during version check with only generic message. Actual failure context is los... Log with context: `except requests.RequestException as e: logger.warning(f'Version check failed', ex... 65%

Rule: py_add_error_handling_for_external_service_


🟡 MEDIUM - Using datetime.now() instead of timezone-aware datetime

Agent: python

Category: bug

File: trendradar/report/html.py:556

Description: Fallback uses datetime.now() without timezone when get_time_func is not provided. Creates naive datetime that may cause timezone issues.

Suggestion: Use datetime.now(timezone.utc) or require get_time_func to always be provided

Confidence: 70%

Rule: python_datetime_now
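
A minimal sketch of a timezone-aware fallback. `current_time` is a hypothetical wrapper; only the `get_time_func` parameter name comes from the review.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def current_time(get_time_func: Optional[Callable[[], datetime]] = None) -> datetime:
    """Use the injected clock if provided, otherwise an aware UTC timestamp."""
    if get_time_func is not None:
        return get_time_func()
    return datetime.now(timezone.utc)
```

An aware datetime carries its offset, so comparing or formatting it alongside other aware values does not depend on the host's local timezone.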


🟡 MEDIUM - Hardcoded TF-IDF threshold should be constant (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/core/frequency.py:219-221 The threshold value 0.105 is hardcoded in the function signature. This magic number should be extrac... Define module-level constant: DEFAULT_TFIDF_THRESHOLD = 0.105 and use it in function signature with ... 70%
docker/manage.py:96-99 Lines 96-99 define a hardcoded dictionary of weekday names. This configuration data should be extrac... Extract to module level: WEEKDAY_NAMES = { "0": "周日", "1": "周一", "2": "周二", "3": "周三", "4": ... 65%

Rule: py_extract_constants_from_hardcoded_values
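
Both extractions can be sketched as module-level constants. The constant names and values come from the review (the full weekday mapping appears in the out-of-diff issue list later in this comment); the two helper functions are hypothetical and only show how the constants would be used.

```python
DEFAULT_TFIDF_THRESHOLD = 0.105

# Cron allows both "0" and "7" for Sunday.
WEEKDAY_NAMES = {
    "0": "周日", "1": "周一", "2": "周二", "3": "周三",
    "4": "周四", "5": "周五", "6": "周六", "7": "周日",
}


def is_match(score: float, threshold: float = DEFAULT_TFIDF_THRESHOLD) -> bool:
    """Hypothetical helper: compare a similarity score with the default threshold."""
    return score >= threshold


def weekday_label(cron_field: str) -> str:
    """Hypothetical helper: map a cron weekday field to its name, else return it unchanged."""
    return WEEKDAY_NAMES.get(cron_field, cron_field)
```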


🟡 MEDIUM - Missing type hints for complex dictionary parameters (4 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/report/html.py:14-23 Function 'render_html_content' has parameters with Dict type but lacks specific structure definition... Create TypedDict classes for parameters: class ReportData(TypedDict): stats: ... new_titles:... 65%
docker/manage.py:31-43 Function 'manual_run' has no return type annotation. Should explicitly annotate '-> None:' for clari... Add return type: 'def manual_run() -> None:' 60%
docker/manage.py:46-124 Function 'parse_cron_schedule' lacks type hints for parameter and return value. Should have '-> str:... Add type hints: 'def parse_cron_schedule(cron_expr: str) -> str:' 70%
trendradar/__main__.py:47-54 Nested function 'parse_version' has no type hints for parameter or return value. Should be annotated... Add type hints: 'def parse_version(version_str: str) -> Tuple[int, int, int]:' and import Tuple from... 62%

Rule: python_type_hints_missing


ℹ️ 24 issue(s) outside PR diff

These issues were found in lines not modified in this PR.

🔴 CRITICAL - Hash verification uses SHA1 instead of SHA256

Agent: microservices

Category: security

File: docker/Dockerfile:44

Description: The supercronic binary is verified using SHA1 checksums. While SHA1 is acceptable for non-collision-resistant integrity checks, SHA256 is the modern standard and provides stronger guarantees.

Suggestion: Update to SHA256 verification: obtain SHA256 hash from supercronic releases and use sha256sum instead of sha1sum.

Confidence: 75%

Rule: docker_pin_exact_versions_with_digests_for_base


🔴 CRITICAL - CI/CD pipeline missing test step before production execution

Agent: testing

Category: testing

File: .github/workflows/crawler.yml:50-163

Description: The crawler workflow does not include any test step before running the production crawler. The pipeline proceeds directly from installing dependencies to executing the crawler without validating code quality or correctness.

Suggestion: Add a test step after installing dependencies and before running the crawler. Include pytest or equivalent testing framework to run unit tests with proper assertions and error handling.

Confidence: 90%

Rule: cicd_missing_test_step


🟠 HIGH - Bare except clause catching all exceptions (4 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:450-451 Bare except: clause silently swallows all exceptions without any logging or specific handling. Specify exception type: except Exception as e: and optionally log the error or re-raise if appropria... 85%
docker/manage.py:528-529 Bare except clause in stop_webserver function catches all exceptions without specific type. This mak... Replace with: except Exception as e: or catch specific exceptions like (OSError, IOError) relevant t... 85%
docker/manage.py:450-451 Bare except: pass in webserver PID file cleanup catches all exceptions silently. Use except OSError: pass since file removal only raises OSError 70%
docker/manage.py:528-529 Bare except: pass silently ignores errors when removing webserver PID file during stop operation. Use except OSError: pass for specific file removal error handling 70%

Rule: python_bare_except


🟠 HIGH - Bare except block in version parsing (2 occurrences)

Agent: python

Category: bug

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:51-54 Bare except: at line 53 catches all exceptions including SystemExit and KeyboardInterrupt when par... Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors 92%
trendradar/__main__.py:53-54 Bare except: in parse_version function inside check_version catches all exceptions including Syste... Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors 90%

Rule: py_avoid_generic_except_blocks


🟡 MEDIUM - Duplicate rank display logic violates DRY principle

Agent: refactoring

Category: quality

File: trendradar/report/html.py:619-713

Description: The code for handling rank display (min/max rank calculation and rank class determination) appears twice: for regular news items (lines 619-638) and for new news items (lines 696-713).

Suggestion: Extract the rank display logic into a shared helper function: 'def format_rank_info(ranks, rank_threshold)' that returns (rank_class, rank_text).

Confidence: 80%

Rule: quality_guard_clauses


🟡 MEDIUM - Bare except clause in nested function

Agent: python

Category: quality

File: trendradar/__main__.py:47-54

Description: Lines 53-54 in nested function 'parse_version' use bare 'except:' without specifying exception type. This catches all exceptions including SystemExit.

Suggestion: Replace with specific exceptions: 'except (ValueError, IndexError, AttributeError):' or 'except Exception:'

Confidence: 85%

Rule: python_print_debug


🟡 MEDIUM - Large HTML template embedded in Python function

Agent: architecture

Category: architecture

File: trendradar/report/html.py:14-1094

Description: The render_html_content() function embeds ~1000 lines of HTML/CSS/JS template directly in the Python file. This makes maintenance and testing more difficult.

Suggestion: Extract the HTML template to a separate template file (using Jinja2 or similar) or create a dedicated template module.

Confidence: 70%

Rule: py_separate_business_logic_from_framework


🟡 MEDIUM - Repeated stat() calls on same files (2 occurrences)

Agent: performance

Category: performance

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:363-372 Line 364 calls stat() in the sort key, then lines 368-369 call stat() again on the same files for di... Cache stat results: files_with_stats = [(f, f.stat()) for f in files], then use cached stat for both... 60%
trendradar/report/html.py:619-639 Lines 621-622 call min(ranks) and max(ranks) separately, requiring two list traversals. While the im... For small optimization: use 'min_rank, max_rank = min(ranks), max(ranks)' or compute both in single ... 60%

Rule: perf_expensive_in_loop


🟡 MEDIUM - Silent exception swallowing in crontab reading (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
docker/manage.py:189-191 Bare except: pass silently swallows all exceptions when reading crontab file content. Errors are h... Use specific exception handling: except (FileNotFoundError, IOError) as e: pass or log the error 85%
trendradar/core/frequency.py:122-123 Exception swallowed when parsing max count directive. Has Chinese comment explaining intent, but no ... Add debug logging: except (ValueError, IndexError): logger.debug(f'Invalid @ format: {word}') 60%

Rule: python_except_pass


🟡 MEDIUM - Overly broad exception handling

Agent: python

Category: quality

File: docker/manage.py:396-400

Description: Catches generic Exception when file operations may fail. Should catch specific file-related exceptions for targeted error recovery.

Suggestion: Use specific exceptions: except (FileNotFoundError, PermissionError, subprocess.CalledProcessError) as e:

Confidence: 65%

Rule: py_add_specific_exception_handling


🟡 MEDIUM - Overly broad exception handling in version check (2 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/__main__.py:62-64 Catches generic Exception for HTTP requests. Should handle requests-specific exceptions for better... Add specific handling: except requests.RequestException as e: before generic Exception 72%
trendradar/__main__.py:186-187 Catches all exceptions during version check with only generic message. Actual failure context is los... Log with context: `except requests.RequestException as e: logger.warning(f'Version check failed', ex... 65%

Rule: py_add_error_handling_for_external_service_


🟡 MEDIUM - Using datetime.now() instead of timezone-aware datetime

Agent: python

Category: bug

File: trendradar/report/html.py:556

Description: Fallback uses datetime.now() without timezone when get_time_func is not provided. Creates naive datetime that may cause timezone issues.

Suggestion: Use datetime.now(timezone.utc) or require get_time_func to always be provided

Confidence: 70%

Rule: python_datetime_now


🟡 MEDIUM - Missing type hints for complex dictionary parameters (4 occurrences)

Agent: python

Category: quality

📍 View all locations
File Description Suggestion Confidence
trendradar/report/html.py:14-23 Function 'render_html_content' has parameters with Dict type but lacks specific structure definition... Create TypedDict classes for parameters: class ReportData(TypedDict): stats: ... new_titles:... 65%
docker/manage.py:31-43 Function 'manual_run' has no return type annotation. Should explicitly annotate '-> None:' for clari... Add return type: 'def manual_run() -> None:' 60%
docker/manage.py:46-124 Function 'parse_cron_schedule' lacks type hints for parameter and return value. Should have '-> str:... Add type hints: 'def parse_cron_schedule(cron_expr: str) -> str:' 70%
trendradar/__main__.py:47-54 Nested function 'parse_version' has no type hints for parameter or return value. Should be annotated... Add type hints: 'def parse_version(version_str: str) -> Tuple[int, int, int]:' and import Tuple from... 62%

Rule: python_type_hints_missing


🟡 MEDIUM - Hardcoded weekday mapping should be module constant

Agent: python

Category: quality

File: docker/manage.py:96-99

Description: Lines 96-99 define a hardcoded dictionary of weekday names. This configuration data should be extracted to module-level constant for reusability and consistency.

Suggestion: Extract to module level: WEEKDAY_NAMES = { "0": "周日", "1": "周一", "2": "周二", "3": "周三", "4": "周四", "5": "周五", "6": "周六", "7": "周日" }

Confidence: 65%

Rule: py_extract_constants_from_hardcoded_values



Review ID: dbbfa66e-79de-4003-9bc0-64cc472aabdd Rate it 👍 or 👎 to improve future reviews | Powered by diffray

diffray-bot avatar Dec 29 '25 17:12 diffray-bot

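Sorry, I don't know Python. I only forked the repo to have an AI add a sticky column-header effect, and submitted this to the main branch by mistake.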

TinTongogo avatar Jan 07 '26 07:01 TinTongogo