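feat(core): add custom dictionary support and TF-IDF matching
- Integrate the jieba word segmentation library and the sklearn TF-IDF algorithm
- Implement loading of the custom dictionary file config/custom_dict.txt
- Replace the original keyword matching algorithm with TF-IDF similarity matching
- Add the CUSTOM_DICT_FILE environment variable to the Dockerfile and configuration files
- Raise the required Python version from 3.10 to 3.12
- Add a screenshot mode to optimize HTML report generation
- Fix word segmentation and matching logic to improve accuracy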
Changes Summary
This PR integrates jieba Chinese word segmentation and scikit-learn TF-IDF algorithm to replace the previous keyword matching logic with similarity-based matching. It adds support for custom dictionary loading via CUSTOM_DICT_FILE environment variable and includes HTML report generation enhancements with screenshot mode, while upgrading Python requirement from 3.10 to 3.12.
Type: feature
Components Affected: core/frequency.py - keyword matching algorithm, config/custom_dict.txt - new custom dictionary file, requirements.txt - new dependencies (jieba, scikit-learn), pyproject.toml - Python version and dependencies, report/html.py - HTML rendering with screenshot mode, docker/Dockerfile - Python base image and env vars, context.py - integration of new word matching logic
Files Changed
| File | Summary | Change | Impact |
|---|---|---|---|
| /tmp/workspace/trendradar/core/frequency.py | Replaced string-based keyword matching with TF-IDF cosine similarity algorithm; added jieba Chinese word segmentation; added custom dictionary loading support | ✏️ | 🔴 |
| /tmp/workspace/requirements.txt | Added scikit-learn and jieba dependencies for ML-based matching and Chinese NLP | ✏️ | 🔴 |
| /tmp/workspace/pyproject.toml | Upgraded Python requirement from >=3.10 to >=3.12 and added scikit-learn/jieba dependencies | ✏️ | 🔴 |
| /tmp/workspace/config/custom_dict.txt | New custom dictionary file for jieba word segmentation with example entries | ➕ | 🟢 |
| /tmp/workspace/trendradar/report/html.py | Added screenshot mode and enhanced HTML rendering with html2canvas for image generation; added save buttons and content reordering support | ✏️ | 🟡 |
| /tmp/workspace/docker/Dockerfile | Updated Python base image to 3.12-slim and added CUSTOM_DICT_FILE environment variable | ✏️ | 🟡 |
| /tmp/workspace/trendradar/context.py | Updated imports and references to match new frequency word matching function signature | ✏️ | 🟡 |
| /tmp/workspace/.github/workflows/crawler.yml | Minor workflow configuration update | ✏️ | 🟢 |
| /tmp/workspace/index.html | Enhanced HTML index with screenshot/export functionality | ✏️ | 🟢 |
| /tmp/workspace/docker/Dockerfile.mcp | Updated MCP Dockerfile with new environment variables | ✏️ | 🟢 |
Architecture Impact
- New Patterns: ML-based similarity matching (TF-IDF with cosine similarity), Chinese language processing pipeline (jieba segmentation), Custom dictionary injection pattern, Screenshot mode pattern in HTML rendering
- Dependencies: added: scikit-learn>=1.7.2,<2.0.0, added: jieba>=0.42.1,<1.0.0
- Coupling: Frequency matching logic now depends on ML libraries and jieba for Chinese text processing; increased coupling with scikit-learn for TF-IDF computation. Custom dictionary loading creates new dependency on file system for optional dictionary file.
- Breaking Changes: Python version requirement bumped from 3.10+ to 3.12+ (breaking change for older environments), load_frequency_words() function signature changed - now accepts custom_dict_file parameter
Risk Areas
- TF-IDF algorithm replaces deterministic string matching - matching behavior may differ significantly from previous implementation; threshold value (0.105) needs validation against real news data
- Exception handling in tfidf_match() uses bare except clause (line 259) which catches all exceptions including SystemExit and KeyboardInterrupt
- Logic issue in matches_word_groups() at line 214: 'return True' statement executes after first group check, making all subsequent word groups unreachable
- Custom dictionary file is optional - behavior differs if file exists vs doesn't exist, but no explicit logging in non-existent case might cause confusion
- ML library integration increases memory footprint and dependency count; TfidfVectorizer instantiated per match call (performance concern)
- Stop words list in tfidf_match() is hardcoded and English-Chinese mixed, not environment-configurable
Suggestions
- Add unit tests validating TF-IDF threshold behavior with representative news titles
- Fix bare except clause at line 259 - use 'except Exception:' instead to avoid catching system exceptions
- Fix the logic error at line 214 where 'return True' breaks the loop early - should only return for matched groups
- Consider making TF-IDF threshold and stop words configurable via environment variables or config file
- Cache TfidfVectorizer instance or pre-compute vectors to improve performance (vectorizer currently created per match)
- Add integration tests comparing matching results before/after algorithm change to identify regressions
- Document the behavior change from deterministic substring matching to probabilistic similarity matching
- Consider adding logging for TF-IDF match results (title, score, threshold) for debugging
Review Summary
Validated 60 issues: 38 kept, 22 filtered
Issues Found: 38
💬 See 14 individual line comment(s) for details.
📊 23 unique issue type(s) across 38 location(s)
📋 Full issue list
🔴 CRITICAL - Unconditional return statement in loop negates logic
Agent: bug in control flow - not from rules but a critical logic error
Category: bug
File: trendradar/core/frequency.py:212-216
Description: Line 214 contains an unconditional 'return True' that executes after the first group's tfidf_match check. This causes the function to always return True on the first iteration of the loop regardless of whether tfidf_match actually succeeded.
Suggestion: Remove the unconditional 'return True' at line 214. The function should only return True if tfidf_match returns True (already handled at line 212-213), then continue looping.
Confidence: 100%
Rule: bug in control flow - not from rules but a critical logic error
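A minimal sketch of the corrected loop, assuming matches_word_groups() iterates over word groups and delegates to tfidf_match(). The real signatures are not shown in this review, so both signatures below are hypothetical, and the substring check is only a stand-in for the actual similarity comparison.

```python
from typing import Iterable


def tfidf_match(title: str, group: str) -> bool:
    """Hypothetical stand-in: the real function compares TF-IDF similarity."""
    return any(word in title for word in group.split())


def matches_word_groups(title: str, groups: Iterable[str]) -> bool:
    """Return True as soon as any group matches, False if none do."""
    for group in groups:
        if tfidf_match(title, group):
            return True  # return only when this group actually matched
        # no unconditional return here, so later groups are still checked
    return False
```

With the unconditional return removed, a title that misses the first group can still match a later one, and a title that matches nothing yields False.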
🔴 CRITICAL - Hash verification uses SHA1 instead of SHA256
Agent: microservices
Category: security
File: docker/Dockerfile:44
Description: The supercronic binary is verified using SHA1 checksums. While SHA1 is acceptable for non-collision-resistant integrity checks, SHA256 is the modern standard and provides stronger guarantees.
Suggestion: Update to SHA256 verification: obtain SHA256 hash from supercronic releases and use sha256sum instead of sha1sum.
Confidence: 75%
Rule: docker_pin_exact_versions_with_digests_for_base
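In the Dockerfile the change would be a switch from sha1sum to sha256sum. As an illustration of the same integrity check, here is a small Python sketch; the helper name sha256_matches is hypothetical and the expected digest has to come from the upstream release page.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Stream the file through SHA-256 and compare against the expected digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large binaries do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())
```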
🔴 CRITICAL - CI/CD pipeline missing test step before production execution
Agent: testing
Category: testing
File: .github/workflows/crawler.yml:50-163
Description: The crawler workflow does not include any test step before running the production crawler. The pipeline proceeds directly from installing dependencies to executing the crawler without validating code quality or correctness.
Suggestion: Add a test step after installing dependencies and before running the crawler. Include pytest or equivalent testing framework to run unit tests with proper assertions and error handling.
Confidence: 90%
Rule: cicd_missing_test_step
🟠 HIGH - Bare except block catches all exceptions (3 occurrences)
Agent: python
Category: bug
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/core/frequency.py:259-261 | Bare except: clause catches SystemExit, KeyboardInterrupt, and other BaseExceptions, making it imp... | Replace with except Exception as e: to catch only standard exceptions, or use specific types like ... | 95% |
| docker/manage.py:51-54 | Bare except: at line 53 catches all exceptions including SystemExit and KeyboardInterrupt when par... | Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors | 92% |
| trendradar/__main__.py:53-54 | Bare except: in parse_version function inside check_version catches all exceptions including Syste... | Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors | 90% |
Rule: py_avoid_generic_except_blocks
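A sketch of the narrowed handler for a version parser. The body of the real parse_version is not shown in this review, so the "major.minor.patch" format and the (0, 0, 0) fallback below are assumptions.

```python
def parse_version(version_str: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch'; fall back to (0, 0, 0) on malformed input."""
    try:
        parts = version_str.strip().split(".")
        # int() raises ValueError on non-digits; indexing raises IndexError
        # when fewer than three components are present.
        return int(parts[0]), int(parts[1]), int(parts[2])
    except (ValueError, IndexError):
        return 0, 0, 0
```

Because only ValueError and IndexError are caught, KeyboardInterrupt and SystemExit propagate as usual instead of being swallowed.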
🟠 HIGH - TfidfVectorizer instantiation in loop creates O(n*m) complexity (3 occurrences)
Agent: performance
Category: performance
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/core/frequency.py:194-248 | tfidf_match() instantiates a new TfidfVectorizer with stop_words list preprocessing every call. Sinc... | Move TfidfVectorizer instantiation outside the function or use a module-level cached instance. Consi... | 90% |
| docker/manage.py:363-372 | Line 364 calls stat() in the sort key, then lines 368-369 call stat() again on the same files for di... | Cache stat results: files_with_stats = [(f, f.stat()) for f in files], then use cached stat for both... | 60% |
| trendradar/report/html.py:619-639 | Lines 621-622 call min(ranks) and max(ranks) separately, requiring two list traversals. While the im... | For small optimization: use 'min_rank, max_rank = min(ranks), max(ranks)' or compute both in single ... | 60% |
Rule: perf_expensive_in_loop
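The fit-once idea behind the first row can be sketched without third-party packages. This is not the PR's implementation: KeywordIndex and char_bigrams are hypothetical names, character bigrams stand in for jieba segmentation, and the hand-rolled TF-IDF stands in for a cached TfidfVectorizer. The point is only that IDF weights and keyword vectors are computed once and reused for every title.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Stand-in tokenizer: character bigrams instead of jieba segmentation."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class KeywordIndex:
    """Fits IDF weights on the keyword groups once, then scores many titles."""

    def __init__(self, groups: list[str]) -> None:
        docs = [Counter(char_bigrams(g)) for g in groups]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.group_vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        """Build an L2-normalised TF-IDF vector, ignoring unknown terms."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def best_score(self, title: str) -> float:
        """Highest cosine similarity between the title and any keyword group."""
        vec = self._weigh(Counter(char_bigrams(title)))
        return max(
            (sum(w * g.get(t, 0.0) for t, w in vec.items())
             for g in self.group_vectors),
            default=0.0,
        )
```

An index built once at startup, for example `index = KeywordIndex(groups)`, can then serve every `index.best_score(title)` call without refitting.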
🟠 HIGH - Missing HEALTHCHECK instruction for exposed service port
Agent: microservices
Category: microservices
File: docker/Dockerfile.mcp:22-23
Description: Dockerfile.mcp exposes port 3333 and runs an HTTP MCP server but lacks a HEALTHCHECK instruction. Container orchestrators cannot detect when the service becomes unresponsive.
Suggestion: Add a HEALTHCHECK instruction before the CMD. Example: HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 CMD python -c "import socket; s = socket.socket(); s.connect(('localhost', 3333)); s.close()" || exit 1
Confidence: 85%
Rule: docker_no_healthcheck
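The one-line probe in the suggestion can also live in a small script that the HEALTHCHECK invokes. A sketch, assuming the MCP server listens on port 3333 as stated above; the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 3333, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A wrapper script would exit with status 0 when this returns True and 1 otherwise, which is the contract HEALTHCHECK expects.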
🟠 HIGH - Container image not pinned by digest (2 occurrences)
Agent: security
Category: security
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| docker/Dockerfile:1 | The base image 'python:3.12-slim' uses a mutable tag which can change unexpectedly between builds. T... | Pin the image to a specific digest: FROM python:3.12-slim@sha256: | 85% |
| docker/Dockerfile.mcp:1 | The base image 'python:3.12-slim' uses a mutable tag which can change unexpectedly between builds, c... | Pin the image to a specific digest: FROM python:3.12-slim@sha256: | 85% |
Rule: gen_pin_container_images_by_digest
🟠 HIGH - Bare except clause catching all exceptions (4 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| docker/manage.py:450-451 | Bare except: clause silently swallows all exceptions without any logging or specific handling. | Specify exception type: except Exception as e: and optionally log the error or re-raise if appropria... | 85% |
| docker/manage.py:528-529 | Bare except clause in stop_webserver function catches all exceptions without specific type. This mak... | Replace with: except Exception as e: or catch specific exceptions like (OSError, IOError) relevant t... | 85% |
| docker/manage.py:450-451 | Bare except: pass in webserver PID file cleanup catches all exceptions silently. | Use except OSError: pass since file removal only raises OSError | 70% |
| docker/manage.py:528-529 | Bare except: pass silently ignores errors when removing webserver PID file during stop operation. | Use except OSError: pass for specific file removal error handling | 70% |
Rule: python_bare_except
🟠 HIGH - Print statements instead of logging module (2 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/core/frequency.py:58-60 | Lines 58 and 60 use print() for status messages in library code. Production code should use the logg... | Import logging module and replace print() calls with logging.info() or logging.debug(). | 75% |
| trendradar/__main__.py:47-54 | Lines 53-54 in nested function 'parse_version' use bare 'except:' without specifying exception type.... | Replace with specific exceptions: 'except (ValueError, IndexError, AttributeError):' or 'except Exce... | 85% |
Rule: python_print_debug
🟠 HIGH - Container runs as root user - No USER directive present
Agent: security
Category: security
File: docker/Dockerfile.mcp:1-23
Description: The Dockerfile does not contain a USER instruction, which means the container will run as root user (UID 0). This increases the attack surface.
Suggestion: Add a USER instruction at the end of the Dockerfile to run as a non-root user. Example: RUN adduser --disabled-password --gecos '' appuser && chown -R appuser:appuser /app, followed by USER appuser
Confidence: 90%
Rule: docker_run_as_root
🟠 HIGH - Environment variables read directly in function instead of being injected
Agent: architecture
Category: architecture
File: trendradar/core/frequency.py:50-64
Description: The load_frequency_words() function reads environment variables (CUSTOM_DICT_FILE and FREQUENCY_WORDS_PATH) directly using os.environ.get() as fallback when parameters are None. This makes testing harder as environment state must be mocked.
Suggestion: Move environment variable loading to application startup and pass the values to load_frequency_words() as parameters. Remove the os.environ fallback from within the function.
Confidence: 75%
Rule: py_move_environment_configuration_to_startu
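One way to follow this suggestion is to read the environment once at startup and pass a settings object down. A sketch under assumptions: FrequencySettings and settings_from_env are hypothetical names, and the default words path is a placeholder, not the project's real default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FrequencySettings:
    frequency_words_path: str
    custom_dict_file: Optional[str]


def settings_from_env(env: Mapping[str, str] = os.environ) -> FrequencySettings:
    """Read configuration once; callers then pass the values explicitly."""
    return FrequencySettings(
        # Placeholder default path, assumed for illustration only.
        frequency_words_path=env.get("FREQUENCY_WORDS_PATH", "config/frequency_words.txt"),
        custom_dict_file=env.get("CUSTOM_DICT_FILE"),
    )
```

Tests can then pass a plain dict instead of patching os.environ, and load_frequency_words() would only receive explicit parameters.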
🟠 HIGH - Large stop words list should be external file
Agent: python
Category: quality
File: trendradar/core/frequency.py:234-248
Description: Lines 236-248 contain a hardcoded list of 49+ Chinese stop words embedded in function code. This static data should be stored in an external configuration file for maintainability and reusability.
Suggestion: Move stop_words to external file (e.g., 'config/stopwords.txt') and load at module initialization. This allows updating stop words without code changes.
Confidence: 85%
Rule: py_move_large_templates_to_external_files
🟡 MEDIUM - Duplicate rank display logic violates DRY principle
Agent: refactoring
Category: quality
File: trendradar/report/html.py:619-713
Description: The code for handling rank display (min/max rank calculation and rank class determination) appears twice: for regular news items (lines 619-638) and for new news items (lines 696-713).
Suggestion: Extract the rank display logic into a shared helper function: 'def format_rank_info(ranks, rank_threshold)' that returns (rank_class, rank_text).
Confidence: 80%
Rule: quality_guard_clauses
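A sketch of the suggested helper. The signature comes from the suggestion above, but the existing rank logic is not shown in this review, so the "top" class name and the "min-max" text format are placeholders for whatever the two duplicated blocks currently produce.

```python
def format_rank_info(ranks: list[int], rank_threshold: int) -> tuple[str, str]:
    """Return (rank_class, rank_text) for a list of observed ranks."""
    if not ranks:
        return "", ""
    min_rank, max_rank = min(ranks), max(ranks)
    # Placeholder CSS class: highlight items that reached the threshold.
    rank_class = "top" if min_rank <= rank_threshold else ""
    rank_text = str(min_rank) if min_rank == max_rank else f"{min_rank}-{max_rank}"
    return rank_class, rank_text
```

Both the regular and the new-item branches could then call the helper instead of repeating the calculation.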
🟡 MEDIUM - Large HTML template embedded in Python function
Agent: architecture
Category: architecture
File: trendradar/report/html.py:14-1094
Description: The render_html_content() function embeds ~1000 lines of HTML/CSS/JS template directly in the Python file. This makes maintenance and testing more difficult.
Suggestion: Extract the HTML template to a separate template file (using Jinja2 or similar) or create a dedicated template module.
Confidence: 70%
Rule: py_separate_business_logic_from_framework
🟡 MEDIUM - Dockerfile missing HEALTHCHECK instruction
Agent: bugs
Category: bug
File: docker/Dockerfile:1-72
Description: The Docker container lacks a HEALTHCHECK instruction. Without it, Docker orchestrators cannot monitor the health of the container automatically.
Suggestion: Add a HEALTHCHECK instruction to verify the service is running properly.
Confidence: 70%
Rule: docker_missing_healthcheck
🟡 MEDIUM - Duplicate Dockerfile patterns could be consolidated
Agent: microservices
Category: quality
File: docker/Dockerfile:1-72
Description: Dockerfile and Dockerfile.mcp share similar base images and configuration patterns. Both use python:3.12-slim and install from requirements.txt.
Suggestion: Consider consolidating into a single multi-stage Dockerfile with build targets to reduce maintenance overhead.
Confidence: 65%
Rule: docker_consolidate_duplicate_dockerfiles
🟡 MEDIUM - CMD lacks environment validation wrapper
Agent: microservices
Category: quality
File: docker/Dockerfile.mcp:23
Description: The CMD instruction launches the MCP server directly without validating required configuration files or environment variables exist, unlike the main Dockerfile which uses entrypoint.sh.
Suggestion: Add an entrypoint script similar to the main Dockerfile that validates config files exist before launching the server.
Confidence: 70%
Rule: docker_add_explicit_environment_variable_checks
🟡 MEDIUM - Silent exception swallowing in crontab reading (2 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| docker/manage.py:189-191 | Bare except: pass silently swallows all exceptions when reading crontab file content. Errors are h... | Use specific exception handling: except (FileNotFoundError, IOError) as e: pass or log the error | 85% |
| trendradar/core/frequency.py:122-123 | Exception swallowed when parsing max count directive. Has Chinese comment explaining intent, but no ... | Add debug logging: except (ValueError, IndexError): logger.debug(f'Invalid @ format: {word}') | 60% |
Rule: python_except_pass
🟡 MEDIUM - Overly broad exception handling
Agent: python
Category: quality
File: docker/manage.py:396-400
Description: Catches generic Exception when file operations may fail. Should catch specific file-related exceptions for targeted error recovery.
Suggestion: Use specific exceptions: except (FileNotFoundError, PermissionError, subprocess.CalledProcessError) as e:
Confidence: 65%
Rule: py_add_specific_exception_handling
🟡 MEDIUM - Overly broad exception handling in version check (2 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/__main__.py:62-64 | Catches generic Exception for HTTP requests. Should handle requests-specific exceptions for better... | Add specific handling: except requests.RequestException as e: before generic Exception | 72% |
| trendradar/__main__.py:186-187 | Catches all exceptions during version check with only generic message. Actual failure context is los... | Log with context: except requests.RequestException as e: logger.warning(f'Version check failed', ex... | 65% |
Rule: py_add_error_handling_for_external_service_
🟡 MEDIUM - Using datetime.now() instead of timezone-aware datetime
Agent: python
Category: bug
File: trendradar/report/html.py:556
Description: Fallback uses datetime.now() without timezone when get_time_func is not provided. Creates naive datetime that may cause timezone issues.
Suggestion: Use datetime.now(timezone.utc) or require get_time_func to always be provided
Confidence: 70%
Rule: python_datetime_now
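A sketch of a timezone-aware fallback. The name get_time_func comes from the description above; the wrapper current_time is hypothetical, and UTC is used here only because the suggestion names it (the project may prefer its configured local zone).

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def current_time(get_time_func: Optional[Callable[[], datetime]] = None) -> datetime:
    """Use the injected clock when given, otherwise an aware UTC timestamp."""
    if get_time_func is not None:
        return get_time_func()
    return datetime.now(timezone.utc)
```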
🟡 MEDIUM - Hardcoded TF-IDF threshold should be constant (2 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/core/frequency.py:219-221 | The threshold value 0.105 is hardcoded in the function signature. This magic number should be extrac... | Define module-level constant: DEFAULT_TFIDF_THRESHOLD = 0.105 and use it in function signature with ... | 70% |
| docker/manage.py:96-99 | Lines 96-99 define a hardcoded dictionary of weekday names. This configuration data should be extrac... | Extract to module level: WEEKDAY_NAMES = { "0": "周日", "1": "周一", "2": "周二", "3": "周三", "4": ... | 65% |
Rule: py_extract_constants_from_hardcoded_values
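Both extractions are small. A sketch: the constant names come from the suggestions, the weekday mapping is the one quoted in full elsewhere in this review, and is_match with its >= comparison is an assumption about how the threshold is applied.

```python
# Module-level constants instead of values buried in function bodies.
DEFAULT_TFIDF_THRESHOLD = 0.105

WEEKDAY_NAMES = {
    "0": "周日", "1": "周一", "2": "周二", "3": "周三",
    "4": "周四", "5": "周五", "6": "周六", "7": "周日",
}


def is_match(score: float, threshold: float = DEFAULT_TFIDF_THRESHOLD) -> bool:
    """Hypothetical helper: treat scores at or above the threshold as matches."""
    return score >= threshold
```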
🟡 MEDIUM - Missing type hints for complex dictionary parameters (4 occurrences)
Agent: python
Category: quality
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| trendradar/report/html.py:14-23 | Function 'render_html_content' has parameters with Dict type but lacks specific structure definition... | Create TypedDict classes for parameters: class ReportData(TypedDict): stats: ... new_titles:... | 65% |
| docker/manage.py:31-43 | Function 'manual_run' has no return type annotation. Should explicitly annotate '-> None:' for clari... | Add return type: 'def manual_run() -> None:' | 60% |
| docker/manage.py:46-124 | Function 'parse_cron_schedule' lacks type hints for parameter and return value. Should have '-> str:... | Add type hints: 'def parse_cron_schedule(cron_expr: str) -> str:' | 70% |
| trendradar/__main__.py:47-54 | Nested function 'parse_version' has no type hints for parameter or return value. Should be annotated... | Add type hints: 'def parse_version(version_str: str) -> Tuple[int, int, int]:' and import Tuple from... | 62% |
Rule: python_type_hints_missing
ℹ️ 24 issue(s) outside PR diff
These issues were found in lines not modified in this PR.
🟠 HIGH - Bare except block in version parsing (2 occurrences)
Agent: python
Category: bug
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| docker/manage.py:51-54 | Bare except: at line 53 catches all exceptions including SystemExit and KeyboardInterrupt when par... | Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors | 92% |
| trendradar/__main__.py:53-54 | Bare except: in parse_version function inside check_version catches all exceptions including Syste... | Replace with except (ValueError, IndexError) as e: to catch only expected parsing errors | 90% |
Rule: py_avoid_generic_except_blocks
🟡 MEDIUM - Bare except clause in nested function
Agent: python
Category: quality
File: trendradar/__main__.py:47-54
Description: Lines 53-54 in nested function 'parse_version' use bare 'except:' without specifying exception type. This catches all exceptions including SystemExit.
Suggestion: Replace with specific exceptions: 'except (ValueError, IndexError, AttributeError):' or 'except Exception:'
Confidence: 85%
Rule: python_print_debug
🟡 MEDIUM - Repeated stat() calls on same files (2 occurrences)
Agent: performance
Category: performance
📍 View all locations
| File | Description | Suggestion | Confidence |
|---|---|---|---|
| docker/manage.py:363-372 | Line 364 calls stat() in the sort key, then lines 368-369 call stat() again on the same files for di... | Cache stat results: files_with_stats = [(f, f.stat()) for f in files], then use cached stat for both... | 60% |
| trendradar/report/html.py:619-639 | Lines 621-622 call min(ranks) and max(ranks) separately, requiring two list traversals. While the im... | For small optimization: use 'min_rank, max_rank = min(ranks), max(ranks)' or compute both in single ... | 60% |
Rule: perf_expensive_in_loop
🟡 MEDIUM - Hardcoded weekday mapping should be module constant
Agent: python
Category: quality
File: docker/manage.py:96-99
Description: Lines 96-99 define a hardcoded dictionary of weekday names. This configuration data should be extracted to module-level constant for reusability and consistency.
Suggestion: Extract to module level: WEEKDAY_NAMES = { "0": "周日", "1": "周一", "2": "周二", "3": "周三", "4": "周四", "5": "周五", "6": "周六", "7": "周日" }
Confidence: 65%
Rule: py_extract_constants_from_hardcoded_values
Review ID: dbbfa66e-79de-4003-9bc0-64cc472aabdd
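Sorry, I don't know Python. I only forked the repo and had an AI add a sticky column-header effect, and I submitted it to the main branch by mistake.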