yomichan-import

Request: Separate Daijirin's J-J and J-E versions

Open anonymouse333 opened this issue 6 years ago • 1 comments

The Daijirin EPWING dictionary comes with both J-J and J-E definitions. Ideally, Yomichan Import should split these into two separate dictionaries so users can choose to add either only the J-J version or only the J-E version to Yomichan.

Alternatively, if the dictionary can't be converted into two separate versions at once, the user should be given the option to strip one version out during the conversion process, leaving them with either only a J-J version or only a J-E version.

anonymouse333 avatar Mar 31 '18 04:03 anonymouse333

Here is a hacky diff that does just that: it skips every entry whose text contains →英和, which leaves only the J-J definitions. Reverse the condition to get a dictionary containing only the J->E definitions.

Note that this does not remove the English-only entries, but in my experience those aren't the ones that show up when you don't want them to. As far as I know, it doesn't remove any entries incorrectly, but the diff between the (pretty-printed) JSON files is 400K lines long, so I didn't look at the whole thing.

diff --git a/daijirin.go b/daijirin.go
index 5983918..46b11a1 100644
--- a/daijirin.go
+++ b/daijirin.go
@@ -29,6 +29,7 @@ import (
 )
 
 type daijirinExtractor struct {
+	engGlossExp  *regexp.Regexp
 	partsExp     *regexp.Regexp
 	readGroupExp *regexp.Regexp
 	expVarExp    *regexp.Regexp
@@ -39,6 +40,7 @@ type daijirinExtractor struct {
 
 func makeDaijirinExtractor() epwingExtractor {
 	return &daijirinExtractor{
+		engGlossExp:  regexp.MustCompile(`→英和`),
 		partsExp:     regexp.MustCompile(`([^(【〖]+)(?:【(.*)】)?(?:〖(.*)〗)?(?:((.*)))?`),
 		readGroupExp: regexp.MustCompile(`[-・]+`),
 		expVarExp:    regexp.MustCompile(`\(([^\)]*)\)`),
@@ -49,6 +51,10 @@ func makeDaijirinExtractor() epwingExtractor {
 }
 
 func (e *daijirinExtractor) extractTerms(entry zig.BookEntry, sequence int) []dbTerm {
+	if e.engGlossExp.FindStringIndex(entry.Text) != nil {
+		return nil
+	}
+
 	matches := e.partsExp.FindStringSubmatch(entry.Heading)
 	if matches == nil {
 		return nil

rnpnr avatar Jun 25 '21 21:06 rnpnr
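
The filter in the diff above, together with its reversed form, can be sketched as a standalone program. This is only an illustration under stated assumptions, not code from yomichan-import: the `entry` type is a hypothetical stand-in for `zig.BookEntry` (reduced to the two fields the diff reads), and `filterMode`, `keepEntry`, and `filterEntries` are invented names showing how a user-selectable option might look. Since →英和 contains no regular expression metacharacters, `strings.Contains` is used here in place of the compiled `regexp.Regexp`.

```go
package main

import (
	"fmt"
	"strings"
)

// engGlossMarker is the literal text the diff above matches on.
const engGlossMarker = "→英和"

// entry is a hypothetical stand-in for zig.BookEntry, reduced to the
// two fields the diff reads.
type entry struct {
	Heading string
	Text    string
}

// filterMode is a hypothetical option selecting which entries to keep.
type filterMode int

const (
	// keepAll applies no filtering.
	keepAll filterMode = iota
	// dropMarked skips entries containing the marker (the diff as posted).
	dropMarked
	// keepMarked keeps only entries containing the marker (the reversed condition).
	keepMarked
)

// keepEntry reports whether an entry survives the given mode.
func keepEntry(e entry, mode filterMode) bool {
	marked := strings.Contains(e.Text, engGlossMarker)
	switch mode {
	case dropMarked:
		return !marked
	case keepMarked:
		return marked
	default:
		return true
	}
}

// filterEntries returns the entries that survive the given mode, in order.
func filterEntries(entries []entry, mode filterMode) []entry {
	var kept []entry
	for _, e := range entries {
		if keepEntry(e, mode) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// Made-up sample entries, not real dictionary data.
	entries := []entry{
		{Heading: "sample-a", Text: "definition text only"},
		{Heading: "sample-b", Text: "definition text →英和 english gloss"},
	}
	for _, e := range filterEntries(entries, dropMarked) {
		fmt.Println("dropMarked keeps:", e.Heading)
	}
	for _, e := range filterEntries(entries, keepMarked) {
		fmt.Println("keepMarked keeps:", e.Heading)
	}
}
```

In the actual extractor, the same check would sit at the top of `extractTerms`, returning `nil` for entries that should be skipped, exactly as the diff does for the `dropMarked` case.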