Gennady Shtech comments

Results: 5 comments of Gennady Shtech

Fit multiple different BatchVectorizers

What I've tried:

+ Set the same dictionary for both vectorizers with
```
wiki_batches.dictionary.load_text('./train_data/full.dict.filtered')
pubs_batches.dictionary.load_text('./train_data/full.dict.filtered')
```
It doesn't help. As for the problem, I'd like to explain what is meant by "it loses...

Gracefully restarting frontera with SQL backend

Some results of my investigation. I found that I had missed two parameters:
```
SQLALCHEMYBACKEND_DROP_ALL_TABLES = False
SQLALCHEMYBACKEND_CLEAR_CONTENT = False
```
Now I see strange behavior: if I start worker.db it...

Gracefully restarting frontera with SQL backend

Now I see. When `MessageBus` starts, it does `self.spider_feed_partitions = [i for i in range(settings.get('SPIDER_FEED_PARTITIONS'))]`. Then, in `SpiderFeedStream`:
```
self.partitions = messagebus.spider_feed_partitions
self.ready_partitions = set(self.partitions)
```
So, worker at start...
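To make the quoted initialization easier to follow, here is a minimal standalone sketch of it. The `Settings`, `MessageBus`, and `SpiderFeedStream` classes below are simplified stand-ins written for illustration only, not Frontera's actual classes; only the three assignment lines are taken from the comment above, and the value `3` for `SPIDER_FEED_PARTITIONS` is an assumed example.

```python
class Settings:
    """Hypothetical stand-in for a settings object exposing get()."""

    def __init__(self, **values):
        self._values = values

    def get(self, key, default=None):
        return self._values.get(key, default)


class MessageBus:
    """Stand-in: builds the partition list the way the quoted line does."""

    def __init__(self, settings):
        self.spider_feed_partitions = [
            i for i in range(settings.get('SPIDER_FEED_PARTITIONS'))
        ]


class SpiderFeedStream:
    """Stand-in: copies the partitions and marks every one of them ready."""

    def __init__(self, messagebus):
        self.partitions = messagebus.spider_feed_partitions
        self.ready_partitions = set(self.partitions)


# Assumed example value; a real deployment sets its own partition count.
settings = Settings(SPIDER_FEED_PARTITIONS=3)
stream = SpiderFeedStream(MessageBus(settings))

print(stream.partitions)
print(sorted(stream.ready_partitions))
```

In this sketch, every configured partition ends up in `ready_partitions` as soon as the stream object is created, before any spider has reported anything.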

Gracefully restarting frontera with SQL backend

No, the problem is understanding what I should do to be sure that my crawling state is saved between runs. So, at first I found two parameters which prevent...

Can't run with Llama.ccp. Getting "Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)"

@simon-lund thank you! You saved me 2 hours of my life! I spent 1 hour figuring out that the problem occurs AFTER LLM initialization. But there is still a lot to dig...