ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'
@sibiryakov Hi, thanks for your suggestion about Kafka, but I already have it installed on my machine. I intend to build a Kafka + HBase crawler.
I have a few questions. First, when I run these commands:
python -m frontera.utils.add_seeds --config tutorial.config.dbw --seeds-file seeds.txt
scrapy crawl tutorial -L INFO -s SPIDER_PARTITION_ID=0
I get this error:
ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'

After I removed the missing middleware entry, I can run Scrapy, but 0 pages are crawled:
SPIDER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 999,
    # removed: 'frontera.contrib.scrapy.middlewares.seeds.file.FileSeedLoader': 1,
}
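(For context, the Frontera docs wire up the Scrapy side through settings.py roughly like this; this is a sketch, and the FRONTERA_SETTINGS value is my assumption based on the --config paths used above:)
# settings.py (sketch, following the Frontera/Scrapy integration docs)
FRONTERA_SETTINGS = 'tutorial.config.spider'  # assumed; must point at the spider config module
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'
SPIDER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 999,
}
DOWNLOADER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 999,
}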

Besides that, my Kafka didn't consume any messages.

All of my configuration follows the cluster setup guide in the documentation.
Regarding the Kafka problems: after I added the line MESSAGE_BUS = 'frontera.contrib.messagebus.kafkabus.MessageBus' and removed the 'frontera.contrib.scrapy.middlewares.seeds.file.FileSeedLoader': 1 entry, I got this problem when I started the DB worker, strategy worker, and crawler.

My config, common.py:
from __future__ import absolute_import
from frontera.settings.default_settings import MIDDLEWARES
MAX_NEXT_REQUESTS = 512
SPIDER_FEED_PARTITIONS = 2 # number of spider processes
SPIDER_LOG_PARTITIONS = 2 # worker instances
MIDDLEWARES.extend([
    'frontera.contrib.middlewares.domain.DomainMiddleware',
    'frontera.contrib.middlewares.fingerprint.DomainFingerprintMiddleware',
])
QUEUE_HOSTNAME_PARTITIONING = True
KAFKA_LOCATION = 'localhost:9092'
URL_FINGERPRINT_FUNCTION='frontera.utils.fingerprint.hostname_local_fingerprint'
MESSAGE_BUS = 'frontera.contrib.messagebus.kafkabus.MessageBus'
SPIDER_LOG_TOPIC = 'frontier-done'
SPIDER_FEED_TOPIC = 'frontier-todo'
SCORING_TOPIC = 'frontier-score'
dbw.py:
from __future__ import absolute_import
from .worker import *
LOGGING_CONFIG='logging-db.conf'
spider.py:
from __future__ import absolute_import
from .common import *
BACKEND = 'frontera.contrib.backends.remote.messagebus.MessageBusBackend'
KAFKA_GET_TIMEOUT = 0.5
LOCAL_MODE = False # by default Frontera is prepared for single process mode
sw.py:
from __future__ import absolute_import
from .worker import *
CRAWLING_STRATEGY = 'frontera.strategy.basic.BasicCrawlingStrategy' # path to the crawling strategy class
LOGGING_CONFIG='logging-sw.conf' # if needed
worker.py:
from __future__ import absolute_import
from .common import *
BACKEND = 'frontera.contrib.backends.hbase.HBaseBackend'
HBASE_DROP_ALL_TABLES = True
MAX_NEXT_REQUESTS = 2048
NEW_BATCH_DELAY = 3.0
HBASE_THRIFT_HOST = 'localhost' # HBase Thrift server host and port
HBASE_THRIFT_PORT = 9090
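For reference, the BasicCrawlingStrategy named in sw.py simply schedules every seed and every not-yet-seen link, roughly like this (a paraphrase of the Frontera 0.8 strategy API from its docs, not verbatim source):
from frontera.core.components import States
from frontera.strategy import BaseCrawlingStrategy

class BasicCrawlingStrategy(BaseCrawlingStrategy):
    def read_seeds(self, stream):
        # the add_seeds utility hands the seeds file to this hook
        for url in stream:
            self.schedule(self.create_request(url.strip()))

    def filter_extracted_links(self, request, links):
        return links

    def links_extracted(self, request, links):
        for link in links:
            if link.meta[b'state'] == States.NOT_CRAWLED:
                self.schedule(link)
                link.meta[b'state'] = States.QUEUED

    def page_crawled(self, response):
        response.meta[b'state'] = States.CRAWLED

    def request_error(self, request, error):
        request.meta[b'state'] = States.ERROR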
How I create the Kafka topics:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-done
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-todo
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-score
I set the partition count to 2 in common.py:
SPIDER_FEED_PARTITIONS = 2 # number of spider processes
SPIDER_LOG_PARTITIONS = 2 # worker instances
How I start the Kafka console consumers:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-done --from-beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-todo --from-beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-score --from-beginning
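To double-check from Python that the broker reports two partitions per topic, here is a small kafka-python sketch (the script and consumer settings are my own, not from the setup guide):
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
consumer.topics()  # request a metadata refresh so partition info is populated
for topic in ('frontier-done', 'frontier-todo', 'frontier-score'):
    print(topic, sorted(consumer.partitions_for_topic(topic)))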
Tool versions:
frontera 0.8.1
Scrapy 1.6.0
Python 3.7.3
Kafka 2.2.1
I think the docs may not have been updated for v0.8.1; they still cover v0.8.0.1. Should I downgrade Frontera to the stable v0.8? I would rather use the latest version.
Thanks in advance!
Please use Stack Overflow to ask this type of question. See also https://stackoverflow.com/help/mcve and https://stackoverflow.com/help/how-to-ask
@Gallaecio I did think about asking this question on Stack Overflow, but it is less responsive than here, and I believe there are actual bugs involved. Have you read my entire problem?
Also, I checked all the previous issues; @sibiryakov is very responsive in solving problems, which is why I am asking here.
I will try asking on Stack Overflow...

I have posted the question on Stack Overflow: https://stackoverflow.com/questions/56493245/modulenotfounderror-no-module-named-frontera-contrib-scrapy-middlewares-seeds
Sorry, I don't have enough reputation to post images on Stack Overflow, so I used imgur.com instead. I hope I can get an answer soon.
@sibiryakov I found a solution for this error:
File "/home/liho/anaconda3/lib/python3.7/site-packages/frontera/contrib/messagebus/kafkabus.py", line 60, in __init__
self._partitions = [TopicPartition(self._topic, pid) for pid in self._consumer.partitions_for_topic(self._topic)]
TypeError: 'NoneType' object is not iterable
You should add this line:
self._consumer.topics()
before this one:
self._partitions = [TopicPartition(self._topic, pid) for pid in self._consumer.partitions_for_topic(self._topic)]
so that the consumer refreshes its metadata first.
It seems partitions_for_topic does not request a metadata refresh, whereas topics does. I have no clue why this worked in kafka-python 1.4.4, as the two functions do not appear to have changed; maybe in 1.4.4 the metadata was always refreshed immediately when the consumer was created?
Making partitions_for_topic run the same metadata-refresh code as topics before returning the partitions seems to solve the problem.
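The behavior is reproducible outside Frontera as well; a minimal sketch, assuming a broker on localhost:9092 and the frontier-todo topic from above:
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092', group_id='demo')
topic = 'frontier-todo'
# on kafka-python 1.4.5/1.4.6 this may be None, since no metadata fetch has happened yet:
print(consumer.partitions_for_topic(topic))
consumer.topics()  # triggers the metadata refresh
print([TopicPartition(topic, pid) for pid in consumer.partitions_for_topic(topic)])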
Have a look; they have been fixing this problem recently: https://github.com/dpkp/kafka-python/issues/1789 https://github.com/dpkp/kafka-python/pull/1781 https://github.com/dpkp/kafka-python/issues/1774 https://github.com/Yelp/kafka-utils/pull/216/commits/607a5770b45d7abf41a5351c6575582e78064195
@sibiryakov After I successfully start the cluster:
python -m frontera.worker.db --config tutorial.config.dbw --no-incoming --partitions 0 1
python -m frontera.worker.strategy --config tutorial.config.sw --partition-id 0
and then inject the seeds file with the command below,
python -m frontera.utils.add_seeds --config tutorial.config.sw --seeds-file seeds.txt
I get an error in the DB worker terminal in the meantime. But after the seeds are injected, it goes away...
scrapy crawl tutorial -L INFO -s SPIDER_PARTITION_ID=1
But I still get 0 pages crawled...

Please help me when you are free, sir. Thanks in advance!
Hi @liho00, your seeds weren't injected because the strategy worker was unable to create the table crawler:queue. Check that it can connect to the HBase Thrift server and that the crawler namespace exists.
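A quick way to verify both from Python is a short happybase sketch (happybase is what the HBase backend uses; host and port taken from worker.py above):
import happybase

connection = happybase.Connection(host='localhost', port=9090)
print(connection.tables())  # names come back as bytes, e.g. b'crawler:queue' if the namespace and table exist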
@sibiryakov Hi, I am sure I created the crawler namespace before, and I am also sure the queue table was created... I need to clarify that I am using Frontera v0.8.1, as 'frontera.contrib.scrapy.middlewares.seeds' was removed in this version.

After I tried again, the error still shows up after entering this command:
python -m frontera.utils.add_seeds --config tutorial.config.sw --seeds-file seeds.txt
DB worker terminal:

But after a few seconds it shows the seeds were injected?
Seeds terminal:

I am still getting 0 pages crawled.
Besides that, can you tell me how to inject the seeds? If the module from the error,
ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'
is no longer needed, should I inject the seeds via the strategy worker instead?
Lastly, I cannot force-close my crawler; it gets trapped in an endless loop.

My Kafka, ZooKeeper, HBase, and Hadoop are all started.

Solved by downgrading kafka-python to v1.4.4.
If that’s the only fix, then we need to either update setup.py accordingly or add support for later versions of kafka-python.
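In the meantime, downstream projects can pin the working version themselves, e.g. with a minimal setup.py (the project name and layout here are hypothetical):
from setuptools import setup, find_packages

setup(
    name='tutorial',  # hypothetical project name
    version='0.1',
    packages=find_packages(),
    install_requires=[
        'frontera==0.8.1',
        'scrapy==1.6.0',
        'kafka-python==1.4.4',  # 1.4.5+ regressed the metadata refresh behind partitions_for_topic()
    ],
)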
@Gallaecio it should be a tiny PR https://github.com/scrapinghub/frontera/issues/371#issuecomment-500197551
Besides that, I cannot force-close the spiders; they get trapped in an endless loop of [kafka client] "Unable to send to wakeup socket" warnings when using kafka-python v1.4.5 and v1.4.6 (latest).
kafka/client_async.py:
except socket.error:
    log.warning('Unable to send to wakeup socket!')
https://github.com/dpkp/kafka-python/issues/1837 https://github.com/dpkp/kafka-python/issues/1842

I also get the same problem. How can we solve this?
Getting the same issue here