Vadim Markovtsev

259 comments by Vadim Markovtsev

Oh, and there is another option: a SQL interface to the underlying repos via [gitbase](https://github.com/src-d/gitbase)

As far as I know, the gitbase+Spark integration is not ready yet, but yes, this is the goal. So the only way to run PySpark over siva files at the moment is through jgit-spark-connector...

Redirect @ajnavarro

I strongly +1 this as a casual user. Please add this!

@bzz It must be 3TB, not 2.4TB. Either something went wrong during the download or the index is missing some repos. We measured 3TB from our local HDFS copy. This is...

@campoy I have the impression that we are reinventing a huge wheel here, but I cannot name any particular prior art. I note that the Torrent protocol can be handy here:...

@bzz I would collect the list of file names with sizes and compare it to the list retrieved from the server (you can ask Rafa to run any listing command...
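The comparison suggested above could be sketched roughly as follows. This is a minimal illustration and not anything taken from the thread: it assumes both listings are plain text with one `<size in bytes> <file name>` pair per line, and the helper names `parse_listing` and `compare_listings` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_listing(lines: Iterable[str]) -> Dict[str, int]:
    """Parse '<size> <name>' lines into a {name: size} mapping.

    The line format is an assumption; adapt it to whatever the
    listing command actually prints.
    """
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        size, name = line.split(maxsplit=1)
        sizes[name] = int(size)
    return sizes


def compare_listings(
    local: Dict[str, int], remote: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (missing locally, extra locally, present in both but size differs)."""
    local_names, remote_names = set(local), set(remote)
    missing = sorted(remote_names - local_names)
    extra = sorted(local_names - remote_names)
    mismatched = sorted(
        name for name in local_names & remote_names
        if local[name] != remote[name]
    )
    return missing, extra, mismatched
```

Files reported as missing or mismatched would point to a download problem, while a server listing that is itself shorter than expected would point to the index.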

The number of lines in the index matches, but the number of siva files should be around 270k. This means 30k were not indexed, and that is very, very bad.

So before moving forward, we need to index the siva files which were discarded.

@bzz This is great news! I am so happy that you failed to download them twice and that this is not an indexing issue!