Mayur Bhosale
Loss of critical events like StageEnd, JobEnd, ExecutorAdded, etc., leads to inaccurate reports. https://github.com/qubole/sparklens/issues/56 highlights this problem. We should write an EventLossDetector Analyzer that will detect the event loss and...
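A minimal sketch of such a detector, written in Python for brevity rather than Sparklens's Scala. It assumes the standard Spark JSON event-log field names (`Event`, `Job ID`, `Stage Info`/`Stage ID`). The function name `detect_event_loss` is hypothetical. It flags jobs and stages whose start event is present but whose end event is missing, and it ignores stage attempts and executor events for simplicity.

```python
import json
from typing import Dict, Iterable, List


def detect_event_loss(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Hypothetical detector: report jobs/stages that started but never ended.

    Each line is one JSON event from a Spark event log (standard field names assumed).
    """
    submitted_stages, completed_stages = set(), set()
    started_jobs, ended_jobs = set(), set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerStageSubmitted":
            submitted_stages.add(event["Stage Info"]["Stage ID"])
        elif kind == "SparkListenerStageCompleted":
            completed_stages.add(event["Stage Info"]["Stage ID"])
        elif kind == "SparkListenerJobStart":
            started_jobs.add(event["Job ID"])
        elif kind == "SparkListenerJobEnd":
            ended_jobs.add(event["Job ID"])
    # A start without a matching end suggests the end event was lost.
    return {
        "missing_stage_end": sorted(submitted_stages - completed_stages),
        "missing_job_end": sorted(started_jobs - ended_jobs),
    }
```

In a real analyzer, a non-empty result could be used to warn that the generated report may be inaccurate.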
The following driver-related metrics are now reported: 1. driverHeapMax => Max heap memory allocated to the driver JVM 2. driverMaxHeapCommitted => Max heap memory committed to the...
- Sparklens is compiled against Spark 2.0.0 - Spark 2.0.0 depends on json4s 3.2.11, whereas Spark 2.4.0 onwards uses json4s 3.5.3 - In the later version the...
- Added a framework to simplify adding metrics to the Sparklens report - Mapping the nodes of the SparkPlan to the stages in which they were executed -...
As part of this [change](https://github.com/GoogleCloudDataproc/spark-bigquery-connector/pull/146), we have added support for access-token-based authorization. But since the access token has a short expiry (50 mins from generation), this becomes an...
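One common way to cope with a token that expires about 50 minutes after generation is to cache it and fetch a new one shortly before it expires. A minimal Python sketch; `CachedAccessToken` and the `fetch` callable (returning a token and its lifetime in seconds) are hypothetical and not the connector's API, and the 5-minute refresh margin is an arbitrary assumption.

```python
import time
from typing import Callable, Optional, Tuple


class CachedAccessToken:
    """Hypothetical helper: caches a short-lived token and refreshes it before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (token, lifetime_seconds)
        refresh_margin: float = 300.0,  # refresh this many seconds before expiry
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, or once we are inside the refresh margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Injecting `clock` keeps the refresh logic testable without waiting for real time to pass.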
**Key traits** - Stores the map output data in serialized form - Buffers the data in memory as much as possible. Chunks the data before sending it to RSS servers....
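A minimal Python sketch of buffering serialized map output per partition in memory and sending it in fixed-size chunks. `ChunkingShuffleWriter`, the `send(partition, chunk)` callable standing in for the call to an RSS server, and the chunk size are hypothetical. Records are treated as an opaque byte stream, so one record may span two chunks.

```python
from typing import Callable, Dict


class ChunkingShuffleWriter:
    """Hypothetical writer: buffers serialized records and sends fixed-size chunks."""

    def __init__(self, send: Callable[[int, bytes], None], chunk_size: int = 4096):
        self._send = send  # stand-in for the RPC to an RSS server
        self._chunk_size = chunk_size
        self._buffers: Dict[int, bytearray] = {}

    def write(self, partition: int, record: bytes) -> None:
        buf = self._buffers.setdefault(partition, bytearray())
        buf.extend(record)  # data stays serialized while buffered
        # Send full chunks as soon as the buffer holds enough bytes.
        while len(buf) >= self._chunk_size:
            self._send(partition, bytes(buf[: self._chunk_size]))
            del buf[: self._chunk_size]

    def close(self) -> None:
        # Flush the partial chunk left in each partition's buffer.
        for partition, buf in self._buffers.items():
            if buf:
                self._send(partition, bytes(buf))
                buf.clear()
```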
Adds fault tolerance for one or more RSS servers going away. This is how the functionality works: - Node/server goes away - Task reading/writing data from that server...
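A minimal sketch of one way a task could react when a server goes away. It assumes that a partition's data is reachable on more than one server, which the truncated text does not state. The task tries the servers in order and moves on when one is unreachable. `with_failover` and the server names are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def with_failover(servers: Sequence[str], op: Callable[[str], T]) -> T:
    """Hypothetical helper: run `op` against each server until one succeeds."""
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return op(server)
        except ConnectionError as err:
            last_error = err  # this server went away; try the next one
    raise RuntimeError(f"all {len(servers)} servers failed") from last_error
```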
Mappers in RSS send shuffle data for any given partition to a single RSS server, so that reducers can read the shuffle data from a single location. To incorporate this,...
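A minimal sketch of a deterministic partition-to-server assignment. Every mapper sends a given partition to the same server, and reducers know where to read it. Round-robin over a sorted server list is an assumption for illustration, and `assign_partitions` is hypothetical; the actual scheme in the truncated text may differ.

```python
from typing import Dict, Sequence


def assign_partitions(num_partitions: int, servers: Sequence[str]) -> Dict[int, str]:
    """Hypothetical: map each partition to exactly one server, identically on all mappers."""
    if not servers:
        raise ValueError("at least one RSS server is required")
    ordered = sorted(servers)  # same order everywhere, regardless of input order
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}
```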
**Describe the bug** FetchNode currently fetches only the static HTML content from the page and does not fetch any links. Without that, multi-level scraping won't be possible. **To Reproduce** ```...
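A minimal sketch of collecting links from fetched HTML so that they can be followed for multi-level scraping. It uses Python's standard-library `html.parser` to stay self-contained, although the follow-up change mentions BeautifulSoup. `extract_links` is hypothetical and is not FetchNode's API.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Hypothetical helper: return all link targets found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```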
These are follow-up changes from the discussion https://github.com/VinciGit00/Scrapegraph-ai/issues/187. We are now adding a mechanism to fetch the contents of the webpage using BeautifulSoup. Apart from the header and body are...