
Performance

Open VitalyVaryvdin opened this issue 4 years ago • 8 comments

Rendering a large candlestick dataset drops FPS quite a lot.

A 50k-entry dataset brings my FPS down to 30, and doubling the size lowers it even further. I'm running GLFW + OpenGL 3.3 on Windows 10 with a 2080 Ti. MSAA and software anti-aliasing are disabled.

The candlestick rendering is taken from the implot_demos repository, and the update function does nothing other than rendering. How can I improve performance? Is there anything like instancing? Or might switching to a different ImGui backend help?

VitalyVaryvdin avatar Sep 24 '21 18:09 VitalyVaryvdin

I assume you are rendering all the candles, even those that are not within the viewport.

My first guess would be to render, in chunks, only the candles that are within the viewable area.

I will experiment with this in a few hours and post my code if I can get it to render hundreds of thousands of candles :)

hesa2020 avatar Sep 25 '21 06:09 hesa2020

Well, I was actually hoping to have good performance when zoomed out on large datasets as well :)

But I'd also like to see a snippet showing how to cull them properly.

VitalyVaryvdin avatar Sep 25 '21 08:09 VitalyVaryvdin

OK, I am not sure how to limit the zoom. It seems I can set a starting zoom, but I can't limit it.

What I would do is to:

  1. Limit the X axis to a fixed number of candles shown at once.
  2. If the user wishes to view a bigger interval, they need to switch the candle size, say from a minute to an hour, a month, etc.
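Switching the candle size as in step 2 amounts to merging consecutive candles into coarser ones. Here is a minimal sketch of that aggregation, assuming the candles are sorted by time; the `Candle` struct and `AggregateCandles` are hypothetical names for illustration, not part of ImPlot:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candle type for this sketch (not an ImPlot type).
struct Candle {
    double x;      // start time of the candle
    double open;
    double close;
    double low;
    double high;
};

// Merge every `group` consecutive candles into one coarser candle.
// Assumes `in` is sorted by time; a trailing partial group is kept.
std::vector<Candle> AggregateCandles(const std::vector<Candle>& in, std::size_t group)
{
    if (group <= 1)
        return in;
    std::vector<Candle> out;
    out.reserve((in.size() + group - 1) / group);
    for (std::size_t i = 0; i < in.size(); i += group)
    {
        const std::size_t end = std::min(i + group, in.size());
        Candle c = in[i];                 // x and open come from the first candle
        for (std::size_t j = i + 1; j < end; ++j)
        {
            c.low  = std::min(c.low,  in[j].low);
            c.high = std::max(c.high, in[j].high);
        }
        c.close = in[end - 1].close;      // close comes from the last candle
        out.push_back(c);
    }
    return out;
}
```

With `group = 60`, for example, one-minute candles become hourly candles, so the number of candles drawn at a given zoom level stays bounded.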

However, I did boost my FPS by rendering only the candles that are in the viewport. The FitPoint calls also needed to be limited to those candles, because fitting every point was dropping my FPS by a ton.

I was testing with a dataset of 1 million candles, which did not impact my performance by much.

I am not 100% confident I used the right variables, as I am fairly new to this library, but it looks like it gets the job done. Maybe @epezent can confirm that.

Here is the function I ended up with:

void PlotCandlestick(const char* label_id, const double* xs, const double* opens, const double* closes, const double* lows, const double* highs, int count, bool tooltip, float width_percent, ImVec4 bullCol, ImVec4 bearCol)
{
    // get ImGui window DrawList
    ImDrawList* draw_list = ImPlot::GetPlotDrawList();
    // calc real value width
    double half_width = count > 1 ? (xs[1] - xs[0]) * width_percent : width_percent;

    // custom tool
    if (ImPlot::IsPlotHovered() && tooltip)
    {
        ImPlotPoint mouse = ImPlot::GetPlotMousePos();
        mouse.x = ImPlot::RoundTime(ImPlotTime::FromDouble(mouse.x), ImPlotTimeUnit_Day).ToDouble();
        float  tool_l = ImPlot::PlotToPixels(mouse.x - half_width * 1.5, mouse.y).x;
        float  tool_r = ImPlot::PlotToPixels(mouse.x + half_width * 1.5, mouse.y).x;
        float  tool_t = ImPlot::GetPlotPos().y;
        float  tool_b = tool_t + ImPlot::GetPlotSize().y;
        ImPlot::PushPlotClipRect();
        draw_list->AddRectFilled(ImVec2(tool_l, tool_t), ImVec2(tool_r, tool_b), IM_COL32(128, 128, 128, 64));
        ImPlot::PopPlotClipRect();
        // find mouse location index
        int idx = BinarySearch(xs, 0, count - 1, mouse.x);
        // render tool tip (won't be affected by plot clip rect)
        if (idx != -1)
        {
            ImGui::BeginTooltip();
            char buff[32];
            ImPlot::FormatDate(ImPlotTime::FromDouble(xs[idx]), buff, 32, ImPlotDateFmt_DayMoYr, ImPlot::GetStyle().UseISO8601);
            ImGui::Text("Day:   %s", buff);
            ImGui::Text("Open:  $%.2f", opens[idx]);
            ImGui::Text("Close: $%.2f", closes[idx]);
            ImGui::Text("Low:   $%.2f", lows[idx]);
            ImGui::Text("High:  $%.2f", highs[idx]);
            ImGui::EndTooltip();
        }
    }

    // begin plot item
    if (ImPlot::BeginItem(label_id))
    {
        // override legend icon color
        ImPlot::GetCurrentItem()->Color = IM_COL32(64, 64, 64, 255);

        // GImPlot and CurrentPlot are internals, so this needs implot_internal.h
        ImPlotContext& gp = *GImPlot;
        ImPlotPoint plot_start = ImPlot::PixelsToPlot(gp.CurrentPlot->AxesRect.Min.x, 0);
        ImPlotPoint plot_end = ImPlot::PixelsToPlot(gp.CurrentPlot->AxesRect.Max.x, 0);
        // fit data if requested
        if (ImPlot::FitThisFrame())
        {
            for (int i = 0; i < count; ++i)
            {
                if (xs[i] >= plot_start.x && xs[i] <= plot_end.x)
                {
                    ImPlot::FitPoint(ImPlotPoint(xs[i], lows[i]));
                    ImPlot::FitPoint(ImPlotPoint(xs[i], highs[i]));
                }
            }
        }
        // render data
        for (int i = 0; i < count; ++i)
        {
            if (xs[i] >= plot_start.x && xs[i] <= plot_end.x)
            {
                ImVec2 open_pos = ImPlot::PlotToPixels(xs[i] - half_width, opens[i]);
                ImVec2 close_pos = ImPlot::PlotToPixels(xs[i] + half_width, closes[i]);
                ImVec2 low_pos = ImPlot::PlotToPixels(xs[i], lows[i]);
                ImVec2 high_pos = ImPlot::PlotToPixels(xs[i], highs[i]);
                ImU32 color = ImGui::GetColorU32(opens[i] > closes[i] ? bearCol : bullCol);
                draw_list->AddLine(low_pos, high_pos, color);
                draw_list->AddRectFilled(open_pos, close_pos, color);
            }
        }
        // end plot item
        ImPlot::EndItem();
    }
}
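The fit and render loops above still visit every candle and test each one against the viewport. Since `xs` is already assumed to be sorted (the tooltip uses a binary search on it), the visible index range could be found with two binary searches instead. A minimal sketch, where `VisibleRange` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <utility>

// Returns the half-open index range [first, last) of samples whose x lies in
// [x_min, x_max]. Assumes xs is sorted in ascending order.
std::pair<int, int> VisibleRange(const double* xs, int count, double x_min, double x_max)
{
    const double* begin = xs;
    const double* end = xs + count;
    const double* lo = std::lower_bound(begin, end, x_min); // first x >= x_min
    const double* hi = std::upper_bound(lo, end, x_max);    // first x >  x_max
    return { static_cast<int>(lo - begin), static_cast<int>(hi - begin) };
}
```

Both loops would then run from `range.first` to `range.second`, with `plot_start.x` and `plot_end.x` as the bounds. Widening the range by one candle on each side would keep partially visible candles at the plot edges.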

NOTE: I rendered 7579 candles at full FPS before hitting an ImGui assert: "Too many vertices in ImDrawList using 16-bit indices. Read comment above"

Maybe ImPlot should find a way to split draw lists when more vertices are queued than ImGui can render.

hesa2020 avatar Sep 26 '21 17:09 hesa2020

Too many vertices in ImDrawList using 16-bit indices. Read comment above

I think this can be dealt with. See ImGui's imconfig.h:


//---- Use 32-bit vertex indices (default is 16-bit) is one way to allow large meshes with more than 64K vertices.
// Your renderer backend will need to support it (most example renderer backends support both 16/32-bit indices).
// Another way to allow large meshes while keeping 16-bit indices is to handle ImDrawCmd::VtxOffset in your renderer.
// Read about ImGuiBackendFlags_RendererHasVtxOffset for details.
//#define ImDrawIdx unsigned int
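As a rough back-of-the-envelope check of why the assert fires after only a few thousand candles: a 16-bit index can address 65536 vertices, and every candle adds several. The per-candle vertex count below is an assumption for illustration (4 for the wick line plus 4 for the body rectangle); the real figure depends on the ImGui version and anti-aliasing settings, and the rest of the plot uses vertices too.

```cpp
#include <cstdint>

// Number of distinct vertices a 16-bit index can address.
constexpr std::uint32_t kMaxVertices16 = 1u << 16; // 65536

// Upper bound on candles per draw list for a given per-candle vertex cost.
// The per-candle cost is an assumption of this sketch, not a measured value.
constexpr std::uint32_t MaxCandles(std::uint32_t vertices_per_candle)
{
    return kMaxVertices16 / vertices_per_candle;
}

// Assumed cost: 4 vertices for the wick line + 4 for the body rectangle.
static_assert(MaxCandles(8) == 8192, "8 vertices per candle exhaust 16-bit indices at 8192 candles");
```

That is the same order of magnitude as the 7579 candles reported above. With 32-bit indices, or a backend that handles ImDrawCmd::VtxOffset, this particular limit no longer applies.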

hinxx avatar Oct 26 '21 13:10 hinxx

Sorry for hijacking this thread, but I'm looking at getting more performance out of my app, too.

I will have multiple data streams coming in over TCP/IP at an update rate of ~14 Hz max. The individual data traces will be 10k to 500k points each. I'm drawing with basic PlotLine at the moment. For the test I'm running an ImGui example with calls to ImPlot; the default FPS is 60. On my machine, two traces of 100k points each still hold around 60 FPS, but one of the CPU cores is at ~100%. I would like to keep FPS high.

Given that my data is likely to stay the same over ~4 iterations of the render loop (60/14), I was thinking of caching the computation results from RenderLineStrip() (at a glance, this is where the draw list is constructed) until new data arrives from the network. The idea comes from looking at perf output, which points to PlotLine and glibc memmove as the biggest consumers of CPU cycles.
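One way to sketch the "recompute only when new data arrives" idea on the application side is a cache keyed by a generation counter that the network code bumps on every update. `TraceCache` is a hypothetical helper, not an ImPlot feature, and it caches pre-processed points (for example a reduced trace) rather than the draw list itself, which ImGui rebuilds every frame:

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: caches a derived point list and rebuilds it only when
// the source data's generation counter changes.
class TraceCache {
public:
    using Points = std::vector<double>;
    using Builder = std::function<Points(const Points&)>;

    explicit TraceCache(Builder build) : build_(std::move(build)) {}

    // `generation` must be bumped by the producer whenever `raw` changes.
    const Points& Get(const Points& raw, std::uint64_t generation)
    {
        if (!valid_ || generation != generation_)
        {
            cached_ = build_(raw);   // expensive step, runs once per update
            generation_ = generation;
            valid_ = true;
            ++rebuilds_;
        }
        return cached_;
    }

    // Number of times the builder has run (useful for checking the cache works).
    int rebuilds() const { return rebuilds_; }

private:
    Builder build_;
    Points cached_;
    std::uint64_t generation_ = 0;
    bool valid_ = false;
    int rebuilds_ = 0;
};
```

At 60 FPS with 14 Hz updates, the builder would run roughly once every four frames, and the remaining frames would reuse the cached points.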

Using the other 7 CPU cores for the PlotLine computation might be another avenue to explore to keep the FPS high.

Rendering only the subset of points that is actually on screen is also an option for me sometimes, depending on what the user is looking at.

Any comments on the above are welcome!

hinxx avatar Oct 26 '21 13:10 hinxx

@hinxx When there are thousands of points to render, it makes sense to downsample before plotting. For example, using this algorithm: https://github.com/sveinn-steinarsson/flot-downsample (demo here: https://www.base.is/flot/) and a C++ port: https://gist.github.com/gorbatschow/ce36c15d9265b61d12a1be1783bf0abf
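The linked repository implements the Largest-Triangle-Three-Buckets (LTTB) algorithm. For readers who want to see its shape without following the links, here is a minimal standalone sketch; it is not the code from the gist, and `Point` and `DownsampleLTTB` are names chosen for illustration:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Reduce `data` to `threshold` points, always keeping the first and last.
// Returns the input unchanged if it is already small enough or threshold < 3.
std::vector<Point> DownsampleLTTB(const std::vector<Point>& data, std::size_t threshold)
{
    const std::size_t n = data.size();
    if (threshold >= n || threshold < 3)
        return data;

    std::vector<Point> out;
    out.reserve(threshold);

    // Bucket width over the interior points.
    const double every = static_cast<double>(n - 2) / static_cast<double>(threshold - 2);

    std::size_t a = 0; // index of the previously selected point
    out.push_back(data[a]);

    for (std::size_t i = 0; i + 2 < threshold; ++i)
    {
        // Current bucket is [begin, end); the next bucket is [end, avg_end).
        const std::size_t begin = static_cast<std::size_t>(std::floor(i * every)) + 1;
        const std::size_t end   = static_cast<std::size_t>(std::floor((i + 1) * every)) + 1;
        std::size_t avg_end     = static_cast<std::size_t>(std::floor((i + 2) * every)) + 1;
        if (avg_end > n)
            avg_end = n;

        // Average of the next bucket serves as the third triangle corner.
        double avg_x = 0.0, avg_y = 0.0;
        for (std::size_t j = end; j < avg_end; ++j)
        {
            avg_x += data[j].x;
            avg_y += data[j].y;
        }
        const double len = static_cast<double>(avg_end - end);
        avg_x /= len;
        avg_y /= len;

        // Pick the point in the current bucket that forms the largest triangle
        // with the previously selected point and the next bucket's average.
        double max_area = -1.0;
        std::size_t next = begin;
        for (std::size_t j = begin; j < end; ++j)
        {
            const double area = std::fabs(
                (data[a].x - avg_x) * (data[j].y - data[a].y) -
                (data[a].x - data[j].x) * (avg_y - data[a].y)) * 0.5;
            if (area > max_area)
            {
                max_area = area;
                next = j;
            }
        }
        out.push_back(data[next]);
        a = next;
    }

    out.push_back(data[n - 1]);
    return out;
}
```

The downsampled x and y values would then be passed to PlotLine in place of the full trace, ideally recomputed only when the data or the visible range changes.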

gorbatschow avatar Nov 08 '21 03:11 gorbatschow

That looks like a very good approach for me, @gorbatschow! Will test and report ASAP.

hinxx avatar Nov 11 '21 07:11 hinxx

I can shave off half of the CPU cycles for a trace of 100 000 points by downsampling with a small threshold (e.g. 1000) before plotting. Interestingly, a threshold of 100 or 10 000 results in a negligible change in CPU usage compared to 1000.

hinxx avatar Nov 11 '21 09:11 hinxx