Skip to content

Add sweep/scale/zScore #290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Add sweep/scale/zScore #290

wants to merge 4 commits into from

Conversation

jmh530
Copy link
Contributor

@jmh530 jmh530 commented Jun 26, 2020

Adds scale and zScore functions. scale is implemented using sweep, and center is re-worked to use sweep). sweep may also have uses on its own.

@9il
Copy link
Member

9il commented Jun 27, 2020

zScore can be renamed to zscore thought like gmean and friends.
I don't think we need an API for scale and sweep because their API is can be easily replaced with lazy binary operations.

@jmh530
Copy link
Contributor Author

jmh530 commented Jun 27, 2020

Renamed and resolved conflicts.

I respectfully disagree on sweep and scale.

  1. The issues you mention for sweep and scale applies just the same as to center, yet center was added without issue.

  2. One motivation for adding scale to mir-algorithm is that lubeck2 has already similar features, see here. It makes sense to consolidate that functionality in one place.

  3. Lazy binary operations cannot easily be used in UFCS chains. x.byDim!1.map!scale for instance, can easily chain additional functions. It is more difficult to do the same thing with lazy binary operators. scale can also be adjusted to use arbitrary functions. One can then easily pass that to some other function, such as a future histogram function.

  4. The motivation for adding sweep is that it is a building block to the other functions. It is functionality that comes from R and is a helper for statistical applications.

One thing that I would be sympathetic with would be to make scale only handle division. That would require combining center and scale to implement the current functionality of scale. This would make center and scale a bit more orthogonal.

@9il
Copy link
Member

9il commented Jun 27, 2020

  • Lazy binary operations cannot easily be used in UFCS chains. x.byDim!1.map!scale for instance, can easily chain additional functions. It is more difficult to do the same thing with lazy binary operators. scale can also be adjusted to use arbitrary functions. One can then easily pass that to some other function, such as a future histogram function.

How x.byDim!1.map!scale can work? Should `scale accept a second one argument there?

@jmh530
Copy link
Contributor Author

jmh530 commented Jun 27, 2020

I may have misinterpreted what you said. Can I get a confirmation of what you are saying here:

I don't think we need an API for scale and sweep because their API is can be easily replaced with lazy binary operations.

I assumed that you were questioning the need for having sweep or scale functions at all.

Are you instead talking the need for the following functions:

@fmamath auto scale(Iterator, size_t N, SliceKind kind, T, U)(
           Slice!(Iterator, N, kind) slice, T m, U d)
@fmamath auto scale(T, U)(T[] array, T m, U d)
@fmamath auto scale(T, U, V)(T withAsSlice, U m, V d)
@fmamath auto sweep(Iterator, size_t N, SliceKind kind, T)(
               Slice!(Iterator, N, kind) slice, T m)
@fmamath auto sweep(T)(T[] array, T m)
@fmamath auto sweep(T, U)(T withAsSlice, U m)

but still keeping

@fmamath auto scale(Iterator, size_t N, SliceKind kind)(
        Slice!(Iterator, N, kind) slice)
@fmamath auto scale(T)(T[] array)
@fmamath auto scale(T)(T withAsSlice)
@fmamath auto sweep(Iterator, size_t N, SliceKind kind)(
        Slice!(Iterator, N, kind) slice)
@fmamath auto sweep(T)(T[] array)
@fmamath auto sweep(T)(T withAsSlice)

What I was referring to in my comment about about the following UT

/// Column scale matrix
version(mir_test)
@safe pure
unittest
{
    import mir.algorithm.iteration: all, equal;
    import mir.math.common: approxEqual;
    import mir.ndslice: fuse;
    import mir.ndslice.topology: alongDim, byDim, map;

    auto x = [
        [20.0, 100.0, 2000.0],
        [10.0,   5.0,    2.0]
    ].fuse;

    auto result = [
        [ 0.707107,  0.707107,  0.707107],
        [-0.707107, -0.707107, -0.707107]
    ].fuse;

    // Use byDim with map to scale by row/column.
    auto xScaleByDim = x.byDim!1.map!scale;
    auto resultByDim = result.byDim!1;
    assert(xScaleByDim.equal!(equal!approxEqual)(resultByDim));

    auto xCenterAlongDim = x.alongDim!0.map!scale;
    auto resultAlongDim = result.alongDim!0;
    assert(xScaleByDim.equal!(equal!approxEqual)(resultAlongDim));
}

It does not make use of the first set of functions, but does make use of the second.

@9il
Copy link
Member

9il commented Jun 27, 2020

What is the difference between zscore and one argument scale ?

@jmh530
Copy link
Contributor Author

jmh530 commented Jun 27, 2020

@9il You mean with just slice? 

zscore only accepts algorithms of VarianceAlgo. scale can accept any functions for centering and dividing. zscore cannot divide by max or medianAbsoluteDeviation or interquartileRange? There can be cases where people will want to do that. 

Some VarianceAlgo do one-pass others do two. scale always does two pass. zscore is better than scale in that case, though one can center and then sweep with divide and VarianceAlgo.assumeZeroMean. That gives the same values but two passes.

This is sort of what I was getting at with respect to the point on orthogonality. scale is equivalent to center and sweep with "/". If scale just does divide and not center then you can chain them on your own. However, the way scale works now is that it chains them for you. For instance, what if you have some floating point values that you rank, then you subtract the median and divide by the highest rank in the initial input. You can use scale for that, but it becomes more difficult to do the same thing chaining center and sweep

I probably should probably remove the  default op on sweep

@9il 9il mentioned this pull request Jul 5, 2020
@jmh530 jmh530 closed this Jul 7, 2020
@jmh530 jmh530 deleted the jmh530-scale branch July 10, 2020 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants