04-27-2017, 04:07 AM

This program generates random sample of k elements out of total set of N elements. Element can be used only once. All such subsets are drawn with equal probabilitites (assuming RNG is fair).

Algorithm considers each integer i from 0 to N-1 and decides on each whether to include it in a sample or not with probability (k-done)/(N-i) where done is the number of already selected items.

The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N.

Instruction:

<N> sto 0

<k> sto 1

<0.random_seed> sto 5

f prgm r/s

Output: first element of a sample 0..N-1

r/s

Output: next element of a sample.

When no more elements outputs -1

r/s after that starts a new sample.

Example:

100 sto 0

10 sto 1

0.1234 sto 5

f prgm r/s

[2] r/s [4] r/s [23] r/s [30] r/s [50] r/s

[53] r/s [57] r/s [59] r/s [66] r/s [96] r/s

[-1] -- indicates end of sample

I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.

Algorithm considers each integer i from 0 to N-1 and decides on each whether to include it in a sample or not with probability (k-done)/(N-i) where done is the number of already selected items.

The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N.

Instruction:

<N> sto 0

<k> sto 1

<0.random_seed> sto 5

f prgm r/s

Output: first element of a sample 0..N-1

r/s

Output: next element of a sample.

When no more elements outputs -1

r/s after that starts a new sample.

Code:

01. 0

02. sto 2

03. sto 3

04. rcl 1

05. rcl 3

06. -

07. x=0

08. gto 30

09. rcl 0

10. rcl 2

11. -

12. /

13. rcl 5

14. 1

15. 1

16. *

17. pi

18. +

19. frac

20. sto 5

21. x>=y

22. gto 27

23. 1

24. sto+3

25. rcl 2

26. stop

27. 1

28. sto+2

29. gto 04

30. 1

31. chs

32. gto 00

Example:

100 sto 0

10 sto 1

0.1234 sto 5

f prgm r/s

[2] r/s [4] r/s [23] r/s [30] r/s [50] r/s

[53] r/s [57] r/s [59] r/s [66] r/s [96] r/s

[-1] -- indicates end of sample

I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.