lomas.generator submodule

class lomas.generator.GeneratorCommon(ip_id_dict, cdf_size)
__init__(ip_id_dict, cdf_size)
generate(time_limit, time_unit)
initialize(trace_input)
poisson(lam)
class lomas.generator.GeneratorLomas(ip_id_dict, ordered_ippair, cdf_iat, cdf_size)
__init__(ip_id_dict, ordered_ippair, cdf_iat, cdf_size)

Trains a model on historical traffic data, then uses the trained model to generate new synthetic traffic data.

Parameters:
  • ip_id_dict (dict) -- key=index of the IP, value=(anonymized) IP address

  • ordered_ippair (list) -- ordered IP pairs (each IP is represented by its index)

  • cdf_iat (dict) -- key=percentile, value=value of the interarrival time CDF at that percentile

  • cdf_size (dict) -- key=percentile, value=value of the flow size CDF at that percentile

generate(time_limit, time_unit)

Generates new synthetic traffic data.

Parameters:
  • time_limit (int) -- controls how many flows are generated (such that [number of flows]*[avg. iat] <= [time_limit])

  • time_unit (int) -- time unit of time_limit

Returns:

self.trace_syn (synthetic trace)

Return type:

pandas.DataFrame
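
The time_limit constraint can be read as a cap on the number of flows. The following is a minimal sketch of that arithmetic, assuming the average interarrival time is known and expressed in the same unit as time_limit. max_flows is a hypothetical helper for illustration, not part of lomas.

```python
import math


def max_flows(time_limit, avg_iat):
    """Largest flow count n such that n * avg_iat <= time_limit.

    Hypothetical helper for illustration; both arguments are assumed
    to be expressed in the same time unit.
    """
    if avg_iat <= 0:
        raise ValueError("avg_iat must be positive")
    return math.floor(time_limit / avg_iat)


# A limit of 60 time units with an average gap of 0.25 units
# leaves room for 240 flows.
print(max_flows(60, 0.25))
```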

initialize(trace_input)

Collects the flow data for each source-destination pair and stores it as a 2D array.

Parameters:

trace_input (pandas.DataFrame) -- can be accessed using lomas.Preprocessor.trace_input

Returns:

self.arr_flow_type, self.dictionary, self.corpus

Return type:

2D list, 1D list, 2D list
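
One way to picture the three return values is as per-pair documents, a vocabulary, and a bag-of-words corpus. The sketch below is only an illustration under stated assumptions: flows arrive as plain (src, dst, token) tuples instead of a pandas.DataFrame, each token is an already discretized flow label, and the corpus stores (token_id, count) pairs. build_corpus and the token strings are hypothetical and may differ from what lomas does internally.

```python
from collections import Counter, defaultdict


def build_corpus(flows):
    """Group flow tokens by (src, dst) pair and count them.

    Hypothetical sketch: each flow is a (src, dst, token) tuple, where
    token is an already discretized label for that flow.
    """
    per_pair = defaultdict(list)
    for src, dst, token in flows:
        per_pair[(src, dst)].append(token)

    pairs = sorted(per_pair)                        # fixed document order
    documents = [per_pair[pair] for pair in pairs]  # 2D list of tokens
    dictionary = sorted({t for doc in documents for t in doc})  # 1D list
    token_id = {t: i for i, t in enumerate(dictionary)}
    # Bag-of-words form: one list of (token_id, count) per document.
    corpus = [sorted(Counter(token_id[t] for t in doc).items())
              for doc in documents]
    return documents, dictionary, corpus


flows = [(0, 1, "s2_i0"), (0, 1, "s2_i0"), (0, 1, "s5_i1"), (1, 2, "s5_i1")]
documents, dictionary, corpus = build_corpus(flows)
```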

sampling_helper(cdf, tag)

Helper function that maps a discretized flow-size or interarrival-time tag back to a real value.

Parameters:
  • cdf (pandas.DataFrame) -- pandas.DataFrame(['percentile', 'cdf'])

  • tag (int) -- discretized size or iat tag

Returns:

continuous value of size or iat

Return type:

float
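
The mapping from a tag back to a real value can be sketched as follows. This is an illustration only, under the assumption that tag k denotes the bin between the k-th and (k+1)-th CDF values and that a value is drawn uniformly inside that bin. A plain list stands in for the 'cdf' column of the DataFrame, and tag_to_value is a hypothetical name.

```python
import random


def tag_to_value(cdf_values, tag, rng=random):
    """Map a discretized tag back to a continuous value.

    Assumption for illustration: tag k denotes the bin between the k-th
    and (k+1)-th entries of cdf_values (the 'cdf' column, sorted in
    ascending order), and a value is drawn uniformly inside that bin.
    """
    if not 0 <= tag < len(cdf_values) - 1:
        raise IndexError("tag does not match any bin")
    low, high = cdf_values[tag], cdf_values[tag + 1]
    return rng.uniform(low, high)


# Illustrative flow sizes at the 0th, 25th, 50th, 75th and 100th percentiles.
size_cdf = [40.0, 120.0, 560.0, 1500.0, 9000.0]
value = tag_to_value(size_cdf, 2)  # drawn between 560.0 and 1500.0
```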

sampling_value(doc_idx)

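Samples from the latent-space probability distribution matrices to produce a joint value of flow size and interarrival time according to those distributions.

Parameters:

doc_idx (int) -- index according to ordered IP pair

Returns:

interarrival time, flow size

Return type:

int, int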
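
The two-stage draw described above can be sketched with the standard library. The sketch assumes that row doc_idx of the document-topic matrix is a distribution over topics, that each row of the topic-word matrix is a distribution over vocabulary entries, and that each entry encodes a joint (iat tag, size tag) pair. sample_tags, the toy matrices, and the vocabulary encoding are all hypothetical.

```python
import random


def sample_tags(doc_topics, topic_terms, vocabulary, doc_idx, rng=random):
    """Draw one (iat_tag, size_tag) pair for the IP pair at doc_idx.

    Assumptions for illustration: doc_topics[doc_idx] is a probability
    distribution over topics, topic_terms[t] is a distribution over
    vocabulary entries, and each entry encodes a joint (iat, size) tag.
    """
    # Stage 1: pick a latent topic for this IP pair.
    topics = range(len(topic_terms))
    topic = rng.choices(topics, weights=doc_topics[doc_idx])[0]
    # Stage 2: pick a vocabulary entry from that topic's distribution.
    word_ids = range(len(vocabulary))
    word = rng.choices(word_ids, weights=topic_terms[topic])[0]
    return vocabulary[word]


doc_topics = [[0.9, 0.1], [0.2, 0.8]]              # one row per IP pair
topic_terms = [[0.7, 0.3, 0.0], [0.0, 0.1, 0.9]]   # one row per topic
vocabulary = [(0, 0), (1, 2), (3, 4)]              # hypothetical joint tags
iat_tag, size_tag = sample_tags(doc_topics, topic_terms, vocabulary, doc_idx=1)
```

Because both tags come from a single vocabulary entry, the pair is sampled jointly rather than as two independent draws.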

train(num_topics=25, chunksize=2000, passes=20, iterations=400, eval_every=None)

Trains the model.

Parameters:
  • num_topics (int) -- dimension of the latent space

  • chunksize (int) -- number of documents processed at a time

  • passes (int) -- number of epochs

  • iterations (int) -- number of times the inner loop is repeated over each document

  • eval_every (int) -- perplexity evaluation setting; the default None skips the evaluation because it takes too much time

Returns:

self.doc_topics (document-topic distribution), self.topic_terms (topic-word distribution)

Return type:

2D np.array, 2D np.array